Deep Learning in Medical Image Processing and Analysis
Book Series Editor: Professor Joel J.P.C. Rodrigues, College of Computer Science and
Technology, China University of Petroleum (East China), Qingdao, China; Senac Faculty of
Ceará, Fortaleza-CE, Brazil and Instituto de Telecomunicações, Portugal
Book Series Advisor: Professor Pranjal Chandra, School of Biochemical Engineering, Indian
Institute of Technology (BHU), Varanasi, India
While demographic shifts in populations present significant socio-economic challenges, they
also trigger opportunities for innovations in e-Health, m-Health, precision and personalized
medicine, robotics, sensing, the Internet of things, cloud computing, big data, software defined
networks, and network function virtualization. Their integration is however associated with
many technological, ethical, legal, social, and security issues. This book series aims to
disseminate recent advances for e-health technologies to improve healthcare and people’s
wellbeing.
Topics considered include intelligent e-Health systems, electronic health records, ICT-enabled
personal health systems, mobile and cloud computing for e-Health, health monitoring,
precision and personalized health, robotics for e-Health, security and privacy in e-Health,
ambient assisted living, telemedicine, big data and IoT for e-Health, and more.
To download our proposal form or find out more information about publishing with us, please
visit https://www.theiet.org/publishing/publishing-with-iet-books/.
Please email your completed book proposal for the IET Book Series on e-Health Technologies
to: Amber Thomas at [email protected] or [email protected].
Deep Learning in Medical
Image Processing and
Analysis
Edited by
Khaled Rabie, Chandran Karthik, Subrata Chowdhury
and Pushan Kumar Dutta
This publication is copyright under the Berne Convention and the Universal Copyright
Convention. All rights reserved. Apart from any fair dealing for the purposes of research
or private study, or criticism or review, as permitted under the Copyright, Designs and
Patents Act 1988, this publication may be reproduced, stored or transmitted, in any
form or by any means, only with the prior permission in writing of the publishers, or in
the case of reprographic reproduction in accordance with the terms of licences issued
by the Copyright Licensing Agency. Enquiries concerning reproduction outside those
terms should be sent to the publisher at the undermentioned address:
While the authors and publisher believe that the information and guidance given in this
work are correct, all parties must rely upon their own skill and judgement when making
use of them. Neither the author nor publisher assumes any liability to anyone for any
loss or damage caused by any error or omission in the work, whether such an error or
omission is the result of negligence or any other cause. Any and all such liability is
disclaimed.
The moral rights of the author to be identified as author of this work have been
asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
4.6.7 Bio-signals 62
4.7 Application of deep learning in medical image processing
and analysis 62
4.7.1 Segmentation 62
4.7.2 Classification 63
4.7.3 Detection 63
4.7.4 Deep learning-based tracking 64
4.7.5 Using deep learning for image reconstruction 65
4.8 Training testing validation of outcomes 69
4.9 Challenges in deploying deep learning-based solutions 70
4.10 Conclusion 72
References 73
Index 345
About the editors
Over the past few decades, dental care has made tremendous strides. Recent
scientific discoveries and diagnostic tools have allowed for a sea change in the
practice of conventional dentistry. Medical imaging techniques, including X-rays,
MRIs, ultrasounds, mammograms, and CT scans, have come a long way in helping
doctors diagnose and treat a wide range of illnesses in recent decades. Machines
may now imitate human intellect via a process called artificial intelligence (AI), in
which they learn from data and then act on those learnings to produce outcomes.
AI has several potential applications in the healthcare industry. The use of AI
in dentistry could improve efficiency and lower expenses while decreasing the need
for specialists and the likelihood of mistakes being made by healthcare providers.
Diagnosis, differential diagnosis, imaging, management of head and neck diseases,
dental emergencies, etc. are just some of the many uses of AI in the dental sciences.
While it is clear that AI will not ever be able to fully replace dentists, understanding
how this technology might be used in the future is crucial. Orofacial disorders may
be diagnosed and treated more effectively as a result of this. A doctor’s diagnostic
ability and outcome may be jeopardized by factors like increased workload, the
complexity of work, and possible fatigue. Including AI features and deep learning
in imaging equipment would facilitate greater productivity while simultaneously
decreasing workload. Furthermore, they can detect various diseases with greater
accuracy than humans and have access to a plethora of data that humans lack.
This chapter discusses recent advancements in AI and deep learning applied to image
analysis in pathology, together with possible future applications.
1 Faculty of Oral Pathology, Department of OMFS and Diagnostic Sciences, Riyadh Elm University, Kingdom of Saudi Arabia
2 Department of Oral Pathology and Microbiology, People’s College of Dental Sciences and Research Center, People’s University – Bhopal, India
3 Department of Oral Pathology and Microbiology, Rishiraj College of Dental Sciences – Bhopal, India
4 Oral Medicine and Radiology, People’s College of Dental Sciences and Research Center, People’s University – Bhopal, India
1.1 Introduction
The volume of medical records has been reported to grow by roughly 48% annually. In
light of this data deluge and the difficulties in making good use of it to enhance
patient care, several artificial intelligence (AI)- and machine learning (ML)-based
solutions are now in development. ML, a subfield of AI, has
the potential to give computers human-level intelligence by allowing them to learn
independently from experience without any human intervention or programming
[1]. Thus, AI is defined as the subfield of computer science whose central research
interest is the development of intelligent computers able to execute activities that
traditionally have required human intellect [2].
Researchers all across the globe are fascinated by the prospect of creating arti-
ficially intelligent computers that can learn and reason like humans. Even though its
application in dentistry is still relatively new, it is already producing impressive
outcomes. We have to go back as far as 400 BC when Plato envisioned a vital model
of brain function [3]. AI has had a significant influence in recent years across many
areas of dentistry, but notably oral pathology. When used in dentistry, AI has the
potential to alleviate some of the difficulties currently associated with illness detec-
tion and prognosis forecasting. An AI system is a framework that can learn from
experience, make discoveries, and produce outcomes via the application of knowl-
edge it has gleaned [4]. The first stage of AI is called “training,” and the second is
called “testing.” In order to calibrate the model, it is first fed its training data. The
model backtracks and utilizes historical instances, such as patient data or data with a
variety of other examples. These settings are then used on the test data [3]. Oral
cancer (OC) prognostic variables, documented in a number of research, may be
identified using AI and a panel of diverse biomarkers. Successful treatment and
increased chances of survival both benefit from detecting malignant lesions as soon
as possible [5,6]. Image analysis of smartphone-based OC detectors based on AI
algorithms has been the subject of several investigations. AI aids in OC patients’
diagnosis, treatment, and management. By simplifying complicated data and alle-
viating doctors’ weariness, AI facilitates quicker and more accurate diagnoses [7,8].
The word “AI” may have a certain meaning, but it really encompasses a vast
variety of methods. For instance, deep learning (DL) aims to model high-level
abstractions in medical imaging and infer diagnostic interpretations. Keep in
mind that “AI” covers a wide range of technologies, including both “classical”
machine learning and more recent “deep” forms of the same. Through
the use of pre-programmed algorithms and data recognition procedures, conven-
tional machine learning provides a quantitative judgment on the lesion’s type and
behavior as a diagnostic result [8]. Supervised and unsupervised approaches are
subcategories of classic machine learning techniques. The diagnostic input is
checked against the model’s ground truth, which is validated by the training data
and outputs in the supervised method. Unsupervised methods, on the other hand,
are machine learning models that are not based on a set of predetermined values,
and therefore they use techniques like data extraction and mining to discover
hidden patterns in the data or specimen being studied. Using nonlinear processing
In order to detect nodal metastasis and tumor extra-nodal extension involvement,
Kann et al. applied deep learning to a dataset of 106 OC patients [8]. The
dataset included 2,875 lymph node samples that were segmented using computed
tomography (CT). The study investigated how useful a deep-learning model may be
in improving the treatment of head and neck cancer; the deep neural network (DNN)
was rated more accurate, with an AUC of 0.91. The AUC is the area under the
two-dimensional receiver operating characteristic (ROC) curve. Similar results
were found by Chang et al., who used AI trained on genomic
markers to predict the presence of OC with an AUC of 0.90 [29]. The research
compared AI using a logistic regression analysis. There was a significant lack of
statistical power since the study only included 31 participants. Future research
should include a bigger sample size [3].
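The AUC, sensitivity, and specificity values quoted in studies such as these can be reproduced from a model’s raw predictions with a few lines of code. The sketch below is purely illustrative: the labels and scores are invented rather than data from the cited studies, and scikit-learn is assumed to be available.

```python
# Illustrative only: computing AUC, sensitivity, and specificity for a binary
# classifier (e.g., metastasis vs. no metastasis). Labels and scores are made up.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])            # hypothetical ground truth
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9, 0.3, 0.6, 0.55, 0.15])  # model outputs

# AUC: area under the two-dimensional ROC curve (true-positive rate vs.
# false-positive rate over all decision thresholds).
auc = roc_auc_score(y_true, y_score)

# Sensitivity and specificity at one fixed decision threshold (0.5 here).
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)    # proportion of true positives detected
specificity = tn / (tn + fp)    # proportion of true negatives correctly ruled out
print(f"AUC={auc:.2f}, sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```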
Cancer research has long made use of ML. Evidence linking ML to cancer
outcomes has grown steadily over the previous two decades. Typically, these
investigations use gene expression patterns, clinical factors, and histological data as
inputs to the prognostic process [30]. There are three main areas of focus in cancer
prognosis and prediction: (i) the prediction of cancer susceptibility (risk assess-
ment), (ii) the prediction of cancer recurrence, and (iii) the prediction of the
possibility of redeveloping cancer after full or partial remission. Predictions might be made
using large-scale data including ancestry, age, nutrition, body mass index, high-risk
behaviors, and environmental carcinogen exposure. However, there is not enough
data on these characteristics to make sound judgments. It has become clear that new
types of molecular information based on molecular biomarkers and cellular char-
acteristics are very useful indicators for cancer prognosis, thanks to the develop-
ment of genomic, proteomic, and imaging technologies. Research shows that
combining clinicopathologic and genetic data improves cancer prediction findings.
The OC prognosis study conducted by Chang et al. employed a hybrid approach,
including feature selection and ML methods. Both clinicopathologic and genetic
indications were shown by their research to be related to a better prognosis [29].
Exarchos et al. set out to identify the factors that influence the pro-
gression of oral squamous cell carcinoma so that they might predict future recur-
rences. They pushed for a multiparametric decision support system that takes into
account data from a variety of fields, such as clinical data, imaging results, and
genetic analysis. This study clearly demonstrated how data from several sources
may be integrated using ML classifiers to provide accurate results in the prediction
of cancer recurrence [31]. Integrating multidimensional heterogeneous data and
using various methodologies might provide useful inference tools in the cancer
area, as has become obvious [23].
factors and visual examinations are used together, community health professionals
may almost halve the death rate from OC in high-risk populations. Cost-
effectiveness analysis has shown that this kind of screening is beneficial in those
at high risk for OC. There was no effect on mortality, morbidity, or cost in previous
large-scale OC screening investigations. Despite the fact that conventional OC
screening should be beneficial in low- and middle-income countries (LMICs), substantial
populations in regions with a high OC risk in LMICs often lack access to healthcare, necessitating alternative
techniques adapted to the specific restrictions and features of each region. OC
screening accuracy may be improved with the use of many AI-based algorithms
and methodologies that have emerged in the recent decade. They may be as
effective and accurate as traditional methods of screening, if not more so, while
eliminating the requirement for highly trained and regularly retrained human
screeners. In 1995, researchers began using AI to predict who would develop OC.
Researchers found that a trained ANN could identify oral lesions with a sensitivity
of 0.80 and a specificity of 0.77 [32]. Subsequent research confirmed that by
screening only 25% of the population with this method, high-risk people could be
identified and 80% of lesions could be detected. In 2010, researchers conducted a
case-control study that compared the accuracy of prediction models based on fuzzy
regression and fuzzy neural networks to that of professional doctors.
AI’s ability to facilitate remote healthcare interactions has the potential to
increase the speed with which screenings may be implemented, which is particu-
larly important in LMICs, where their influence is most felt. The potential of AI as
a tool for remote oral screening has been underlined in recent years, and there has
been a surge of interest in AI-based telehealth applications. For high-risk popula-
tions in areas with few resources, researchers at many institutions worked together
to create a very affordable smartphone-based OC probe using deep learning
[13,25,33]. Images of autofluorescence and polarization captured by the probe, as
well as OSCC risk variables, were analyzed using an innovative DL-based algo-
rithm to provide an evaluative output that gives immediate guidance to the screener.
In the most important clinical trial, the screening algorithm agreed with the
gold-standard result in 86% of cases. After further training, the algorithm's
overall sensitivity, specificity, positive predictive value, and negative predictive
value for detecting intraoral lesions all increased, reaching between
81% and 95%. The accuracy of automated screening was estimated to be over 85% in
a variety of studies, which is much higher than the accuracy of traditional screening
by community health workers. These findings are quite promising, especially in
light of the fact that there will be 3.5 billion mobile phone users worldwide by
2020. Particularly important in underserved and rural areas, studies like these show
that non-expert medical care providers such as nurses, general practitioners, dental
hygienists, and community health workers can effectively screen patients using
AI-supported applications integrated into mobile phones. When it
comes to intraoral photos of mucosal lesions taken with a smartphone, the degree of
concordance between the image and the clinical evaluation is considered to be
moderate to high, whereas it is lower for low-resolution images. Nonetheless, low-cost,
AI-supported, mobile phone-based technologies for early screening of oral lesions
could serve as a viable and affordable approach to reducing delays in the specialist and
clinical care pathway and allowing patients to be triaged toward appropriate and
timely treatment. Using a separate approach based on soft computing, intraoral
photographs of OSCC, leukoplakia, and lichen planus lesions were correctly iden-
tified as OSCC 87% of the time and as lichen planus 70% of the time.
Similarly, deep convolutional neural network (DCNN) models achieved performance
levels comparable to human professionals in recognizing the early stages of OC
when trained on a small collection of pictures of tongue lesions. A newly
designed automated DL technique, trained on 44,409 pictures of biopsy-proven OSCC
lesions and healthy mucosa, produced an AUC of 0.983 (95% CI 0.973–0.991), with
a sensitivity of 94.9% and a specificity of 88.7% on the internal validation
dataset.
In an early work by van Staveren et al. [34], autofluorescence spectra were taken
from 22 oral leukoplakia lesions and 6 healthy mucosal areas to see how well an
ANN-based ordering algorithm performed. According to the published data, ANN
exhibits 86% responsiveness and 100% explicitness when analyzing phantom pic-
tures of healthy and sick tissues [33,34]. Wang et al. [35] autofluorescence spectra of
premalignant (epithelial dysplasia) and harmful (SCC) sores were separated from
those of benign tissues using a half-way least squares and artificial neural network
(PLS-ANN) order calculation, with 81% responsiveness, 96% explicitness, and 88%
positive prescient value achieved. As explained by others, using an ANN classifier as
an exploration method may lead to high levels of responsiveness (96.5% or more) and
explicitness (100% particularity). De Veld et al.’s investigation on autofluorescence
spectra for sore order using ANN indicated that although the approach was effective
in differentiating healthy mucosa from disease, it was less effective in differentiating
benign tissue from premalignant sores. By performing 8 solid, 16 leukoplakia, and 23
OSCC tests utilizing Fourier-transform infrared spectroscopy (FTIR) spectroscopy on
paraffin-inserted tissue slices, we were able to develop an SVM-based strategy for
diagnosing oral leukoplakia and OSCC based on the biomarker choice. It was stated
that the authors had success in locating discriminating spectral markers that indicated
significant bio-molecular alterations on both the qualitative and quantitative levels
and that these markers were useful in illness classification. The malignant areas’
boundaries were also accurately delineated in both the positive- and negative-ion
modes by an ML-based diagnostic algorithm for head and neck SCC utilizing mass
spectra, with accuracies of 90.48% and 95.35%, respectively. A DCNN-based algo-
rithm was recently examined for its ability to detect OC in hyperspectral pictures
taken from individuals diagnosed with the disease. When comparing photographs of
cancerous and benign oral tissues, researchers found a classification accuracy of
94.5%. Recent animal research and another investigation using imaging of human
tissue specimens also reported similar findings. Diagnostic accuracy was increased to
an average of 88.3% (sensitivity 86.6%, specificity 90%) by using DL methods to
assess cell structure with confocal laser endomicroscopy for the detection of OSCC.
The most basic optical coherence tomography (OCT) models were used, together
with an automated diagnosis algorithm and an image management system with an
intuitive user interface. When comparing the automated disease screening platform to
mining, are useful for tasks like categorizing and forecasting. To differentiate
between the symptoms shown by patients who died from and survived OC in the
past, Tseng et al. [23] created a unified technique that incorporates clustering and
classifying aspects of data mining technology [38].
The prognosis and survival rate for those with OC is improved with early
diagnosis. The mortality and morbidity rates from OC may be reduced with the use
of AI by helping with early detection. Nayak et al. (2005) employed ANN to
classify laser-induced autofluorescence spectra recordings of normal, premalignant,
and malignant tissues. This was compared with principal component analysis (PCA) on the same data.
The findings demonstrated a 98.3% accuracy, a 100% specificity, and a 96.5%
sensitivity, all of which are promising for the method’s potential use in real-time
settings. CNN was utilized by Uthoff et al. (2017) to identify precancerous and
cancerous lesions in autofluorescence and white light pictures. When comparing
CNN to medical professionals, it was shown that CNN was superior in identifying
precancerous and cancerous growths. With more data, the CNN model can function
more effectively [25]. Using confocal laser endomicroscopy (CLE) images,
Aubreville et al. (2017) trained a DL model to detect OC. Results showed that this
approach was 88.3% accurate and 90% specific. Comparative research was
undertaken by Shams et al. (2017) utilizing deep neural networks (DNNs) to forecast the
progression of OC from precancerous lesions in the mouth. DNNs were
compared to SVMs, RLS, and multilayer perceptron (MLP). DNN’s 96% accuracy
was the highest of all of the systems tested. Additionally, Jeyraj et al. (2019) validated
these results. “Using hyperspectral pictures, malignant and noncancerous
tissues were identified using convolutional neural networks. CNN seems to be
useful for image-based categorization and the detection of OC without the need for
human intervention. Research on OC has exploded in recent years.” Several studies
have achieved their goals by creating AI models that can accurately forecast the
onset and progression of OC. Research comparing DL algorithms to human radi-
ologists has produced mixed findings. The accuracy of DL for detecting cervical
node metastases from CT scans was evaluated by Ariji et al. (2014). From 45
patients with oral squamous cell carcinoma, CT scans of 137 positive and 314
negative lymph nodes in the neck were utilized. Two experienced radiologists were
used to evaluate the DL method’s output. In terms of accuracy, the DL network was
on par with human radiologists. The researchers also used DL to identify tumors
that have spread beyond the cervical lymph nodes. Among the 703 CT scans
obtained from 51 individuals, 80% were utilized as training data, and 20% were
used as test data to determine whether or not the disease had spread beyond the
nodes. The DL system outperformed the radiologist, indicating it might be utilized
as a diagnostic tool for spotting distant metastases. When it comes to diagnosing
dental conditions including cavities, sinusitis, periodontal disease, and temporo-
mandibular joint dysfunction, neural networks and ML seem to be just as good as, if
not better than, professional radiologists and clinicians. Using AI models for
cancer detection enables the consolidation of disparate data streams for the purpose
of making decisions, evaluating risks, and referring patients to specialized care.
Indications are promising for the diagnostic and prognostic utility of AI in studies
of premalignant lesions, lymph nodes, salivary gland tumors, and squamous cell
carcinoma. By facilitating early diagnosis and treatment measures, these approa-
ches have the potential to lower death rates. In order to provide an accurate and
inexpensive diagnosis, these platforms will need access to massive amounts of data
and the means to evaluate it. These models need to be fine-tuned until they are both
highly accurate and very sensitive before they can be successfully adopted into
conventional clinical practice. More so, regulatory frameworks are required to put
these models into clinical practice [37].
Clinical evaluations, imaging, and articulation quality data were all used by
Exarchos et al. [42] to identify characteristics that foreshadow the onset of OC and
predict relapse. Classifiers were built independently for each dataset, and then they
were combined into one cohesive model. Moreover, a dynamic Bayesian network
(DBN) was used for genetic data in order to develop disease evolution tracking
software. The authors were able to provide more customized therapy by separating
patients into those at high risk of recurrence (an accuracy of 86%) and those at low
risk (a sensitivity of 100%) based on the DBN data from the first visit. “The
association between HPV infection and the presence of apoptotic and proliferative
markers in persons with oral leukoplakia has been studied by others using a parti-
cular sort of ML called a fuzzy neural network (FNN).” Clinical and immunohis-
tochemical test data, demographics, and lifestyle habits of 21 patients with oral
leukoplakia were input into an FNN system, with HPV presence/absence acting as an
“output” variable. Researchers used this method to relate a positive proliferating
cell nuclear antigen result to a history of smoking and a history of human papil-
lomavirus infection to survival in people with oral leukoplakia. Transcriptome
biomarkers in OSCC were found by support vector machine classifier-based
bioinformatics analysis of a case-control dataset. Saliva samples from 124 healthy
people, 124 people with premalignant diseases, and 125 people with OC lesions were
studied using conductive polymer spray ionization mass spectrometry (CPSI-MS) in novel
research to detect and confirm dysregulated chemicals and reveal changed meta-
bolic pathways. Evidence suggests that applying ML to CPSI-MS of saliva samples
might provide a simple, fast, affordable, and painless alternative for OC detec-
tion, since the Lasso approach, when applied in combination with CPSI-MS, has been
shown to yield a molecular result with an accuracy of 86.7%. AI research has
shown a connection between alterations in the salivary microbiome and the develop-
ment of dental disease. The salivary microbiome of individuals with OSF was
compared to that of individuals with OSF and oral squamous cell carcinoma (OSCC) using
high-throughput sequencing of bacterial 16S rRNA by Chen et al. When
comparing OSF and OSF + OSCC instances, the AUC was 0.88 and the mean
5-fold cross-validation accuracy was 85.1%, thanks to the ML analysis’s effective
integration of bacterial species features with the host’s clinical
findings and lifestyle. AI applications in omics are aimed at
completing tasks that are beyond the scope of human capability or conventional
fact-based methods of investigation. Through the use of AI and ML methods,
which enable the coordinated translation of omics information with clin-
icopathologic and imaging features, we may be able to enhance clinical treatment
and broaden our understanding of OC [8].
lack of resources and human capital. Healthcare efforts powered by AI have the
potential to provide high-quality medical attention to underserved areas. AI’s
influence on therapy, along with its efficacy and cost-effectiveness, must be
assessed via prospective randomized control trials and cohort studies [50,51].
1.2 Conclusion
AI is developing fast to meet a growing need in the healthcare and dental industries.
Much of the study of AI remains in its infancy [52]. There are now just a small
number of dental practices that have implemented internal real-time AI technologies.
Data-driven AI has been proven to be accurate, open, and even superior to human
doctors in several diagnostic situations [53]. AI is capable of performing cognitive
tasks including planning, problem-solving, and thinking. Its implementation may cut
down on archival space and labor costs, as well as on human error in diagnosis. A
new era of affordable, high-quality dental treatment that is more accessible to more
people is on the horizon, thanks to the proliferation of AI in the dental field.
References
[1] Rashidi HH, Tran NK, Betts EV, Howell LP, and Green R. Artificial intel-
ligence and machine learning in pathology: the present landscape of super-
vised methods. Acad. Pathol. 2019;6:2374289519873088. doi:10.1177/
2374289519873088. PMID: 31523704; PMCID: PMC6727099
[2] Alabi RO, Elmusrati M, Sawazaki-Calone I, et al. Comparison of supervised
machine learning classification techniques in prediction of locoregional recur-
rences in early oral tongue cancer. Int. J. Med. Inform. 2020;136:104068.
[3] Khanagar SB, Naik S, Al Kheraif AA, et al. Application and performance of
artificial intelligence technology in oral cancer diagnosis and prediction of
prognosis: a systematic review. Diagnostics 2021;11:1004.
[4] Kaladhar D, Chandana B, and Kumar P. Predicting cancer survivability
using classification algorithms. Books 1 view project protein interaction
networks in metallo proteins and docking approaches of metallic compounds
with TIMP and MMP in control of MAPK pathway view project predicting
cancer. Int. J. Res. Rev. Comput. Sci. 2011;2:340–343.
[5] Bànkfalvi A and Piffkò J. Prognostic and predictive factors in oral cancer:
the role of the invasive tumour front. J. Oral Pathol. Med. 2000;29:291–298.
[6] Schliephake H. Prognostic relevance of molecular markers of oral cancer—a
review. Int. J. Oral Maxillofac. Surg. 2003;32:233–245.
[7] Ilhan B, Lin K, Guneri P, and Wilder-Smith P. Improving oral cancer outcomes
with imaging and artificial intelligence. J. Dent. Res. 2020;99:241–248.
[8] Kann BH, Aneja S, Loganadane GV, et al. Pretreatment identification of
head and neck cancer nodal metastasis and extranodal extension using deep
learning neural networks. Sci. Rep. 2018;8:1–11.
[25] Krishna AB, Tanveer A, Bhagirath PV, and Gannepalli A. Role of artificial
intelligence in diagnostic oral pathology – a modern approach. J. Oral
Maxillofac. Pathol. 2020;24:152–156.
[26] Sunny S, Baby A, James BL, et al. A smart tele-cytology point-of-care
platform for oral cancer screening. PLoS One 2019;14:1–16.
[27] Uthoff RD, Song B, Sunny S, et al. Point-of-care, smartphone-based, dual-
modality, dual-view, oral cancer screening device with neural network
classification for low-resource communities. PLoS One 2018;13:1–21.
[28] Nayak GS, Kamath S, Pai KM, et al. Principal component analysis and
artificial neural network analysis of oral tissue fluorescence spectra: classi-
fication of normal premalignant and malignant pathological conditions.
Biopolymers 2006;82:152–166.
[29] Musulin J, Štifanić D, Zulijani A, Cabov T, Dekanić A, and Car Z. An
enhanced histopathology analysis: an AI-based system for multiclass grad-
ing of oral squamous cell carcinoma and segmenting of epithelial and stro-
mal tissue. Cancers 2021;13:1784.
[30] Kirubabai MP and Arumugam G. View of deep learning classification
method to detect and diagnose the cancer regions in oral MRI images. Med.
Legal Update 2021;21:462–468.
[31] Chang SW, Abdul-Kareem S, Merican AF, and Zain RB. Oral cancer
prognosis based on clinicopathologic and genomic markers using a hybrid of
feature selection and machine learning methods. BMC Bioinform.
2013;14:170–185.
[32] Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, and Fotiadis DI.
Machine learning applications in cancer prognosis and prediction. Comput.
Struct. Biotechnol. J. 2015;13:8–17.
[33] Exarchos KP, Goletsis Y, and Fotiadis DI. Multiparametric decision support
system for the prediction of oral cancer reoccurrence. IEEE Trans. Inf.
Technol. Biomed. 2012;16:1127–1134.
[34] Speight PM, Elliott A, Jullien JA, Downer MC, and Zakzrewska JM. The use
of artificial intelligence to identify people at risk of oral cancer and pre-
cancer. Br. Dent. J. 1995;179:382–387.
[35] Uthoff RD, Song B, Birur P, et al. Development of a dual-modality, dual-
view smartphone-based imaging system for oral cancer detection. In
Proceedings of SPIE 10486, Design and Quality for Biomedical
Technologies XI, 2018. 10486. https://doi.org/10.1117/12.2296435.
[36] van Staveren HJ, van Veen RL, Speelman OC, Witjes MJ, Star WM, and
Roodenburg JL. Classification of clinical autofluorescence spectra of oral
leukoplakia using an artificial neural network: a pilot study. Oral Oncol.
2000;36:286–293.
[37] Wang CY, Tsai T, Chen HM, Chen CT, and Chiang CP. PLS-ANN based
classification model for oral submucous fibrosis and oral carcinogenesis.
Lasers Surg. Med. 2003;32:318–326.
[38] Bowling M, Fürnkranz J, Graepel T, and Musick R. Machine learning and
games. Mach. Learn. 2006;63:211–215.
1 Department of Periodontology, Rama Dental College, India
2 Department of Oral Medicine and Radiology, Rama Dental College, India
3 Department of Periodontology, Government Dental College and Hospital – Raipur, India
is necessary for dentists to keep in mind the potential consequences that it may have
for a prosperous clinical practice in the years to come.
2.1 Introduction
Since the 1950s, researchers have been actively studying artificial intelligence (AI),
one of the newest subfields of computer science [1]. The use of deep learning (DL)
and AI is spreading across the medical and dental communities. AI was described by
John McCarthy, one of the field’s first pioneers, as “the science and engineering of
making intelligent machines” [2]. It is not hard to find places where AI has been put
to use. The use of AI in healthcare has been on the rise in recent years, and its
results have been encouraging. AI has already found uses in several areas of
healthcare, including human biology and dental implants [1].
ML is a subfield of AI in which a system learns to
use statistical patterns found in a dataset to make predictions about the behavior
of new data samples, while AI refers to the study, development, and analysis of
any computer system exhibiting “intelligent behavior” [3]. Arthur Samuel coined the term “machine learning” in 1959 [4].
Machine learning’s foundational purpose is to discover regularities in new data
(test data) for the purpose of performing tasks like classification, regression, and
clustering. Training for machine learning algorithms may be done in two ways:
supervised and unsupervised. “Classification (deciding what category a given data
point belongs to) and regression are two examples of tasks that are often accom-
plished through supervised training, in which the learning model is fed a collection
of input–output pairs of training data (finding a numerical relationship between a
set of independent and dependent variables).” However, unsupervised training is
often used for tasks like clustering and dimensionality reduction, in which the goal
is to merely collect the essential characteristics in a given data set. Further, ML uses
algorithms like artificial neural networks (ANNs) to make predictions based on the
data it has been fed. These networks are modeled after the human brain and use
artificial neurons joined together to process incoming data signals. The idea was
proposed in 1943 by Warren McCulloch and Walter Pitts. A stochastic neural
analog reinforcement calculator was then developed by Marvin Minsky and Dean Edmonds
in 1951 [4].
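As a minimal sketch of the two training styles described above (the toy data, network size, and other parameters are assumptions made for illustration, not taken from the cited literature), a small feed-forward ANN can be fitted to labelled input–output pairs, while a clustering algorithm groups the same samples without any labels:

```python
# Contrast of supervised vs. unsupervised learning on a tiny, invented dataset.
import numpy as np
from sklearn.neural_network import MLPClassifier   # a simple feed-forward ANN
from sklearn.cluster import KMeans

X = np.array([[1.0, 0.2], [0.9, 0.3], [0.2, 0.9], [0.1, 1.0],
              [0.95, 0.25], [0.15, 0.85]])          # hypothetical feature vectors
y = np.array([0, 0, 1, 1, 0, 1])                    # labels available -> supervised

# Supervised: the ANN learns an input-output mapping from labelled pairs.
ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ann.fit(X, y)
print("Predicted class for a new sample:", ann.predict([[0.85, 0.3]]))

# Unsupervised: no labels, the algorithm only groups similar samples together.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster assignments:", clusters)
```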
DL, a specialized subfield of machine learning that employs sophisticated
techniques based on artificial neural networks (ANNs), has seen a surge in popularity in recent years.
Because of its superior generalization capabilities, DL has found usage in other
fields outside data analytics, including engineering and healthcare. It was not until
2006 that Hinton et al. presented the concept of a convolutional neural network
(CNN), now often referred to as DL. When processing information, it employs
neural networks with several layers. Using data to examine patterns, DL systems
may be able to provide better results. In 1969, the backpropagation algorithm was
created, and it was this innovation that paved the way for DL systems. Important
turning points in the development of AI are shown in Figure 2.1.
conclusions from our analyses, and so on. Unlabeled dental patient datasets may
nevertheless allow labels to be detected, such as those linked with certain patterns of bone
loss owing to periodontal disease. This may help form groups for further study. The
accuracy of algorithms based on reinforcement learning has recently been apparent in
dental clinical practice via the use of image processing apps [5].
may be valuable throughout the training phase. If new discoveries are to be made in
this area, researchers with an interest should pool their resources and share the
information they have gathered [5].
goal. Further, the characteristics and pre-processing of the datasets may affect the
evaluation of their performances. Case-based reasoning (CBR) has also been
used in analysis, according to other research. CBR gives input by accumulating and
learning from past situations. Thus, even if new cases may be introduced, new rules
may be established. Similar clinical manifestations are seen in a wide variety of
oral cavity disorders, which may make accurate diagnosis challenging. The diag-
nostic accuracy and patient compliance with the prescribed treatment plan are
compromised as a result. The CBR technology has been helpful in creating a
thorough and methodical strategy for the one-of-a-kind identification of these ill-
nesses, with the goal being a more precise definition of similarities and differences
between them. Although preliminary, the findings demonstrate that the algorithms
have promise for enhancing the standard of care and facilitating more effective
treatment. These methods have also been put to use in other contexts, such as the
identification and segmentation of structures, the classification of dental artifacts
for use in image verification, and the classification of maxillary sinus diseases.
There has also been a lot of talk about how to apply ML algorithms to anticipate
perioperative blood loss in orthognathic surgery. It is feasible that a random forest
classifier might be used to estimate the expected perioperative blood loss and so
prevent any unanticipated issues during surgery. This forecast has the potential to
aid in the management of elective surgical operations and improve decision-
making for both medical professionals and their patients. Researchers have
employed ANNs to make accurate diagnoses in situations involving orthognathic
surgery, with 96% accuracy.
adult root caries. When compared to the other algorithms employed for root caries
diagnosis, the SVM-based technique performed the best, with a 97.1% accuracy
rate, a 95.1% precision rate, a 99.6% sensitivity rate, and a 94.3% specificity rate.
Cross-sectional data were used in the research, which raised several red flags
regarding the model’s predictive power. The use of longitudinal data in research is
recommended for greater generalization and validation of findings. In addition, a
study with a solid methodological basis examined the use of general regression
neural networks (GRNNs) to predict caries in the elderly, and the results were
promising: the model’s sensitivity was 91.41% on the training set and 85.16% on
the test set. A GRNN is a sophisticated kind of nonparametric regression-based
neural network. The ability of these algorithms to generate predictions and compare
the performance of systems in practice is greatly enhanced by the fact that they
need only a small number of training samples to converge on the underlying function
of the data. With just a tiny amount of supplementary information, the modification
may be obtained effectively and with no more intervention from the user. The cost-
effectiveness of these technologies was analyzed for the first time in a ground-
breaking study; this is a crucial factor in deciding whether or not to use them in a
clinical setting. In conclusion, the research showed promise for the use of auto-
mated approaches to the detection of caries.
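For readers unfamiliar with GRNNs, the following minimal sketch assumes the standard formulation in which a prediction is a Gaussian-kernel-weighted average of the stored training targets; the feature values and smoothing parameter sigma are invented for illustration and do not reproduce the cited study.

```python
# Minimal GRNN-style predictor: output = kernel-weighted average of training targets.
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Return a GRNN prediction for each row of X_query."""
    preds = []
    for x in X_query:
        d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distance to each sample
        w = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights
        preds.append(np.dot(w, y_train) / np.sum(w))  # weighted average of targets
    return np.array(preds)

# Hypothetical data: two risk features per patient, target = a caries risk score.
X_train = np.array([[0.1, 0.2], [0.4, 0.4], [0.8, 0.9], [0.9, 0.7]])
y_train = np.array([0.0, 0.2, 0.9, 1.0])
print(grnn_predict(X_train, y_train, np.array([[0.5, 0.5]])))
```

Because the model simply stores the training samples and only the smoothing parameter has to be chosen, no iterative training is needed, which is what allows convergence with relatively few samples.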
optimum implant sizes (i.e., length and breadth) prior to surgery in cases when there
is insufficient bone at the surgical site. Nevertheless, the doctor’s expertise in
reading CBCT pictures is crucial for the success of the implant design process.
DL and other recent developments in machine learning are facilitating the
recognition, categorization, and quantification of patterns in medical pictures,
which aids in the diagnosis and treatment of many illnesses [8].
2.3.1 Use of AI in radiological image analysis for
implant placement
X-rays and computerized tomography (CT) scans are just two examples of the
medical imaging tools that dentists have been employing for decades to diagnose
issues and plan treatment. Today, dental professionals rely heavily on computer
tools to aid them in the diagnosis and treatment of such conditions [9].
Using AI has allowed for the development of CAD systems for use in radiology
clinics for the purpose of making accurate diagnoses. An effective DL application
applied to medical diagnostic images is the deep convolutional neural network
(DCNN) approach. Tooth numbering, periapical pathosis, and mandibular canal recog-
nition are just a few of the dental diagnoses that have benefited from this technique,
which also allows the analysis of more complicated pictures like CBCT imaging.
Despite the importance of radiographic image assessment and precise implant design
and interpretation of anatomical data, many experts and general practitioners lack the
necessary expertise in these areas. This scenario creates difficulties for dentists and has
yet to be resolved. The use of AI systems in radiographic interpretation offers several
benefits to the doctor and may help with this issue. In dentistry, this might also mean less
time wasted on incorrect diagnoses and treatment plans and a reduced workload for the clinician [10].
Several DL-based algorithms have also been studied in medical image analysis
procedures involving a wide range of organs and tasks, including the brain, the pancreas,
breast cancer diagnostics, and the identification and diagnosis of COVID-19.
Dental implant identification might benefit from DL’s established efficacy in the
field of medical imaging. Recognizing dental implants is critical for many areas of
dentistry, including forensic identification and reconstructing damaged teeth and
jaws. Implants in the context of implant dentistry provide patients enticing pros-
thetic repair options. The accurate classification of a dental implant placed in a
patient’s jaw prior to the availability of dental records is a significant challenge in
clinical practice. To determine the manufacturer, design, and size of an implant,
dentists will commonly examine an X-ray picture of the device. This data is useful
for determining the implant’s connection type. As soon as the tooth is extracted, the
dentist may place an order for a new abutment and a replacement tooth. Purchasing the
wrong abutment or replacement tooth can be quite expensive for
dentists. Therefore, it stands to reason that dentists might benefit greatly from an
automated system that analyzes X-rays of patients’ jaws to determine which cate-
gory the patient’s dental implant best fits into [8]. Using periapical and panoramic
radiographs, several AI models have been built for implant image identification.
Additionally, dental radiographs have been exploited by AI models to identify
periodontal disease and dental cavities. AI has also been used to optimize dental
coordinating efforts across disciplines and making more use of digital dentistry,
they might better ensure patients received treatment that was both timely and
predictable.
3Shape has created a suite of specialized software applications that provide an
end-to-end digital workflow, from diagnosis to treatment planning to prosthetic
process and implant design and visualization. More importantly, it provides suffi-
cient adaptability for the dental practitioner to make any necessary adjustments. An
intraoral digital scanner is required for use with these applications, which process
digital stills and moving pictures. In addition to being able to see and alter teeth, the
program also allows for the construction of 3D implants using a wide range of pre-
existing manufacturers and customization choices. In addition, it works with spe-
cialized printers to produce the final output. To create dental implants and other
dental applications in a completely digital workflow, Exocad may be used as a
CAD (computer-aided design) tool. A custom tooth set may be created from scratch
using a variety of methods, one of which is by importing a single tooth or a whole
set of teeth from one of numerous dental libraries. When working with a 3D model
created using 3Shape, it is simple to make adjustments like moving the teeth around
or enlarging them. The program facilitates implant design with an intuitive inter-
face that walks the user through the process’s numerous sophisticated possibilities.
The whole facial structure may be scanned in 3D with Bellus 3D Dental Pro
Integration. The primary goal of this program is to streamline the patient accep-
tance process and improve the efficacy of dental treatment by integrating the
treatment plan with the patient’s facial configuration in full 3D [11].
work was required to refine AI algorithms for designing implants and assess their
efficacy in in-vitro, animal, and clinical settings [3].
2.6 Discussion
AI systems have been shown to accurately identify a wide variety of dental
anomalies, including dental caries, root fractures, root morphologies, jaw patholo-
gies, periodontal bone damages, periapical lesions, and tooth count, according to the
dental literature. Before DCNN applications were used in dentistry, studies like
these analyzed data from several dental radiographic modalities such as periapical,
panoramic, bitewing, cephalometric, CT, and CBCT images. To be sure, there is not a lot of
research using CT and CBCT. According to a CBCT study by Johari et al., the
probabilistic neural network (PNN) method is effective in identifying vertical root
fractures [17]. Hiraiwa et al. also found that AI was able to detect impacted teeth in
CBCT with acceptable results [17]. Good news for the future of this field was
revealed in a study of periapical lesions in CBCT images by Orhan et al. (2019),
who discovered that volume estimations predicted using the CNN approach are
congruent with human measurements [18]. In both dentistry and medicine, treatment
planning is a crucial process stage. If the therapy is to be effective, it is necessary to
first arrive at the proper diagnosis, which may then be used to develop the most
appropriate treatment plan for the individual patient. Planning a course of treatment
requires extensive organization and is heavily dependent on a number of variables,
including the doctor’s level of expertise. Over the last several years, AI systems have
been utilized to help doctors with anything from diagnosis to treatment planning.
Promising outcomes were achieved using the neural network machine learning
system in conjunction with a variety of treatment modalities, including radiation
therapy and orthognathic surgery. Dental implant planning relies heavily on radio-
graphic imaging, as is common knowledge. Before a surgery, it is advisable to use
3D imaging technology to inspect the area and make precise preparations by taking a
number of measures in accordance with the anatomical differences expected. The
key anatomic variables that influence the implant planning are the mandibular canal,
sinuses, and nasal fossa that were examined in the present research. In a recent
paper, Kwak et al. found that the CNN approach worked well for identifying the
mandibular canal in CBCT images, suggesting that this might be a future potential
for dental planning [19]. To determine the position of the mandibular third molar in
relation to the mandibular canal, Fukuda et al. examined 600 panoramic radio-
graphs. To the best of our knowledge, Jaskari et al. have successfully used the
CNN method to segment the mandibular canal in all CBCT images. AI algorithms,
they said, provide sensitive and trustworthy findings in canal determination, sug-
gesting a potential role for AI in implant design in the future [20].
The accuracy of the measures used in implant planning will improve along
with the accuracy of AI’s ability to identify anatomical components. There is at
least one study whose findings show AI being used to suc-
cessfully identify sinus diseases in panoramic images [21]. Bone thickness and
height were measured for this investigation to see how well the implant planning
had gone. This research demonstrates the need for a DL system to enhance AI bone
thickness assessments. As a result, doctors will appreciate the use of these tech-
nologies in implant design, and the field of implantology will benefit from the
added stability they provide [10].
References
[1] Alharbi MT and Almutiq MM. Prediction of dental implants using machine
learning algorithms. J Healthc Eng 2022;2022:7307675. doi:10.1155/2022/
7307675. PMID: 35769356; PMCID: PMC9236838.
[2] Reddy S, Fox J, and Purohit MP. Artificial intelligence enabled healthcare
delivery. J R Soc Med 2019;112(1):22–28.
[3] Revilla-León M, Gómez-Polo M, Vyas S, et al. Artificial intelligence
applications in implant dentistry: a systematic review. J Prosthetic Dentistry
2021;129:293–300. doi:10.1016/j.prosdent.2021.05.008.
[4] Patil S, Albogami S, Hosmani J, et al. Artificial intelligence in the diagnosis of
oral diseases: applications and pitfalls. Diagnostics 2022;12:1029. https://doi.org/10.3390/diagnostics12051029.
[5] Reyes LT, Knorst JK, Ortiz FR, and Ardenghi TM. Scope and challenges of
machine learning-based diagnosis and prognosis in clinical dentistry: a lit-
erature review. J Clin Transl Res 2021;7(4):523–539. PMID: 34541366;
PMCID: PMC8445629.
[6] Kunz F, Stellzig-Eisenhauer A, Zeman F, and Boldt J. Artificial intelligence
in orthodontics. J Orofac Orthop Fortschritte Kieferorthop 2020;81:52–68.
[7] Papantonopoulos G, Takahashi K, Bountis T, and Loos BG. Artificial neural
networks for the diagnosis of aggressive periodontitis trained by immuno-
logic parameters. PLoS One 2014;9:e89757.
1 Department of Electronics and Communication Engineering, KLE Technological University, Dr. M.S. Sheshgiri College of Engineering & Technology, Belagavi Campus, India
2 Department of Electrical and Electronics Engineering, KLE Technological University, Dr. M.S. Sheshgiri College of Engineering & Technology, Belagavi Campus, India
3 Department of CSE(AI), KLE Technological University, Dr. M.S. Sheshgiri College of Engineering & Technology, Belagavi Campus, India
4 Department of Electrical and Electronics Engineering, Indian Institute of Technology, Dharwad, WALMI Campus, India
3.1 Introduction
Aberrant cell division, in which the body’s cells become irregular, causes
the growth of tumours. These unregulated aberrant cell divisions have the potential
to kill healthy body tissues and end up forming a mass, generally termed a tumour,
whose growth causes multiple disorders. These tumours can be broadly classified
into two types namely, malignant and benign [1]. The tumour with high rates of
spreading and influencing capabilities to other healthy parts of the body is called a
malignant tumour. On the other hand, benign tumours are known for not spreading
or influencing other healthy parts of the body. Hence, not all tumours move from
one part of the body to another, and not all types of tumours are necessarily
carcinogenic.
Growth of tumours, weight reduction, metabolic syndrome, fertility, lympho-
edema, endocrine, peripheral neuropathy, cardiac dysfunction, pulmonary, altered
sleep, psychological, fear of recurrence, long haul, fatigue, and irregular bleeding are
a few of the short-term and long-term side effects faced by cancer survivors and
patients [2,3]. The processes of detection, comprehension, and cure are still in the
early stages and are a focused research domain within the therapeutic sector. In the
past and currently, the traditional cancer diagnosis process has had many limitations
such as high dependence on the patient’s pathological reports, multiple clinical trials
and courses, and a slow diagnostic procedure [4]. With a prevalence of 13.6% in India,
breast cancer is the second most common cause of death for women [5]. Meanwhile,
tobacco smoking causes 90% of occurrences of lung cancer, the most common cause
of death among males [6]. Unfortunately, lung cancer, along with breast cancer, is a
significant cause of mortality in women [7,8]. Considering the year 2020, Figures 3.1
and 3.2 illustrate the top five common cancers which cause the majority of mortality
in males and females around the globe [5].
It has been found that early-phased treatment can either postpone or prevent
patient mortality when a patient is diagnosed early with cancer in the breast or lung.
However, with the current technology, early identification is difficult, either
[Figure 3.1 is a bar chart of the percentage of female mortality (%) by cancer type: breast, lung, liver, colorectum, cervix uteri, stomach, and other cancers.]
Figure 3.1 For the year 2020, the top five most commonly occurring cancers
cause the majority of female mortality worldwide
[Figure 3.2 is a bar chart of the percentage of male mortality (%) by cancer type.]
Figure 3.2 For the year 2020, the top five most commonly occurring cancers
cause the majority of male mortality worldwide
[Figure: organization of this chapter, centred on the review of machine learning algorithms used for detection of breast and lung cancer, with Section III: Results and discussion, Section IV: Proposed methodology, and Section V: Conclusion.]
early lung cancer detection is connected with a reduction in the number of false-positive (FP) classifi-
cations while keeping a high degree of true-positive (TP) diagnoses, i.e., sensitivity.
The detection of lung cancer based on CT images was proposed by the authors in
[14]. The authors employed CNN and the developed model was found to be 96%
accurate as compared with the previous study [11]. The model was implemented in
MATLAB and the dataset was obtained from the lung image database consortium
(LIDC) and image database resource initiative (IDRI). The system was also able to
detect the presence of cancerous cells. Authors in [15] provided an overview of the
technical aspects of employing radiomics; i.e. analysis of invisible data from the
extracted image and the significance of artificial intelligence (AI) in the diagnosis of
non-small cell lung cancer. The technical implementation limitations of radiomics
such as harmonized datasets, and large extracted data led to the exploration of AI in
the diagnosis of cancer. The authors discussed the multiple steps employed in the
study which include data acquisition, reconstruction, segmentation, pre-processing,
feature extraction, feature selection, modelling, and analysis. A detailed study of the
existing dataset, segmentation method, and classifiers for predicting the subtypes of
pulmonary nodules was presented.
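A hedged sketch of the later stages of such a radiomics workflow (scaling, feature selection, and modelling) is shown below; the synthetic feature matrix stands in for radiomic features that would, in practice, be extracted from segmented CT regions of interest, so the score it prints is meaningless.

```python
# Skeleton of a radiomics-style pipeline: scale features, select the most
# informative ones, fit a classifier, and estimate performance by cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))      # 120 nodules x 50 radiomic features (synthetic)
y = rng.integers(0, 2, size=120)    # 0 = benign, 1 = malignant (synthetic labels)

pipeline = Pipeline([
    ("scale", StandardScaler()),                    # pre-processing
    ("select", SelectKBest(f_classif, k=10)),       # feature selection
    ("model", SVC(kernel="rbf", probability=True)), # modelling
])
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("Cross-validated AUC:", scores.mean())
```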
The authors in [16] conducted a study to predict the risk of cancer on patients’
CT volumes. The model exhibited an accuracy of 94.4% and outperformed the
radiologists. The study is unique in nature as the research is con-
ducted in comparison to the previous and current CT images. Trial cases of 6,716
numbers were considered, and the model was validated on 1,139 independent
clinical datasets. The data mining technique was used in the study by the authors in
[17], and the experimentation aimed at providing a solution to the problem which
arises after pre-processing the data during the process of cleaning the data. The
authors experimented with applying the filter and resampling the data with three
classifiers on two different datasets. The study was conducted over five perfor-
mance parameters. The results demonstrated the accuracy level of the classifiers to
be better after the resampling technique was employed. The dataset considered
were Wisconsin Breast Cancer (WBC) and breast cancer dataset. Results proved
the performance of the classifiers to be improved for the WBC dataset with the
resampling filter applied four times, whereas for the breast cancer dataset the
resampling filter was applied seven times. J48 Decision Tree classifier showed
99.24% and 98.20%, Naïve Bayes exhibited 99.12% and 76.61%, and sequential minimal optimization (SMO) showed 99.56% and 95.32% for the WBC and breast cancer datasets, respectively. The WBC dataset was used in the study by the authors in [18], who applied visualization and ML techniques to provide a comparative analysis on the same dataset. The predictions were made by visualizing the data and analyzing the correlation of the features. The result demonstrated an accuracy of 98.1% by managing the imbalanced data. The study split the original dataset into three datasets: all the independent features in one dataset, all the highly correlated features in another, and the features with low correlation in the last. Logistic regression showed accuracies of 98.60%, 95.61%, and 93.85%, KNN demonstrated 96.49%, 95.32%, and 94.69%, SVM obtained 96.49%, 96.49%, and 93.85%, decision tree showed
95.61%, 93.85%, and 92.10%, random forest obtained 95.61%, 94.73%, and
92.98%, and rotation forest algorithm showed 97.4%, 95.89%, and 92.9%.
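As an illustration of how such a comparison can be reproduced in practice, the following is a minimal scikit-learn sketch; it uses the Wisconsin diagnostic breast cancer dataset bundled with scikit-learn rather than the exact datasets, resampling protocol, or classifier settings of [17,18], so its numbers will not match those reported above.

```python
# Minimal sketch (not the protocol of [17,18]): cross-validated accuracy of
# several classifiers on the Wisconsin breast cancer dataset in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=5000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, clf in classifiers.items():
    # Standardize features, then estimate accuracy with 10-fold cross-validation.
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```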
A systematic review of DL and ML techniques for detecting breast cancer based on medical images was presented by the authors in [19]. The review summarized research databases, algorithms, future trends, and challenges in the research field, providing a complete overview of the subject and related progress. It proposed that computer-aided detection can be more accurate than the diagnosis of a radiologist. A computer-aided diagnosis (CAD) system was developed in [20] to classify mammograms. The study employed feature extraction by discrete wavelet transformation, and principal component analysis was used to extract the discriminating features from the original feature vectors. A weighted chaotic salp swarm optimization algorithm was proposed for classification. The datasets under study were the Mammographic Image Analysis Society (MIAS), Digital Database for Screening Mammography (DDSM), and Breast Cancer Digital Repository (BCDR) datasets. A complete review of CAD methods for detecting breast cancer is presented by the authors in [21]. The study was conducted on mammograms, and image enhancement and histogram equalization techniques were proposed. Table 3.1 summarizes the entire literature review with limitations, motivation, and aim.
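The wavelet-plus-PCA feature extraction idea used by the CAD pipeline in [20] can be sketched as follows; this is only an illustration of the general approach (PyWavelets for the DWT, scikit-learn for PCA) with random patches standing in for real MIAS/DDSM/BCDR images, not the authors' exact scheme.

```python
# Sketch: single-level 2D discrete wavelet transform (DWT) features followed
# by PCA, with random patches standing in for real mammogram ROIs.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
patches = rng.random((100, 64, 64))          # 100 hypothetical 64x64 ROI patches

def dwt_features(img, wavelet="db2"):
    """Return simple statistics of each DWT sub-band as a feature vector."""
    cA, (cH, cV, cD) = pywt.dwt2(img, wavelet)
    feats = []
    for band in (cA, cH, cV, cD):
        feats.extend([band.mean(), band.std(), np.abs(band).sum()])
    return np.asarray(feats)

X = np.stack([dwt_features(p) for p in patches])

# PCA keeps the most informative directions of the wavelet feature vectors.
X_reduced = PCA(n_components=5).fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```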
from the image as well as the salt and pepper noises, also known as impulse
noises. By using techniques such as the Otsu threshold and statistical threshold,
additional disturbance elements such as artefacts, black backgrounds, and labels
existing on the mammography or tomography images can be eliminated. Other
elements such as contrast and brightness can also be enhanced by utilizing a
variety of techniques including intensity-range based partitioned cumulative
distribution function (IRPCDF) and background over-transformation controlled
(BOTC) [29].
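As a concrete illustration of the noise-removal and thresholding steps just described, the sketch below applies a median filter against impulse noise and an Otsu threshold to suppress the dark background; the file name is hypothetical, and the statistical-threshold, IRPCDF, and BOTC steps from [29] are not reproduced.

```python
# Sketch: impulse-noise removal and Otsu thresholding for a mammogram.
import numpy as np
from scipy.ndimage import median_filter
from skimage import io
from skimage.filters import threshold_otsu

img = io.imread("mammogram.png", as_gray=True)   # hypothetical file name

# A median filter suppresses salt-and-pepper (impulse) noise.
denoised = median_filter(img, size=3)

# Otsu's threshold separates breast tissue from the dark background and labels.
t = threshold_otsu(denoised)
mask = denoised > t
print("Otsu threshold:", t, "foreground pixels:", int(mask.sum()))
```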
The selection of the region of interest (ROI) and related annotation is one of the best ways to diagnose cancer, by concentrating on the tumour-affected areas with a high degree of accuracy and precision. However, the laborious and time-consuming traditional ROI selection procedure is discouraging [30]. Therefore, in our next study, we propose to develop an automated technique that speeds up the ROI selection process compared to manual methods. In most of
the methods, feature extraction is an optional yet effective process. The features
such as wavelet energy values, standard deviation, and mean are derived from the
images based on texture analysis methods such as grey level co-occurrence
matrix (GLCM). A metaheuristic process based on natural selection known as
genetic algorithm (GA) can also be implemented for feature extraction and
selection. This method helps in enhancing the quality of the data obtained from
the input images.
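A minimal sketch of GLCM-based texture feature extraction of the kind described above, using recent scikit-image (graycomatrix/graycoprops) and a random patch in place of a real ROI, might look as follows; the GA-based feature selection step is not shown.

```python
# Sketch: GLCM texture features plus simple first-order statistics for an ROI.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(1)
roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # hypothetical ROI

# Co-occurrence matrix at distance 1, for horizontal and vertical directions.
glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
features["mean"] = float(roi.mean())
features["std"] = float(roi.std())
print(features)
```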
In the next study, the best algorithms and methodologies will be evaluated
and implemented based on the literature review presented in this chapter. This
stage will involve performing a thorough analysis of the top five algorithms
considering diverse circumstances. The final phase of the suggested procedure
will involve the classification-based diagnosis. The lung cancer dataset will be
the primary testing ground for this suggested strategy which will then be applied
to all other cancer datasets. The proposed methodology will primarily focus on understanding and improving detection in fault-prone regions of low-resolution images, understanding the development patterns of tumour sizes, determining whether or not the tumour is malignant, and ultimately reducing diagnostic errors and computational cost.
3.5 Conclusion
The main focus of this study is the identification of numerous implementable algo-
rithms and techniques for rapid and precise detection of malignant tumour growth in
the breast and lungs. These algorithms will function in conjunction with conventional medical methods of cancer diagnosis. The fact that lung cancer and breast cancer are the two major causes of mortality for both men and women motivates extensive research on the early detection of these malignancies. The proposed methodology will take mammography or tomography images as its input.
This literature review of the various algorithms used to diagnose breast and lung cancer immensely aids in understanding the wide range of algorithms that can be employed. The proposed methodology will henceforth be expanded and put
into practice for lung cancer in future studies considering other cancer datasets.
Alternate approaches for image processing will be explored and combined with the
same model, including ROI selection and feature extraction. The best-performing
algorithms will then be the subject of extensive research.
References
[9] din NM ud, Dar RA, Rasool M, et al. Breast cancer detection using deep
learning: datasets, methods, and challenges ahead. Comput Biol Med 2022;
149: 106073.
[10] Ismail NS and Sovuthy C. Breast cancer detection based on deep learning
technique. In: 2019 International UNIMAS STEM 12th Engineering
Conference (EnCon). IEEE, pp. 89–92.
[11] Ministry of Health Malaysia. Malaysia National Cancer Registry Report
(MNCRR) 2012–2016, 2019, http://nci.moh.gov.my.
[12] Deserno T and Ott B. 15,363 IRMA images of 193 categories for
ImageCLEFmed 2009. RWTH Publications. Epub ahead of print 2009,
doi:10.18154/RWTH-2016-06143.
[13] Penedo MG, Carreira MJ, Mosquera A, et al. Computer-aided diagnosis: a
neural-network-based approach to lung nodule detection. IEEE Trans Med
Imaging 1998; 17: 872–880.
[14] Sasikala S, Bharathi M, and Sowmiya BR. Lung cancer detection and
classification using deep CNN. Int J Innov Technol Explor Eng 2018; 8:
259–262.
[15] Devi VA, Ganesan V, Chowdhury S, Ramya G, and Dutta PK. Diagnosing
the severity of covid-19 in lungs using CNN models. In 6th Smart Cities
Symposium (SCS 2022), Hybrid Conference, Bahrain, 2022, pp. 248–252,
doi:10.1049/icp.2023.0427.
[16] Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening
with three-dimensional deep learning on low-dose chest computed tomo-
graphy. Nat Med 2019; 25: 954–961.
[17] Mohammed SA, Darrab S, Noaman SA, et al. Analysis of breast cancer
detection using different machine learning techniques. In: Data Mining and
BigData. DMBD 2020. Communications in Computer and Information
Science, vol. 1234. Springer, Singapore, pp. 108–117.
[18] Dutta PK, Vinayak A, and Kumari S. Asymptotic patients’ healthcare
monitoring and identification of health ailments in post COVID-19 scenario.
In: O Jena, AR Tripathy, AA Elngar, and Z Polkowski (eds.), Computational
Intelligence and Healthcare Informatics, 2021, https://doi.org/10.1002/
9781119818717.ch16.
[19] Houssein EH, Emam MM, Ali AA, et al. Deep and machine learning tech-
niques for medical imaging-based breast cancer: a comprehensive review.
Exp Syst Appl; 167. Epub ahead of print 1 April 2021, doi:10.1016/j.
eswa.2020.114161.
[20] Mohanty F, Rup S, Dash B, et al. An improved scheme for digital mam-
mogram classification using weighted chaotic salp swarm algorithm-based
kernel extreme learning machine. Appl Soft Comput 2020; 91: 106266.
[21] Ramadan SZ. Methods used in computer-aided diagnosis for breast cancer
detection using mammograms: a review. J Healthc Eng 2020; 2020: 1–21.
[22] Hamed G, Marey MAE-R, Amin SE-S, et al. Deep learning in breast cancer
detection and classification. In: Proceedings of the International Conference
on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020.
1 School of Information Technology & Engineering, Vellore Institute of Technology, India
4.1 Introduction
Throughout the history of humans, medical research has always been a top priority.
Be it the discovery of vaccines, anesthesia, micro-surgeries, or radiology, each of
them has had a huge impact on the human population. Deep learning can become
an indispensable tool for doctors just like a stethoscope. With its exemplary image
segmentation capability, deep learning has made significant contributions to bio-
medical image processing. Using natural language processing and computer vision
capabilities, deep learning furnishes diverse solutions that are not only limited to
processing the image but also in delivering adequate analysis with regard to results
achieved. Non-linear processing units make up a layered architecture which facil-
itates feature extraction and image transformation [3]. This layered architecture
supported by deep learning algorithms allows the system to adjust weights and
biases depending on the effect of respective parameters. Each layer is responsible
for a specific kind of processing such as gray scaling the biomedical image, noise
reduction, color balancing, and ultimately feature detection. This constant “adjust
and tune” process makes deep learning algorithms extremely useful with
medical data.
Today, advancements in imaging technologies have enabled physicians to capture high-resolution images. With each image measuring as much as 32 MB, processing images using general-purpose algorithms is extremely resource-intensive and time-consuming. Deep learning algorithms, when put into action, not only analyze these images (as a connected database of parameters) but can even diagnose the disease or disorder, eliminating the need for a doctor. It is often reported that differences in the approach of doctors lead to different paths of treatment. Deep learning can predict the disease by analyzing symptoms, thus saving a lot of time and effort. Moreover, the physician can now propose treatment
are broadly classified into two types, i.e., feed-forward neural network (FFNN) or
recurrent neural network (RNN) based on the pattern of the association of neurons.
An FFNN forms a directed acyclic graph with each layer consisting of nodes, while
an RNN generally occurs as a directed cycle. Weights and activation functions are
the other two parameters that affect the output of a neural network. Training a
neural network is an iterative process where weights are optimized to minimize the
loss function. By adjusting these weights, the network's performance can be altered. An activation function determines whether a neuron is activated or deactivated by comparing its input value to a threshold value. This on-and-off operation throughout the layers of the network introduces non-linearity while keeping the network differentiable almost everywhere, so that it can be trained by backpropagation. Some activation functions are tabulated in
Table 4.1.
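For reference, the following small NumPy sketch defines three activation functions of the kind typically listed in Table 4.1 (sigmoid, tanh, and ReLU).

```python
# Three common activation functions, written out with NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes input into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # passes positives, zeroes out negatives

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```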
Even though deep learning is one of the most significant inventions in the field of computational sciences, there unfortunately does not exist any "one size fits all" solution. Deep learning comes with a lot of dependencies, and this trade-off between heavy dependencies and meticulous results often forces stakeholders to make difficult decisions. For every AI solution deployed, a general set of preparations needs to be
followed, which are listed as follows:
● Define the size of the sample from the dataset
● Determining if a previous application domain could be modified to solve the
issue at hand. This helps in estimating if the model needs to be trained from
scratch or if transfer learning could be applied
● Assess dependent and independent variables to decide the type of algorithms
applicable to the given problem statement
● Interpret results based on model logic and behavior
form. Unlike an FCN, SegNet utilizes the max-pooling indices received from the encoder for up-sampling the input non-linearly. Introduced as "You Only Look Once" (YOLO), this algorithm was a substitute for R-CNN. Because of its simplicity and enhanced execution speed, YOLO has become extremely popular in the object detection domain. YOLO imparts real-time object detection capabilities by dividing the image into N grid cells, each of dimension S × S. Each grid cell is accountable for detecting only a single object. This makes YOLO well suited to detecting large objects [11]. However, when detecting smaller objects, such as a line of ants, YOLO is not the best choice. Nevertheless, YOLO has gone through
several upgrades. Today, more than five versions of YOLO are being actively
utilized for a variety of use cases.
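To make the grid idea concrete, the following sketch assigns hypothetical object-centre coordinates to cells of an S × S grid, which is the responsibility rule YOLO uses; the image size, grid size, and coordinates are all assumptions for illustration.

```python
# Sketch: YOLO-style assignment of object centres to cells of an S x S grid.
import numpy as np

S = 7                                  # the image is divided into S x S cells
img_w, img_h = 448, 448                # hypothetical input resolution

# Hypothetical object-centre coordinates in pixels (x, y).
centres = np.array([[100, 220], [300, 50], [310, 60]])

cell_w, cell_h = img_w / S, img_h / S
cells = np.stack([(centres[:, 0] // cell_w).astype(int),
                  (centres[:, 1] // cell_h).astype(int)], axis=1)

for (x, y), (cx, cy) in zip(centres, cells):
    print(f"object centre ({x}, {y}) -> grid cell ({cx}, {cy})")
# Two nearby small objects can land in the same cell, which is why YOLO
# struggles with dense small objects such as "a line of ants".
```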
results, smart CAD systems come in handy in sharing and preserving the medical
history of patients. A CAD system is more accurate and expeditious when com-
pared to a human physician. Though technology can never replace human physi-
cians, it can always supplement their efforts, to smoothen the process. Countries
like Japan are heavily relying on such technologies. Healthy Brain Dock is a
Japanese brain screening system, which detects high-risk groups for Alzheimer’s
disease. It uses a traditional MRI system combined with technology to preclude and
detect the onset of asymptomatic brain diseases such as dementia and aneurysm.
Table 4.2 Deep learning algorithms mapped with their medical use cases
4.6.3 Mammography
Mammography, popularly known as a mammogram (MG), is the process of using low-energy X-rays for diagnosing and screening breast cancer [26]. The history of mammography begins in 1913, and since then there have been several advancements in the field. Still, detecting tumors is a daunting task given their small size. Today, MG is a reliable tool; however, the expertise of a physician is a must. Deep learning provides a multi-step solution here, which includes detection, segmentation, and classification.
CNN-based algorithms are tremendously valuable in such use cases for feature
extraction tasks. Innovations over a period of time have enabled these intelligent
systems to diagnose and detect cancer at early stages [27]. Classification algorithms
permit analysts to quickly determine the type of tumor. This helps start treatment at an early stage.
4.6.4 Histopathology
Histopathology is the assessment of illness symptoms under a microscope, using a
mounted glass slide from a biopsy or surgical specimen. It is extensively used in the
identification of different diseases such as the presence of tumor in the kidney,
lungs, and breast [28]. Using dyes, tissue sections are stained to identify lung
cancer, Crohn’s disease, ulcers, etc. The samples are accumulated through endo-
scopy, colonoscopy, or adopting surgical procedures such as biopsy. A crucial
challenge in existing histopathology infrastructure is identifying disease growth at a
species level. Hematoxylin and eosin (H&E) staining has played a significant role in diagnosing cancer; however, identifying disease patterns from the staining technique requires competence [29]. Today, digital pathology has automated this tedious
challenge. Using histopathology images, deep learning is automating tasks like cell
segmentation, tumor classification, labeling and annotating, nucleus detection, etc.
Deep learning models have successfully simulated cell activities, using histo-
pathology images to predict the future conditions of the tissue.
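Because whole-slide histopathology images are far too large to feed to a CNN directly, they are usually tiled into patches first; a minimal tiling sketch (with a random array standing in for a real H&E slide region, and a crude brightness-based tissue filter) is shown below.

```python
# Sketch: tiling a histopathology image region into CNN-ready patches.
import numpy as np

slide = np.random.rand(1024, 1024, 3)    # stand-in for a real H&E slide region
patch = 256                              # patch size in pixels

patches = []
for r in range(0, slide.shape[0] - patch + 1, patch):
    for c in range(0, slide.shape[1] - patch + 1, patch):
        tile = slide[r:r + patch, c:c + patch]
        # Crude tissue filter: skip tiles that are nearly empty background.
        if tile.mean() > 0.1:
            patches.append(tile)

patches = np.stack(patches)              # shape: (n_patches, 256, 256, 3)
print(patches.shape)
```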
4.6.5 Endoscopy
In endoscopy, a long, non-surgical, camera-mounted tube is inserted directly through a cavity into the body for visual examination of internal organs. Endoscopy is a mature test that has been in practice for a long time. It is best suited for diagnosing ulcers, inflammation, celiac disease, blockages, gastroesophageal reflux disease, and sometimes cancerous lesions. Even though physicians administer anesthesia before beginning endoscopy, the test can be uncomfortable for most people. A painless non-invasive
inspection of the gastrointestinal tract can be done using a recent invention – Wireless
capsule endoscopy (WCE). As the name suggests, this capsule can be taken orally.
Deep learning comes into the picture after endoscopy images start appearing. Images
received from WCE are fed to deep learning algorithms. CNNs make real-time image
segmentation, detection, classification, and identification possible. From detecting
hookworm through WCE images to analyzing symptoms for predicting disease,
endoscopy has evolved a lot. Today, such solutions are used to detect cancer in early
stages by performing real-time analysis of tumors. VGG-16 is one of the popular
CNN-based algorithms for the diagnostic assessment of the esophageal wall.
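One common way to apply VGG-16 to endoscopic frames is transfer learning: reuse the ImageNet-pretrained convolutional base and train only a small classification head. The Keras sketch below illustrates this; the input size, class count, and head design are assumptions, not a published configuration.

```python
# Sketch: VGG-16 transfer learning for classifying endoscopy frames.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),  # e.g., normal vs. suspicious frame
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```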
4.6.7 Bio-signals
Apart from medical imaging techniques, bio-signaling techniques such as ECG,
EEG, PCG, PPG, EMG, SS, NSS, and a lot more are common. Electrocardiography
is one of the most common techniques for diagnosing cardiovascular disease. Deep
learning algorithms allow early detection of heart-related diseases by analyzing
ECG patterns. DNNs detect anomalies from electrocardiography scripts. They are
being utilized for electrocardiography interpretation, arrhythmia classification, and
systolic dysfunction detection. Such algorithms can work with data involving
hundreds of parameters, thus handling complex medical data in the most optimized
pattern. Because bio-signals are intensely sensitive to bodily factors, DNNs can be used to eliminate inconsequential labels from the dataset.
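A typical deep learning setup for such bio-signals is a 1D convolutional network over fixed-length signal windows; the sketch below assumes 10-second single-lead ECG windows sampled at 360 Hz and a binary normal/abnormal label, which are illustrative choices only.

```python
# Sketch: a small 1D CNN for ECG window classification (normal vs. abnormal).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(3600, 1)),          # e.g., 10 s at 360 Hz, one lead
    layers.Conv1D(16, kernel_size=7, activation="relu"),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(4),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of an abnormal rhythm
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```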
4.7.1 Segmentation
It is a crucial step in medical image analysis, as it enables researchers to focus on
key areas with relevant information. It is the process of dividing the image into
several regions each concentrating on certain features that must be taken care of
according to the researcher’s interest. While there are dozens of ways of classifying
segmentation techniques, some of the most prominent types on the basis of deep
learning include semantic-level segmentation and instance-level segmentation. U-Net, an FCN, enables semantic segmentation, while instance-level segmentation typically extends R-CNN. Image segmentation finds its application in almost every image processing workflow, some of which we will discuss in later sections. From the above
discussion, we conclude that image segmentation is one of the most significant use
cases of deep learning when it comes to medical image processing. At the same time,
it is the starting point for most of the medical image analysis workflows.
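To illustrate the semantic segmentation case, the following is a deliberately shrunken U-Net-style encoder-decoder in Keras, with a single down/up-sampling level and one skip connection; it conveys the structure but is not the full published U-Net.

```python
# Sketch: a tiny U-Net-style encoder-decoder for binary segmentation masks.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(128, 128, 1))

# Encoder
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(c1)
p1 = layers.MaxPooling2D(2)(c1)

# Bottleneck
b = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)

# Decoder with a skip connection from the encoder (the "U" shape)
u1 = layers.UpSampling2D(2)(b)
u1 = layers.Concatenate()([u1, c1])
c2 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

out = layers.Conv2D(1, 1, activation="sigmoid")(c2)  # per-pixel mask

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```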
4.7.2 Classification
Using artificial intelligence for image classification is not a new concept. Since the
inception of digital image processing, artificial intelligence algorithms have been
rigorously tested against complex object detection and classification tasks.
However, with the dawn of machine learning algorithms, the results achieved were
remarkable. Deep learning algorithms took it to the next level. Today, object
detection and classification solutions are in high demand. Coupled with Internet of
Things (IoT) technology, deep learning-based image classification systems are
actively deployed in industries.
Image classification is the task of annotating input images based on business
logic. Certain algorithms such as Bayesian classifiers, neural network-based classi-
fiers, and geometric classifiers are easy to deploy and hence are more commercial.
However, CNN-based classifiers, though complex to use, provide high accuracies
over traditional machine learning-based classifiers [33]. CNN-based classifiers excel when it comes to medical image analysis. This is possible because of the layered
architecture of neural networks. Most CNN-based neural networks comprise a feature
extraction module and a classification module. The input image is initially passed
through convolutional and pooling layers to extract features. The output then is
passed through the classification module. Deep learning-based methods achieve
satisfying performance on low-resolution medical images. They are able to identify
different types of cells at multiple stages. Fluorescent images are generally used to
train deep learning algorithms in such scenarios.
Classifiers can also help in identifying diseases based on features extracted. By
feeding feature parameters, classifiers can differentiate sickle cells from normal
cells in case of anemia. It can identify white blood cells, leukemia, autoimmune
diseases, lung cancer subtypes, hepatic granuloma, and so on. Deep CNN-based
classifiers achieve higher accuracies compared to their competitors. Moreover, they have high execution speeds and low latency, which make them an ideal choice.
4.7.3 Detection
As discussed in the previous sections of this chapter, object detection is a crucial
step in any image analysis. A major challenge in detecting lesions is that multiple
false positives arise while performing object detection. In addition, a good
monitor biological processes such as the cell division cycle and other cellular
activities. RNN-based trackers track each microtubule activity in real-time and
deliver velocity-time graphs.
Figure 4.1 Network models for medical image analysis and processing (classification: AlexNet, GoogleNet, VGGNet, ResNet, LeNet; segmentation: PSPNet, U-Net, FRU-Net, MicroNet, SegNet, Mask R-CNN; image reconstruction: CNN, U-Net; object tracking: RNN, Fast R-CNN, LSTM)
segmentation and feature extraction, are started. In case the sample contains lots of
low-resolution images, deep learning allows rapid and convenient image reconstruc-
tion. The process of medical image analysis is more intensive when image enhancement is required. Consider a brain tumor image sample passed through this process: after image segmentation, the image is rigorously scanned by the neural network and passed on to the feature extraction engine. Abnormal tissues are identified based on the patterns in the image. It is at this point that neurologists can apply logic such as classifying the tumor, or start the analysis of the entire image sample. Image segmentation is one of the key processes involved in medical image processing and analysis. Several image segmentation techniques exist (Figure 4.2); a widely accepted listing is pictured in Figure 4.3.
An important sub-process throughout medical image analysis using deep
learning methodologies is defining region of interest (ROI). Detection and analysis
of morphological features, texture variations, shading variations, and gray-level
feature analysis are some of the outcomes of this process. It is after establishing
ROI, the evaluated image scans are fed to classification algorithms for labeling.
While there exist several classification methodologies, namely, pixel-wise
classification, sub-pixel-based classification, and object-based classification, a
DNN-based classification model is a go-to classification technique due to its high accuracy along with the ease of incorporating query and item features. Other algo-
rithms popular in this space include K-Means, ISODATA, and SOM which rely on
unsupervised classification techniques.
The aforementioned approach is common across all medical image analysis
use cases. The process can be condensed into three major steps:
1. Image formation: This part involves data acquisition and image reconstruc-
tion. In the case of real-time image analysis, the algorithm processes images
[Figure (image segmentation techniques): threshold based (global thresholding, local thresholding); region based (split & merge, region growing, graph cut, watershed); edge based (Canny edge detection, Laplacian of Gaussian, gradient based, Roberts, Sobel, and Prewitt operators); clustering (K-means, fuzzy C-means); using artificial neural networks (ANN); using partial differential equations (PDE)]
[Figure (image analysis workflow): image acquisition (fetching sample images from the data source, image digitization); noise filtering; image calibration; image enhancement and transformation (shading, illumination); image visualization (feature reconstruction); classification]
coming from an active data source. In the case of batch processing, a collection
of medical images sits on a central repository from where it is passed onto the
deep learning algorithm. Data acquisition broadly consists of the detection of
the image, converting it to a specific format and scale, preconditioning the
image, and digitalizing the acquired image signal. The obtained raw image
contains original data about captured image parameters, which is the exact
description of the internal characteristics of patients’ bodies. This is the pri-
mary source of image features and must be preserved, as it becomes the subject
where the model is exposed to new data known as testing data. This phase is
used to verify the accuracy of the model. Testing and training data may come
from the same dataset but are expected to be mutually exclusive in order to get
accurate results. A validation phase is one that lies between the testing and
training phases, where the performance of the model is gauged. In the following, we tabulate some popular metrics used to determine the accuracy of deep learning algorithms (Table 4.4).
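A common way to obtain mutually exclusive training, validation, and test subsets is two successive random splits; the 70/15/15 proportions and synthetic data below are purely illustrative.

```python
# Sketch: mutually exclusive train / validation / test splits (70/15/15).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 64)          # hypothetical feature vectors
y = np.random.randint(0, 2, 1000)     # hypothetical labels

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```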
influence the predicted class such as risk factors (age of patient, genetic history of a
disease, consumption of intoxicants, mutations, and infections) which are quite
common in biological data, machine learning models stand as a robust and reliable
feature selector. Harmonization and de-noising techniques prevent overfitting to
enhance relevant features.
The high upfront cost of establishing deep learning-based solutions for public consumption is a major setback for institutions. AI-based solutions are
computationally expensive and therefore require expensive infrastructure. They
require highly skilled professionals at least in the initial phases. All these
arrangements come with a cost, which institutions are skeptical about. A brain dock
system designed by a Japanese company uses an MRI system for detecting
Alzheimer’s disease in a high-risk group. This revolutionary technology takes just
2 hours for the entire check-up. Results produced are further analyzed by a group of
physicians to propose advice concerning enhancement in daily lifestyle that can
mitigate such severe disease. While such a solution appears invaluable for society,
a healthy brain dock test can cost somewhere around 600,000 JPY.
Dimensionality reduction is another challenge when it comes to processing
unstructured data such as images. Medical images tend to contain more noise and
redundant features than a casual landscape picture. Moreover, advanced image-
capturing devices produce mixed outputs, which if not analyzed using
suitable algorithms can lead to a loss of opportunity. Treating a 3D computed
tomography image with regular algorithms can neglect a lot of data points which not
only limits our scope of analysis but also produces incorrect results. Currently, CNN-
based algorithms are unable to deliver promising results when it comes to 3D med-
ical images. Hence, a general approach is to break the volume into several 2D components and process each individually. The final results are aggregated to produce reports for patients. Certain image analysis algorithms perform a 3D
reconstruction of subcellular fabric to produce relevant medical results [31]. Though
this approach delivers satisfactory results, advanced algorithms are required to sim-
plify this two-step process. Due to this lengthy two-step process, feeding medical
records to a simple machine learning-based prediction system is far more efficient
than relying on the aforementioned algorithm.
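The "break the volume into 2D components" workaround mentioned above can be as simple as iterating over axial slices and feeding each to a 2D model, then aggregating the per-slice outputs; in the sketch below, a random volume and a placeholder scoring function stand in for real CT data and a trained CNN.

```python
# Sketch: processing a 3D CT volume as a stack of 2D axial slices.
import numpy as np

volume = np.random.rand(120, 512, 512)   # hypothetical CT volume (slices, H, W)

def predict_slice(slice_2d):
    """Placeholder for a trained 2D CNN; returns a pseudo abnormality score."""
    return float(slice_2d.mean())

slice_scores = np.array([predict_slice(s) for s in volume])

# Aggregate per-slice results into a single volume-level value for the report.
volume_score = slice_scores.max()
print("max per-slice score:", volume_score)
```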
Deep learning stands as a potential alternative for medical image segmentation
tasks. However, constant requirement discovery and solution improvement are
expected to fit it into complex use cases. Furthermore, enhancements in deep
learning and microscopy techniques will help develop intelligent systems that
deliver super-resolution high-content imaging with automatic real-time objective
image analysis. Predicting and diagnosing diseases using intelligent systems opens
up doors for sustainable medical treatment.
4.10 Conclusion
Among the endless possibilities offered by deep learning in the field of medical
image scanning, this chapter outlines some of the major breakthroughs deep
learning has caused in medical image processing and analysis. The fast and efficient
processing abilities of deep learning algorithms make it a revolutionary technology,
which can mitigate slow, error-prone, and labor-intensive image analysis tasks. This
chapter was an attempt to highlight how deep learning is streamlining the complex
process of medical image analysis to yield exemplary results. Even though medical
image analysis requires knowledge of various domains such as mathematics, com-
puter science, pharmacology, physics, biology, physiology, and much more, deep
learning systems have the ability to outshine a set of physicians. However, we believe
a deep learning system can never be a complete replacement for physical doctors, but
it can definitely serve as a ‘second set of eyes’, thus establishing a healthy coex-
istence between humans and intelligent machines. Physicians will always be required
to act as guides and supervisors. Physicians will be required to exhibit soft skills at all
times that will exercise a constructively critical approach utilizing the enormous
potential of intelligent systems while reducing the possibility of the scientific dys-
topian nightmare of the “machines in power”.
Deep learning systems are highly relevant and practical in the context of
developing nations where medical facilities are limited. In practical terms, they have
high execution speeds, provide significant cost reduction, better diagnostic accu-
racy with better clinical and operational efficiency, and are scalable along with
better availability. Such intelligent algorithms can be easily integrated into mobile
software applications which can touch remote locations, thus benefiting the masses
who otherwise were isolated because of geographical, economic, or political rea-
sons. These solutions can even be extended towards designing mental health
solutions such as scoring sleep health by monitoring EEGs to prevent the onset of
possible diseases [40].
Medical image research has a bright future. Deep learning solutions will
eventually use transfer learning and then meta-learning [41]. The amalgamation of
these technologies along with data augmentation, self-supervising learnings, rein-
forcement learning, and business domain adaptation will significantly improve the
current performance of neural networks and thus solve advanced use cases.
References
[1] Prabhavathy, P., Tripathy, B.K., and Venkatesan, M. Analysis of diabetic
retinopathy detection techniques using CNN models. In: S. Mishra, H.K.
Tripathy, P. Mallick, and K. Shaalan (eds.), Augmented Intelligence in
Healthcare: A Pragmatic and Integrated Analysis, Studies in Computational
Intelligence, vol. 1024, Springer. https://doi.org/10.1007/978-981-19-1076-
0_6
[2] Gupta, P., Bhachawat, S., Dhyani, K., and Tripathy, B.K. A study of gene
characteristics and their applications using deep learning, studies in big data
(Chapter 4). In: S. S. Roy and Y.-H. Taguchi (eds.), Handbook of Machine
Learning Applications for Genomics, Vol. 103, 2021. ISBN: 978-981-16-
9157-7, 496166_1_En
[3] Tripathy, B.K., Garg, N., and Nikhitha, P. In: L. Perlovsky and G. Kuvich
(eds.), Introduction to deep learning, cognitive information processing for
intelligent computing and deep learning applications, IGI Publications.
[4] Debgupta, R., Chaudhuri, B.B., and Tripathy B.K. A wide ResNet-based
approach for age and gender estimation in face images. In: A. Khanna, D.
Gupta, S. Bhattacharyya, V. Snasel, J. Platos, and A. Hassanien (eds.),
International Conference on Innovative Computing and Communications,
Advances in Intelligent Systems and Computing, vol. 1087, Springer,
Singapore, 2020, pp. 517–530, https://doi.org/10.1007/978-981-15-1286-5_44.
[5] Ravi Kumar Rungta, P.J. and Tripathy, B.K. A deep learning based approach
to measure confidence for virtual interviews. In: A.K. Das et al. (eds.),
Proceedings of the 4th International Conference on Computational
Intelligence in Pattern Recognition (CIPR), CIPR 2022, LNNS480,
pp. 278–291, 2022.
[6] Puttagunta, M. and Ravi, S. Medical image analysis based on deep learning
approach. Multimedia Tools and Applications, 2021;80:24365–24398.
https://doi.org/10.1007/s11042-021-10707-4
[7] Karan Maheswari, A.S., Arya, D., Tripathy, B.K. and Rajkumar, R.
Convolutional neural networks: a bottom-up approach. In: S. Bhattacharyya,
A.E. Hassanian, S. Saha, and B.K. Tripathy (eds.), Deep Learning Research
with Engineering Applications, De Gruyter Publications, 2020, pp. 21–50.
doi:10.1515/9783110670905-002
[8] Yu, H., Yang, L.T., Zhang, Q., Armstrong, D., and Deen, M.J. Convolutional
neural networks for medical image analysis: state-of-the-art, comparisons,
improvement and perspectives. Neurocomputing, 2021;444:92–110. https://
doi.org/10.1016/j.neucom.2020.04.157
[9] Kaul, D., Raju, H. and Tripathy, B. K. Deep learning in healthcare. In: D.P.
Acharjya, A. Mitra, and N. Zaman (eds.), Deep Learning in Data Analytics –
Recent Techniques, Practices and Applications), Studies in Big Data,
vol. 91. Springer, Cham, 2022, pp. 97–115. doi:10.1007/978-3-030-75855-
4_6.
[10] Alalwan, N., Abozeid, A., ElHabshy, A.A., and Alzahrani, A. Efficient 3D
deep learning model for medical image semantic segmentation. Alexandria
Engineering Journal, 2021;60(1):1231–1239. https://doi.org/10.1016/j.
aej.2020.10.046.
[11] Liu, Z., Jin, L., Chen, J. et al. A survey on applications of deep learning in
microscopy image analysis. Computers in Biology and Medicine,
2021;134:104523. https://doi.org/10.1016/j.compbiomed.2021.104523
[12] Adate, A. and Tripathy, B.K. S-LSTM-GAN: shared recurrent neural
networks with adversarial training. In: A. Kulkarni, S. Satapathy,
T. Kang, and A. Kashan (eds.), Proceedings of the 2nd International
Conference on Data Engineering and Communication Technology.
Advances in Intelligent Systems and Computing, vol. 828, Springer,
Singapore, 2019, pp. 107–115.
[13] Liu, X., Song, L., Liu, S., and Zhang, Y. A review of deep-learning-based
medical image segmentation methods. Sustainability, 2021;13(3):1224.
https://doi.org/10.3390/su13031224.
[14] Adate, A. and Tripathy, B.K. A survey on deep learning methodologies of
recent applications. In D.P. Acharjya, A. Mitra, and N. Zaman (eds.), Deep
Learning in Data Analytics – Recent Techniques, Practices and
Applications), Studies in Big Data, vol. 91. Springer, Cham, 2022, pp. 145–
170. doi:10.1007/978-3-030-75855-4_9
[15] Vaidyanathan, A., van der Lubbe, M. F. J. A., Leijenaar, R. T. H., et al. Deep
learning for the fully automated segmentation of the inner ear on MRI.
Scientific Reports, 2021;11(1):Article no. 2885. https://doi.org/10.1038/
s41598-021-82289-y
[16] Sihare, P., Bardhan, P., A.U.K., and Tripathy, B.K. COVID-19 detection
using deep learning: a comparative study of segmentation algorithms. In: K.
Das et al. (eds.), Proceedings of the 4th International Conference on
Computational Intelligence in Pattern Recognition (CIPR), CIPR 2022,
LNNS480, 2022, pp. 1–10.
[17] Yagna Sai Surya, K., Geetha Rani, T., and Tripathy, B.K. Social distance
monitoring and face mask detection using deep learning. In: J. Nayak, H.
Behera, B. Naik, S. Vimal, D. Pelusi (eds.), Computational Intelligence in
Data Mining. Smart Innovation, Systems and Technologies, vol. 281. Springer,
Singapore. https://doi.org/10.1007/978-981-16-9447-9_36
[18] Jungo, A., Scheidegger, O., Reyes, M., and Balsiger, F. pymia: a Python
package for data handling and evaluation in deep learning-based medical
image analysis. Computer Methods and Programs in Biomedicine,
2021;198:105796. https://doi.org/10.1016/j.cmpb.2020.105796
[19] Abdar, M., Samami, M., Dehghani Mahmoodabad, S. et al. Uncertainty
quantification in skin cancer classification using three-way decision-based
Bayesian deep learning. Computers in Biology and Medicine,
2021;135:104418. https://doi.org/10.1016/j.compbiomed.2021.104418
[20] Wang, J., Zhu, H., Wang, S.-H., and Zhang, Y.-D. A review of deep learning
on medical image analysis. Mobile Networks and Applications, 2020;26
(1):351–380. https://doi.org/10.1007/s11036-020-01672-7
[21] Ahmedt-Aristizabal, D., Mohammad Ali Armin, S.D., Fookes, C., and Lars
P. Graph-based deep learning for medical diagnosis and analysis: past, pre-
sent and future. Sensors, 2021;21(14):4758. https://doi.org/10.3390/
s21144758
[22] Çallı, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K.G., and
Murphy, K. Deep learning for chest X-ray analysis: a survey. Medical Image
Analysis, 2021;72:102125. https://doi.org/10.1016/j.media.2021.102125
[23] Shorten, C., Khoshgoftaar, T.M., and Furht, B. Deep learning applications
for COVID-19. Journal of Big Data, 2021;8(1):Article no. 18. https://doi.
org/10.1186/s40537-020-00392-9
[24] Gaur, L., Bhatia, U., Jhanjhi, N. Z., Muhammad, G., and Masud, M. Medical
image-based detection of COVID-19 using deep convolution neural
[36] Magadza, T. and Viriri, S. Deep learning for brain tumour segmentation: a
survey of state-of-the-art. Journal of Imaging, 2021;7(2):19. https://doi.org/
10.3390/jimaging702001
[37] Pramod, A., Naicker, H.S., and Tyagi, A.K. Machine learning and deep
learning: open issues and future research directions for the next 10 years. In
Computational Analysis and Deep Learning for Medical Care, Wiley, 2021,
pp. 463–490. https://doi.org/10.1002/9781119785750.ch18
[38] Ma, X., Niu, Y., Gu, L., et al. Understanding adversarial attacks on deep
learning based medical image analysis systems. Pattern Recognition,
2021;110:107332. https://doi.org/10.1016/j.patcog.2020.107332
[39] Ding, Y., Tan, F., Qin, Z., Cao, M., Choo, K.-K.R., and Qin, Z. DeepKeyGen:
a deep learning-based stream cipher generator for medical image encryption
and decryption. IEEE Transactions on Neural Networks and Learning
Systems, 2022;33(9):4915–4929. doi:10.1109/TNNLS.2021.3062754
[40] Nogales, A., García-Tejedor, Á.J., Monge, D., Vara, J.S., and Antón, C. A
survey of deep learning models in medical therapeutic areas. Artificial
Intelligence in Medicine, 2021;112:102020. https://doi.org/10.1016/j.
artmed.2021.102020
[41] Singh, R., Bharti, V., Purohit, V., Kumar, A., Singh, A.K., and Singh, S.K.
MetaMed: few-shot medical image classification using gradient-based meta-
learning. Pattern Recognition, 2021;120:108111. https://doi.org/10.1016/j.
patcog.2021.108111
Chapter 5
Comparative analysis of lumpy skin disease
detection using deep learning models
Shikhar Katiyar1, Krishna Kumar1, E. Ramanujam1,
K. Suganya Devi1 and Vadagana Nagendra Naidu1
5.1 Introduction
Cattle are the most widespread species that provide milk, meat, and draft power to
humans and remain sizeable ruminant livestock [1]. The term “livestock” is
indistinct and may be defined broadly as any population of animals kept by humans
for a functional, commercial purpose [2]. Horse, Donkey, Cattle, Zebu, Bali cattle,
Yak, Water buffalo, Gayal, Sheep, Goat, Reindeer, Bactrian camel, Arabian camel,
Llama, Alpaca, Domestic Pig, Chicken, Rabbit, Guinea pig, etc., are the varieties of
livestock raised by the people [3]. Around 21 million people in India depend on
1 Department of Computer Science and Engineering, National Institute of Technology Silchar, India
livestock for their livelihood. Specifically, India has ranked first in cattle inventory
per the statistics of 2021 [4]. Brazil and China have the second and third-largest
cattle inventory rates. In addition, India ranks first in milk production per the 20th
livestock census of India report as shown in Table 5.1 [5]. In India, livestock
employs 8.8% of the total population and contributes around 5% of the GDP and 25.6% of the total agricultural GDP [6].
Cattle mainly serve social and financial roles in societies. Worldwide, more than 1.5 billion cattle are present as per the report [7]. Cattle are raised primarily for family subsistence and local sales, while many cattle farmers supply cattle to international markets in large quantities [8]. Around 40% of the world's agricultural output is contributed by livestock, which secures food production for almost a billion people [9]. The livestock sector has also been growing fast worldwide, driven by
income growth and supported by structural and technical advances, particularly in the
agriculture sector. This growth and transformation have provided multiple opportu-
nities to the agricultural sector regarding poverty alleviation and food security.
Livestock is also considered a valuable asset to livestock owners for their
wealth, collateral credits, and security during financial needs. It has also been a
center for mixed farming systems as it consumes waste products during agricultural
and food processing, helps control insects and weeds, produces manure for fertili-
zation and conditioning fields, and provides draught power for plowing and trans-
portation [10]. In some places, livestock has been used as a public sanitation facility
to consume waste products; otherwise, it may pose severe pollution and public
health problems. In the world, livestock contributes 15% towards food energy and
25% towards dietary protein. Almost 80% of the illiterate and undernourished
people have a primary dependency on agriculture and the raising of livestock for
their daily needs [11]. Data from the Food and Agriculture Organization (FAO)
database on rural income generating activities (RIGA) shows that, in a sample of 14
countries, 60% of rural people raise livestock. A significant proportion of the rural
household’s livestock is to contribute to household income [12].
infected cows. Direct causes of diseases are chemical poisons, parasites, fungi,
viruses, bacteria, nutritional deficiencies, and unknown causes. Additionally, the
well-being of cattle can also be influenced indirectly by elements like food, water,
and the surrounding environment [15]. The detailed descriptions of the diseases and their infections are as follows.
5.1.1.2 Ringworm
Ringworm is an unsightly skin lesion, round and hairless, caused by a fungal infection of the hair follicle and the skin's outer layer [17]. Trichophyton verrucosum is the most
prevalent agent infecting cattle, with other fungi being less common. Ringworm is a
zoonotic infection. Ringworm is uncommon in sheep and goats raised for meat.
5.1.1.4 Bluetongue
Bluetongue (BT) is a viral infection spread by vectors. It affects both wild and
domestic ruminants such as cattle, sheep, goats, buffaloes, deer, African antelope
species, and camels [19]. Although the Bluetongue virus (BTV) does not typically
cause visible symptoms in most animals, it can lead to a potentially fatal illness in a
subset of infected sheep, deer, and wild ruminants. It transmits primarily by a few
species of the genus Culicoides, insects that act as vectors. These vectors become
infected with the BTV when they feed on viraemic animals and subsequently
spread the infection to vulnerable ruminants.
physical abilities and numerous microscopic holes in the brain’s cortex, giving it a
spongy appearance. These illnesses lead to a decline in brain function, including
memory loss, personality changes, and mobility issues that worsen over time.
5.1.1.7 Anthrax
The spore-forming bacteria Bacillus anthracis causes anthrax [22]. Anthrax spores
in soil are highly resistant and can cause illness even years after an outbreak. Wet
weather or deep tilling bring the spores to the surface, and when consumed by
ruminants, the sickness emerges. Anthrax is found on all continents and is a major cause of mortality in domestic and wild herbivores, most animals, and some bird species.
Since LSD outbreaks have hit India heavily, this chapter provides a deep insight into LSD detection using artificial intelligence techniques, especially deep learning models. The following sections cover the related research on other skin disease detection, the hybrid deep learning models proposed for LSD detection, the experimental analysis of the proposed models, and the discussion.
The proposed model has two phases for the detection of LSD in cows. The phases
are data collection and classification using hybrid deep learning models, and the
detailed structure of the proposed system is shown in Figure 5.2.
[Figure 5.2: structure of the proposed system: data collection (veterinary hospitals, field survey of images, labelling by a veterinary doctor); training and testing of the deep learning models (CNN2D, CNN2D+GRU); prediction of normal and lumpy disease]
addition to the gates of the LSTM unit and an activation function. GRU does
not have an exclusive memory cell; rather, it exposes the memory at each time step.
To improve the performance of the CNN, the variants of RNN, such as
GRU and LSTM, are appended to the feature extracted from the CNN for
classification.
[Figure: fully connected (MLP) model: Flatten; Dense(256), Dense(128), and Dense(64), each followed by Dropout(0.25); Dense(2) output]
[Figure: CNN2D model: blocks of Conv2D (32, 32, and 64 filters) with MaxPooling2D(2,2) and Dropout(0.25); Flatten; Dense(32); Dense(2) output]
Figure 5.5 A hybrid convolutional neural network (2D)+LSTM model (Conv2D blocks of 32, 32, and 64 filters with MaxPooling2D(2,2) and Dropout(0.25), a Lambda/Reshape layer, LSTM(32), Dense(32), and a Dense(2) output)
Figure 5.6 A hybrid convolutional neural network (2D)+GRU model (Conv2D blocks of 32, 32, and 64 filters with MaxPooling2D(2,2) and Dropout(0.25), a Lambda/Reshape layer, GRU(32), Dense(32), and a Dense(2) output)
has been used as mentioned in Figure 5.5 up to the second block of Conv2D, max-pooling, and dropout layers. After that, a lambda layer is used to reshape the tensor, which is then fed as input to an LSTM of 32 units. The sequence output is fed to a dense layer of 32 units, a dropout of 0.25, and a final dense layer with a sigmoid activation function to classify the images as normal (healthy) or lumpy-infected cows. Figure 5.5 shows the architecture of the hybrid CNN+LSTM model.
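Based on the layer sequence described above, a Keras sketch of the hybrid model could look roughly as follows; the input size and filter counts are assumptions where the chapter does not state them, the two-unit output shown in the figures is realized here as a single sigmoid unit for the binary decision, and swapping the LSTM layer for a GRU gives the CNN2D+GRU variant.

```python
# Sketch of the hybrid CNN2D+LSTM classifier described in the text
# (replace LSTM with GRU for the CNN2D+GRU variant).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),            # assumed input size
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    # Reshape the 2D feature maps into a sequence for the recurrent layer.
    layers.Reshape((-1, 32)),
    layers.LSTM(32),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(1, activation="sigmoid"),        # normal vs. lumpy-infected
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```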
5.4.5 Hyperparameters
The hyperparameters utilized to train and test the hybrid deep learning models are
shown in Table 5.2.
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5.1}
\]
\[
\text{Precision} = \frac{TP}{TP + FP} \tag{5.2}
\]
\[
\text{Recall} = \frac{TP}{TP + FN} \tag{5.3}
\]
\[
\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5.4}
\]
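Equations (5.1) to (5.4) correspond directly to standard scikit-learn metrics; the sketch below computes them for a small set of hypothetical predictions.

```python
# Sketch: accuracy, precision, recall, and F-measure (eqs. 5.1-5.4)
# computed with scikit-learn for hypothetical predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # 1 = lumpy-infected, 0 = healthy
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN =", tp, tn, fp, fn)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
```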
The performance of the proposed hybrid deep learning models is shown in
Table 5.3. The Conv2D model has the highest accuracy of 99.61% and outperforms the other models. However, accuracy alone is not a sufficient measure for disease classification; thus, the precision and recall values are analyzed for performance comparison. On comparing precision and recall, the Conv2D+GRU model scores higher values and outperforms the other two models. This results in a higher recognition rate of lumpy-infected and healthy cows than the Conv2D and Conv2D+LSTM models. The F-measure is also evidence that the Conv2D+GRU performs better than the other two models.
The experimentation on training and testing with its loss and accuracy values
for the MLP, CNN2D, CNN2D+LSTM, and CNN2D+GRU has also been analyzed
and shown in Figures 5.7, 5.8, 5.9, and 5.10, respectively. The figures show no sign of overfitting or underfitting (i.e., of high variance or high bias). This demonstrates the efficiency of the proposed system in diagnosing lumpy skin disease using AI concepts.
[Figures 5.7 to 5.10: training and validation accuracy and loss versus epoch for the MLP, CNN2D, CNN2D+LSTM, and CNN2D+GRU models, respectively]
5.5 Conclusion
The foundation of every civilization or society is its health care system, which
ensures that every living thing receives an accurate diagnosis and effective treat-
ment. Today’s world is becoming technologically advanced and automated.
Therefore, this industry’s use of modern technology, machines, robotics, etc., is
both necessary and unavoidable. Thanks to technological advancements,
References
[1] Gerber PJ, Mottet A, Opio CI, et al. Environmental impacts of beef pro-
duction: review of challenges and perspectives for durability. Meat Science.
2015;109:2–12.
[2] Johnston J, Weiler A, and Baumann S. The cultural imaginary of ethical meat: a
study of producer perceptions. Journal of Rural Studies. 2022;89:186–198.
[3] Porter V. Mason’s world dictionary of livestock breeds, types and varieties.
CABI; 2020.
[4] Martha TR, Roy P, Jain N, et al. Geospatial landslide inventory of India—an
insight into occurrence and exposure on a national scale. Landslides.
2021;18(6):2125–2141.
[5] 20th Livestock Census Report; 2019 [updated 2019 Oct 18; cited 2022 Nov
30]. Department of Animal Husbandry and Dairying. Available from:
https://pib.gov.in/PressReleasePage.aspx?PRID=1588304.
[6] Neeraj A and Kumar P. Problems perceived by livestock farmers in utiliza-
tion of livestock extension services of animal husbandry department in
Jammu District of Jammu and Kashmir. International Journal of Current
Microbiology and Applied Sciences. 2018;7(2):1106–1113.
[7] USDA U. of A. Livestock and Poultry: World Markets and Trade; 2021.
[8] Lu CD and Miller BA. Current status, challenges and prospects for dairy
goat production in the Americas. Asian-Australasian Journal of Animal
Sciences. 2019;32(8_spc):1244–1255.
[9] Crist E, Mora C, and Engelman R. The interaction of human population, food
production, and biodiversity protection. Science. 2017;356(6335):260–264.
[10] Meissner H, Scholtz M, and Palmer A. Sustainability of the South African
livestock sector towards 2050. Part 1: worth and impact of the sector. South
African Journal of Animal Science. 2013;43(3):282–297.
[11] Hu Y, Cheng H, and Tao S. Environmental and human health challenges of
industrial livestock and poultry farming in China and their mitigation.
Environment International. 2017;107:111–130.
[12] FAO. Food and Agriculture Organization of the United Nations, Rome, 2018. http://faostat.fao.org.
[13] Bradhurst R, Garner G, Hóvári M, et al. Development of a transboundary
model of livestock disease in Europe. Transboundary and Emerging
Diseases. 2022;69(4):1963–1982.
[14] Brooks DR, Hoberg EP, Boeger WA, et al. Emerging infectious disease: an
underappreciated area of strategic concern for food security. Transboundary
and Emerging Diseases. 2022;69(2):254–267.
[15] Libera K, Konieczny K, Grabska J, et al. Selected livestock-associated
zoonoses as a growing challenge for public health. Infectious Disease
Reports. 2022;14(1):63–81.
[16] Grubman MJ and Baxt B. Foot-and-mouth disease. Clinical Microbiology
Reviews. 2004;17(2):465–493.
[17] Lauder I and O’Sullivan J. Ringworm in cattle. Prevention and treatment
with griseofulvin. Veterinary Record. 1958;70(47):949.
[18] Bachofen C, Stalder H, Vogt HR, et al. Bovine viral diarrhea (BVD): from
biology to control. Berliner und Munchener tierarztliche Wochenschrift.
2013;126(11–12):452–461.
[19] Maclachlan NJ. Bluetongue: history, global epidemiology, and pathogenesis.
Preventive Veterinary Medicine. 2011;102(2):107–111.
[20] Collins SJ, Lawson VA, and Masters CL. Transmissible spongiform ence-
phalopathies. The Lancet. 2004;363(9402):51–61.
[21] O’Brien DJ. Treatment of psoroptic mange with reference to epidemiology
and history. Veterinary Parasitology. 1999;83(3–4):177–185.
[22] Cieslak TJ and Eitzen Jr EM. Clinical and epidemiologic principles of
anthrax. Emerging Infectious Diseases. 1999;5(4):552.
[23] Sultana M, Ahad A, Biswas PK, et al. Black quarter (BQ) disease in cattle
and diagnosis of BQ septicaemia based on gross lesions and microscopic
examination. Bangladesh Journal of Microbiology. 2008;25(1):13–16.
[24] Coetzer J and Tuppurainen E. Lumpy skin disease. Infectious Diseases of
Livestock. 2004;2:1268–1276.
[25] Suparyati S, Utami E, Muhammad AH, et al. Applying different resampling
strategies in random forest algorithm to predict lumpy skin disease. Jurnal
RESTI (Rekayasa Sistem dan Teknologi Informasi). 2022;6(4):555–562.
[26] Shivaanivarsha N, Lakshmidevi PB, and Josy JT. A ConvNet based real-
time detection and interpretation of bovine disorders. In: 2022 International
Conference on Communication, Computing and Internet of Things (IC3IoT).
IEEE; 2022. p. 1–6.
[27] Bhatt R, Sharma G, Dhall A, et al. Categorization and reorientation of
images based on low level features. Journal of Intelligent Learning Systems
and Applications. 2011;3(01):1.
[28] Allugunti VR. A machine learning model for skin disease classification
using convolution neural network. International Journal of Computing,
Programming and Database Management. 2022;3(1):141–147.
[29] Skin Disease Dataset; 2017 [cited 2022 Nov 30]. Dermatology Resource.
Available from: https://dermetnz.org.
[30] Ahsan MM, Uddin MR, Farjana M, et al. Image Data collection and
implementation of deep learning-based model in detecting Monkeypox dis-
ease using modified VGG16, 2022. arXiv preprint arXiv:2206.01862.
[31] Karthik R, Vaichole TS, Kulkarni SK, et al. Eff2Net: an efficient channel
attention-based convolutional neural network for skin disease classification.
Biomedical Signal Processing and Control. 2022;73:103406.
[32] Upadya P S, Sampathila N, Hebbar H, et al. Machine learning approach for
classification of maculopapular and vesicular rashes using the textural fea-
tures of the skin images. Cogent Engineering. 2022;9(1):2009093.
[33] Rony M, Barai D, Hasan Z, et al. Cattle external disease classification using
deep learning techniques. In: 2021 12th International Conference on
Computing Communication and Networking Technologies (ICCCNT). IEEE,
2021. p. 1–7.
[34] Saranya P, Krishneswari K, and Kavipriya K. Identification of diseases in
dairy cow based on image texture feature and suggestion of therapeutical
measures. International Journal of Internet, Broadcasting and
Communication. 14(4):173–180.
[35] Rathod J, Waghmode V, Sodha A, et al. Diagnosis of skin diseases using
convolutional neural networks. In: 2018 Second International Conference on
Electronics, Communication and Aerospace Technology (ICECA). IEEE,
2018. p. 1048–1051.
[36] Thohari ANA, Triyono L, Hestiningsih I, et al. Performance evaluation of
pre-trained convolutional neural network model for skin disease classifica-
tion. JUITA: Jurnal Informatika. 2022;10(1):9–18.
Chapter 6
Can AI-powered imaging be a replacement
for radiologists?
Riddhi Paul1, Shreejita Karmakar1 and Prabuddha Gupta1
1 Amity Institute of Biotechnology, Amity University Kolkata, India
Artificial Intelligence (AI) has a wide range of potential uses in medical imaging,
despite many clinical implementation challenges. AI can enhance a radiologist’s
productivity by prioritizing work lists; for example, AI can automatically examine
chest X-rays for pneumothorax and screen for evidence of intracranial hemorrhage,
Alzheimer's disease, and urinary stones. AI may be used to automatically quantify
skeletal maturity on pediatric hand radiographs, coronary calcium scoring, prostate
categorization through MRI, breast density via mammography, and ventricle seg-
mentation via cardiac MRI. The usage of AI covers almost the full spectrum of
medical imaging. AI is gaining traction not as a replacement for a radiologist but as
an essential companion or tool. The possible applications of AI in medical imaging
are numerous and include the full medical imaging life cycle, from picture pro-
duction to diagnosis to prediction of outcome. The lack of sufficiently vast,
curated, and representative training data with which to train, evaluate, and test
algorithms optimally is one of the most significant barriers to AI algorithm development
and clinical adoption, but it can be resolved in the coming years through the
creation of data libraries. Therefore, AI is not a competitor but a much-needed ally
for radiologists, who can use it to deal with day-to-day jobs and concentrate on more
challenging cases. All these aspects of interactions between AI and human
resources in the field of medical imaging are discussed in this chapter.
Radiology is a medical specialty that diagnoses and treats illnesses using imaging
technology. Self-learning computer software called artificial intelligence (AI) can
aid radiology practices in finding anomalies and tumors, among other things. AI-
based systems have gained widespread adoption in radiology departments all over
the world due to their ability to detect and diagnose diseases more accurately than
human radiologists. Artificial intelligence and medical imaging have experienced
rapid technological advancements over the past ten years, leading to a recent con-
vergence of the two fields. Radiology-related AI research has advanced thanks to
significant improvements in computing power and improved data access [1].
The ability to work more efficiently is one of the biggest benefits of AI systems. AI can
be used to accomplish much larger, more complex tasks, as well as smaller, repetitive
ones, more quickly. Whatever their use, AI systems are not constrained by human
limitations and do not tire. The neural networks that operate in the brain served as the
inspiration for deep learning, which uses large networks arranged in layers that can
learn over time. In imaging data, deep learning can uncover intricate patterns. In a
variety of tasks, AI performance has advanced from subhuman to comparable with
humans, and in the coming years AI working alongside humans is expected to greatly
augment human performance. For diagnosis, staging, radiation oncology treatment
planning, and assessment of patient response, 3D cancer imaging can be acquired
repeatedly over time and space; clinical work already shows this to be useful. A recent
study found a severe shortage of radiologists in the workforce, with 1.9 radiologists per
million people in low-income countries compared with 97.9 per million in high-income
nations. An expert clinician was tasked by UK
researchers with categorizing more than 3,600 images of hip fractures. According to
the study, clinicians correctly identified only 77.5% of the images, whereas the
machine learning system did so with 92% accuracy [2]. In a nutshell, AI is a savior for
global healthcare due to the constantly rising demand for radiology and the develop-
ment of increasingly precise AI-based radiology systems.
Figure 6.1 The medical imaging AI workflow: acquisition, preprocessing, images, clinical tasks, integrated diagnostics, and report
Carrying out specific procedures on an image to extract useful information from it is known as
image processing. When applying specified signal processing techniques,
the image processing system typically treats every picture as a 2D signal [13].
The preprocessing steps include:
1. Converting all the images into the same format.
2. Cropping the unnecessary regions on images.
3. Transforming them into numbers for algorithms to learn from them (array of
numbers) [14].
Through preprocessing, we may get rid of undesired distortions and enhance
certain properties that are crucial for the application we are developing. Those
qualities could alter based on the application. For software to work properly and
deliver the required results, a picture must be preprocessed (Figure 6.1).
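As a rough illustration only, and not code from this chapter, the three preprocessing steps above might look like the following Python sketch; the file path, crop box, and target size are assumed placeholder values.

```python
from PIL import Image
import numpy as np

def preprocess(path, crop_box=(20, 20, 236, 236), size=(224, 224)):
    """Turn a raw scan into a normalized numeric array a learning algorithm can use."""
    img = Image.open(path).convert("L")               # 1. bring every image into the same format (8-bit greyscale)
    img = img.crop(crop_box)                          # 2. crop away regions that carry no useful information
    img = img.resize(size)                            # resample to a fixed input size
    return np.asarray(img, dtype=np.float32) / 255.0  # 3. transform pixels into an array of numbers in [0, 1]

# Example: x = preprocess("chest_xray.png"); x.shape == (224, 224)
```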
IMAGES—Following acquisition and pre-processing, we obtain a clean pixel
representation of the picture, which AI and deep learning models use to compare with patients'
radiographs and to carry out clinical tasks and processes [15,16] (Figure 6.1).
CLINICAL TASKS—AI approaches are also highly effective in recognizing and
diagnosing many sorts of disorders. The presence of computerized reasoning (AI)
as a means for better medical services provides new opportunities to improve patient
and clinical team outcomes, reduce expenses, and so on [17]. Individual care
providers and care teams must have access to at least three key forms of clinical
information to successfully diagnose and treat individual patients: the patient’s
health record, the quickly changing medical-evidence base, and provider instruc-
tions directing the patient care process.
The clinical tasks are further sub-divided into the following:
1. Detection—This includes automated detection of abnormalities like tumors
and metastasis in images. Examples can be detecting a lung nodule, brain
metastasis, or calcification in the heart.
2. Characterization—After detection, we look forward to characterizing the
result obtained. Characterization is done in the following steps:
(a) Segmentation: Detecting the boundaries of normal tissue and abnormality
(b) Diagnosis: Identifying the abnormalities whether they are benign or malignant.
(c) Staging: The observed abnormalities are assigned to different predefined
categories.
3. Monitoring—Detecting the change in the tissue over time by tracking
multiple scans (Figure 6.1).
INTEGRATED DIAGNOSTICS—The usage and scope of advanced practitioner
radiographers, who with the use of AI technologies can offer an instantaneous result to
the patient and referring doctor at the time of examination, may be expanded if AI is
demonstrated to be accurate in picture interpretation. The use of AI in medical imaging
allows doctors to diagnose problems considerably more quickly, encouraging early
intervention. Researchers found that by evaluating tissue scans equally well or better
than pathologists, AI can reliably detect and diagnose colorectal cancer [18] (Figure 6.1).
Figure 6.2 Radio imaging of the thorax using an Indian healthcare start-up,
Qure.ai [23]
6.6.3 Colonoscopy
Unidentified or incorrectly categorized colonic polyps may increase the risk of
colorectal cancer. Even though the majority of polyps start benign, they can even-
tually turn cancerous. Early identification and consistent use of powerful AI-based
solutions for monitoring are essential [27] (Figure 6.4).
Figure 6.4 The CTC pipeline: first, DICOM images are cleaned, and then colon
regions are segmented. Second, the 3D colon is reconstructed from
segmented regions, then the centerline may be extracted. Finally, the
internal surface of the colon can be visualized using different
visualization methods [28].
Figure 6.5 Using AI-based image enhancement to reduce brain MRI scan times
and improve signal to noise ratio [31]
6.6.5 Mammography
The interpretation of screening mammography is technically difficult. AI can help
with interpretation by recognizing and classifying microcalcifications [9]
(Figure 6.6).
Figure: CNN-based voxel classification pipeline in which patches extracted from patient MR imaging data pass through repeated convolution and max-pooling stages, fully connected and softmax layers, and a threshold that labels each voxel as tumor or non-tumor
Figure 6.8 (a) mpMR images obtained and (b) tumor segmentation performed by
a deep learning algorithm to create the probability map (from right to
left) [36]
network (CNN), a deep learning tool used to locate the tumour and its
extent within the image [35]. An expert reader's annotations are used for training, fol-
lowed by an independent reader, to generate the algorithm result and the related
probability map created by the algorithm (Figure 6.8). The model is
trained with hundreds of cases of rectal cancer and the performance obtained was
comparable to human performance in the validation data set. Therefore, deep
learning tools can accelerate accurate identification and segmentation of tumours
from patient data [37].
Figure 6.9 Deep learning for abdominal CT: expert knowledge is used to annotate training data from abdominal CT scans, and the trained deep learning model is then applied to new abdominal CT scans
program running on all abdominal CT images which will alert the radiologist on the
abnormal pancreas (Figure 6.9).
some challenges for the user. Each tool determines a varied quantity of features
from various categories,
3. Effect of acquisition and reconstruction
Each institution has its own set of reconstruction parameters and methods, with
potential variances among individual patients. All these factors have an impact
on image noise and texture, which in turn affects image characteristics.
Therefore, rather than reflecting different biological properties of tissues, the
features derived from images acquired at a single institution using a variety of
acquisition protocols or acquired at various institutions using a variation of
scanners in a wide range of patient populations can be affected by a combi-
nation of parameters [40]. Certain settings for acquisition and reconstruction
may yield unstable features, resulting in different values being derived from
successive measurements made under the same circumstances.
4. Human reluctance
Both developing precise AI algorithms and comprehending how to incorporate
AI technologies into routine healthcare operations are difficult. Radiologists’
duties and responsibilities are subject to change. Despite the indicated preci-
sion and efficacy of algorithms, it is doubtful that they will ever be entirely
independent [41].
5. Inadequate IT infrastructure
Despite several AI applications in radiology, many healthcare organizations
have yet to begin the digital revolution. Their systems lack interoperability,
hardware has to be upgraded, and their security methods are out-of-date. The
use of AI in this situation may provide extra challenges [42].
6. Data integrity
The shortage of high-quality labeled datasets is a problem that affects all fields
and businesses, including radiology. It is difficult to get access to clean, labeled
data for training medical AI [42].
Healthcare providers should make sure that human experts continue to take the lead
in decision-making and that human–machine collaboration is effective. In order to
combat these issues, IT infrastructure must be gradually changed, ideally with
assistance from a consortium of experts. As many healthcare organizations are already
undergoing digital transformation and the need for high-quality data keeps increasing,
it is only a matter of time before most datasets meet these criteria.
6.11 Conclusion
Since the 1890s, when X-ray imaging first gained popularity, medical imaging has
been a cornerstone of healthcare. This trend has continued with more recent
advancements in CT, MRI, and PET scanning. It is now feasible to identify
incredibly minute differences in tissue densities thanks to advancements in imaging
References
[1] Oren O, Gersh B, and Bhatt D. Artificial intelligence in medical imaging:
switching from radiographic pathological data to clinically meaningful
endpoints. The Lancet Digital Health 2020;2:E486–E488.
[2] Sandra VBJ. The electronic health record and its contribution to health-
care information systems interoperability. Procedia Technology 2013;9:
940–948.
[3] Driver C, Bowles B, and Greenberg-Worisek A. Artificial intelligence in
radiology: a call for thoughtful application. Clinical and Translational
Science 2020;13:216–218.
[4] Berlin L. Radiologic errors, past, present and future. Diagnosis (Berlin)
2014;1(1):79–84. doi:10.1515/dx-2013-0012. PMID: 29539959.
[5] Farooq K, Khan BS, Niazi MA, Leslie SJ, and Hussain A. Clinical Decision
Support Systems: A Visual Survey, 2017. ArXiv.
[6] Wainberg M, Merico D, Delong A, and Frey BJ. Deep learning in biome-
dicine. Nature Biotechnology 2018;36(9):829–838. doi:10.1038/nbt.4233.
Epub 2018 Sep 6. PMID: 30188539.
[23] Engle E, Gabrielian A, Long A, Hurt DE, and Rosenthal A. Figure 2: perfor-
mance of Qure.ai automatic classifiers against a large annotated database of
patients with diverse forms of tuberculosis. PLoS One 2020;15(1):e0224445.
[24] Kennedy S. AI model can help detect collapsed lung using chest X-rays. The
artificial intelligence model accurately detected pneumothorax, or a col-
lapsed lung, and exceeded FDA guidelines for computer-assisted triage
devices. News Blog:https://healthitanalytics.com/news/ai-model-can-help-
detect-collapsed-lung-using-chest-x-rays.
[25] Artificial Intelligence-Assisted Detection of Adhesions on Cine-MRI.
Master Thesis Evgeniia Martynova S1038931.
[26] Hamabe A, Ishii M, Kamoda R, et al. Artificial intelligence-based technol-
ogy to make a three-dimensional pelvic model for preoperative simulation of
rectal cancer surgery using MRI. Ann Gastroenterol Surg. 2022 May 11;6
(6):788–794. doi: 10.1002/ags3.12574.
[27] Tang X. The role of artificial intelligence in medical imaging research. BJR
Open 2019;2(1):20190031. doi: 10.1259/bjro.20190031. PMID: 33178962;
PMCID: PMC7594889.
[28] Alkabbany I, Ali AM, Mohamed M, Elshazly SM, and Farag A. An AI-based
colonic polyp classifier for colorectal cancer screening using low-dose
abdominal CT. Sensors 2022;22:9761.
[29] Kudo SE, Mori Y, Misawa M, et al. Artificial intelligence and colonoscopy:
current status and future perspectives. Digestive Endoscopy 2018;30:52–53.
[30] Ramasubbu R, Brown EC, Marcil LD, Talai AS, and Forkert ND. Automatic
classification of major depression disorder using arterial spin labeling MRI
perfusion measurements. Psychiatry and Clinical Neurosciences 2019;73:
486–493.
[31] Rudie JD, Gleason T, and Barkovich MJ. Clinical assessment of deep
learning-based super-resolution for 3D volumetric brain MRI. Radiology:
Artificial Intelligence 2022;4(2):e210059.
[32] Suh Y, Jung J, and Cho B. Automated breast cancer detection in digital
mammograms of various densities via deep learning. Journal of
Personalized Medicine 2020;10(4):E211.
[33] Hasan AS, Sagheer A, and Veisi H. Breast cancer classification using
machine learning techniques: a review. IJRAR 2021;9:590–594.
[34] Swathikan C, Viknesh S, Nick M, and Markar SR. Diagnostic performance
of artificial intelligence-centred systems in the diagnosis and postoperative
surveillance of upper gastrointestinal malignancies using computed tomo-
graphy imaging: a systematic review and meta-analysis of diagnostic accu-
racy. Annals of Surgical Oncology 2021;29(3):1977.
[35] Wang PP, Deng CL, and Wu B. Magnetic resonance imaging-based artificial
intelligence model in rectal cancer. World Journal of Gastroenterology
2021;27(18):2122–2130. doi: 10.3748/wjg.v27.i18.2122. PMID: 34025068;
PMCID: PMC8117733.
[36] Trebeschi S, Van Griethuysen JJM, Lambregts DMJ, et al. Deep learning for
fully-automated localization and segmentation of rectal cancer on multi-
parametric. Scientific Report 2017;7(1):5301.
[37] Trebeschi S, van Griethuysen JJM, Lambregts DMJ, et al. Deep learning for
fully-automated localization and segmentation of rectal cancer on multi-
parametric MR. Scientific Reports 2017;7(1):5301. doi: 10.1038/s41598-
017-05728-9. Erratum in: Sci Rep. 2018 Feb 2;8(1):2589. PMID: 28706185;
PMCID: PMC5509680.
[38] Joy Mathew C, David AM, and Joy Mathew CM. Artificial Intelligence and
its future potential in lung cancer screening. EXCLI J. 2020;19:1552–1562.
doi: 10.17179/excli2020-3095. PMID: 33408594; PMCID: PMC7783473.
[39] Dilmegani C. Top 6 Challenges of AI in Healthcare and Overcoming them in
2023. Updated on December 26, 2022 | Published on March 1, 2022.
[40] Rizzo S, Botta F, Raimondi S, et al. Radiomics: the facts and the challenges
of image analysis. European Radiology Experimental 2018;2(1):36. doi:
10.1186/s41747-018-0068-z. PMID: 30426318; PMCID: PMC6234198.
[41] Lebovitz S, Lifshitz-Assaf H, and Levina N. To incorporate or not to
incorporate AI for critical judgments: the importance of ambiguity in pro-
fessionals’ judgment process. Collective Intelligence, The Association for
Computing Machinery 2020.
[42] Waller J, O’connor A, Eleeza Raafat, et al. Applications and challenges of
artificial intelligence in diagnostic and interventional radiology. Polish
Journal of Radiology 2022;87: e113–e117.
[43] Mun SK, Wong KH, Lo S, Li Y, and Bayarsaikhan S. Artificial intelligence
for the future radiology diagnostic service. Frontiers in Molecular
Biosciences 2021;7:Article 614258.
[44] Wagner M, Namdar K, Biswas A, et al. Radiomics, machine learning, and
artificial intelligence—what the neuroradiologist needs to know.
Neuroradiology 2021;63:1957–1967.
[45] Koçak B, Durmaz EŞ, Ateş E, and Kılıçkesmez Ö. Radiomics with artificial
intelligence: a practical guide for beginners. Diagnostic and Interventional
Radiology 2019;25(6):485–495. doi: 10.5152/dir.2019.19321. PMID:
31650960; PMCID: PMC6837295.
Chapter 7
Healthcare multimedia data analysis algorithms
tools and techniques
Sathya Raja1, V. Vijey Nathan1 and Deva Priya Sethuraj1
1 Department of Computer Science and Engineering, SRM TRP Engineering College, India
In the domain of Information Retrieval (IR), there exist a number of models used for
different sorts of applications. Multimedia retrieval is one such branch, dealing
specifically with the handling of multimedia data through different tools and techniques.
There are various techniques for handling multimedia data, such as feature handling,
extraction, and selection. The features selected by these techniques are then classified
using machine learning and deep learning techniques. This chapter provides complete
insights into the audio, video, and text semantic descriptions of multimedia data with
the following objectives:
(i) Methods
(ii) Data summarization
(iii) Data categorization and its media descriptions
Following this organization, the chapter concludes with a case study depicting
feature extraction, merging, filtering, and data validation.
7.1 Introduction
The information retrieval (IR) domain is considered an essential paradigm in different
real-time applications. Techniques for storing and retrieving data were established more
than five thousand years ago. In practice, the shift in intent from data retrieval to
information retrieval has taken place alongside advances in model development, process
analysis, and data interpretation and evaluation. One of the primary forms of data that
supports multiple formats is multimedia data. Such data relies on different information
retrieval models to establish a particular decision support system. In a specific context,
feature-based analysis plays a significant role in data prediction and validation. The only
caveat is that it must adapt to the particular database community and the modular
applications whose formats it handles. Multimedia information retrieval (MMIR)
typically proceeds through three steps:
1. Feature extraction
2. Filtering
3. Categorization
The first step in MMIR, which is fairly simple and obvious, is feature
extraction. The general goal of this step is achieved by completing not one but two
processes, namely, summarization and pattern detection. Before going over anything,
we need a rough summary of what we are working with; that is the summarization
process. It takes whatever media it has and summarizes it. The next one is pattern
detection, where either auto-correlation or cross-correlation is used to detect the patterns.
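As a minimal illustration of correlation-based pattern detection on a summarized one-dimensional feature signal (the signal and template below are synthetic placeholders, not data from this chapter):

```python
import numpy as np

# Synthetic summarized feature signal (e.g. an audio envelope) and a short pattern to find.
signal = np.sin(np.linspace(0, 8 * np.pi, 400))
template = np.sin(np.linspace(0, 2 * np.pi, 100))

# Cross-correlate the (mean-removed) template against the signal; the peak marks the best match.
scores = np.correlate(signal - signal.mean(), template - template.mean(), mode="valid")
best_offset = int(np.argmax(scores))
print("pattern most likely starts near sample", best_offset)
```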
The second step in MMIR is merging and filtering. As multimedia datasets are fed in,
the pool is likely to be a cluster of all available media formats. This step ensures that
every piece of relevant data gets into the algorithm by properly merging and filtering it.
Multiple media channels are set up, each labelled with the kind of data expected to flow
through it. Then a filtering method, ranging from a simple one such as factor analysis to
a more complex one such as the Kalman filter, is used to filter and merge the
descriptions effectively.
The last step in MMIR is categorization. In this step, we can choose any form of ML,
as no single classifier always performs better than another for every dataset. As we
have an abundance of ML classifiers, we can choose the one that is likely to give
acceptable results. We can also let the algorithm choose the classifier using tools such
as Weka, a data miner, R, or Python.
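A hedged sketch of this categorization step using scikit-learn, comparing two off-the-shelf classifiers by cross-validation; the feature matrix and labels are placeholders standing in for whatever the earlier extraction and filtering steps produce.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))      # placeholder feature vectors from extraction/filtering
y = rng.integers(0, 2, size=200)    # placeholder class labels

# Compare candidate classifiers by cross-validation and keep the best-scoring one.
candidates = {"random_forest": RandomForestClassifier(), "svm": SVC()}
scores = {name: cross_val_score(clf, X, y, cv=5).mean() for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```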
Research practice and its supporting culture have blossomed with the handling of
different types of data. The supported data types each raise different issues for the
data processing platforms suited for analysis. Also, the use of data-driven models is
increasing daily along with the metrics available for them. Metric-based data
validation and extraction is one of the tedious tasks that makes the data suitable
for analysis. The algorithmic models may vary, but the aspect that must be considered is
easy. In the present stages of study, the designers choose their way of repre-
senting and handling the data to a certain extent, especially [5]:
● Design of decision support systems (DSS) to provide a complete service.
● To utilize the system effectively to communicate with professionals; this
states the expectations behind the system.
● To enable researchers to effectively utilize the model in data integration,
analysis, and spotting relevant multimedia data.
The extraction of multimedia data sources is analyzed with efficient forms of
data analysis and linguistic processes. These methods can be efficiently organized
into three such groups:
1. Methods suitably used for summarizing the media data are precisely the result
of the feature extraction process.
2. Methods and techniques for filtering out the media content and its sub-processes.
3. Methods that are suitable for categorizing the media into different classes and
functions.
7.3 Methodology
Now, let us discuss in detail the methods available to perform retrieval based on
multimedia information retrieval.
Figure: a typical content-based image retrieval architecture, with feature extraction (visual features and text annotation) over an image collection, multi-dimensional indexing, query processing by a retrieval engine, and a query interface for the user
Figure: an interactive retrieval loop, with colour-histogram feature extraction from images, an "appropriate" mapping, and a decision/filtering process driven by user interaction over a search photo collage query
● Blob segmentation with different imaging properties at the curvature points
determines similar cases based on image analysis properties.
● Ridge analysis treats the image as a function of two variables to determine the
set of curvature points along at least one dimension.
● Decision tree—a decision support tool that uses a tree-like model of
decisions, their conditions, and their possible consequences, combining event
outcomes and utility. It is one way of displaying an algorithm that contains only
conditional control statements (a minimal sketch follows this list).
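To make the decision-tree idea concrete for the heart-disease case study, here is a minimal scikit-learn sketch; the column names mirror the dataset attributes shown in the figure below, while the file name, split, and tree depth are illustrative assumptions rather than values given in this chapter.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Column names follow the UCI heart-disease attributes used in the case study;
# "heart.csv" is an assumed local copy of that data, not a file named in the chapter.
cols = ["age", "sex", "cpType", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "classlabel"]
df = pd.read_csv("heart.csv", names=cols)

X, y = df[cols[:-1]], df["classlabel"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=4)   # a shallow tree keeps the conditional rules readable
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```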
Figure: influence analysis over the heart-disease case-study attributes (age, sex, cpType, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, class label), plotting the weight of each influencing factor
Figure 7.5 Detection curve over the observed situations of the disease
Figure: counts of test cases by confidence of failure, in 10% bins, for the 'no' and 'yes' classes
7.5 Applications
The variants of text-based models are significantly used in different sectors for
conceptual analysis of text design and extraction. A significant analysis must be
made to make the data exact and confirm a decision at the best level [9].
Figure: counts of test cases by risk of failure, in 10% bins, for the 'no' and 'yes' classes
Applications include bioinformatics, signal processing, content-based retrieval, and
speech recognition platforms.
● Bio-informatics is concerned with biological data analysis, with complete
model extraction and analysis; the data may be in a semi-structured or
unstructured format.
● Bio-signal processing concerns the signals produced by living beings in a given
environment.
● Content-based image retrieval deals with searching digital images within
extensive data collections.
● Facial recognition systems are concerned with recognizing activity on a given
platform from a sequence of data frames.
● Speech recognition systems transform speech into text recognized by
computers.
● Technical chart analysis covers market data analysis, typically through chart
and visual perception analysis.
7.6 Conclusion
Analyzing information stored in different data formats is a tedious task, and
collecting those variants of data is itself challenging. Most modeling of
multimedia data follows significant IR-based modeling to bring out the essential
facts and truths behind it. In this chapter, we have
discussed the different forms of IR models, tools, and applications with an example
case study illustrating the flow of analysis of medical data during the stages of the
modeling process. In the future, the aspects of different strategies can be discussed
in accordance with the level of data that can be monitored with various tools and
applications.
References
[1] Hanjalic, A., Lienhart, R., Ma, W. Y., and Smith, J. R. (2008). The holy grail
of multimedia information retrieval: so close or yet so far away?
Proceedings of the IEEE, 96(4), 541–547.
[2] Kolhe, H. J. and Manekar, A. (2014). A review paper on multimedia infor-
mation retrieval based on late semantic fusion approaches. International
Journal of Computer Applications, 975, 8887.
[3] Raieli, R. (2013, January). Multimedia digital libraries handling: the organic
MMIR perspective. In Italian Research Conference on Digital Libraries
(pp. 171–186). Springer, Berlin, Heidelberg.
[4] Rüger, S. (2009). Multimedia information retrieval. Synthesis Lectures on
Information Concepts, Retrieval, and Services, 1(1), 1–171.
[5] Khobragade, M. V. B., Patil, M. L. H., and Patel, M. U. (2015). Image
retrieval by information fusion of multimedia resources. International
Journal of Advanced Research in Computer Engineering & Technology
(IJARCET), 4(5), 1721–1727.
[6] Sangale, A. P. and Durugkar, S. R. (2014). A review on circumscribe based
video retrieval. International Journal, 4(11), 34–44.
[7] Aslam, J. A. and Montague, M. (2001, September). Models for metasearch.
In Proceedings of the 24th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval (pp. 276–284).
[8] Benavent, X., Garcia-Serrano, A., Granados, R., Benavent, J., and de Ves, E.
(2013). Multimedia information retrieval based on late semantic fusion
approaches: experiments on a wikipedia image collection. IEEE
Transactions on Multimedia, 15(8), 2009–2021.
[9] Lew, M. S., Sebe, N., Djeraba, C., and Jain, R. (2006). Content-based mul-
timedia information retrieval: state of the art and challenges. ACM
Transactions on Multimedia Computing, Communications, and Applications
(TOMM), 2(1), 1–19.
[10] Asuncion, A. and Newman, D. (2007). UCI Machine Learning Repository.
[11] Saleena, N. (2018). An ensemble classification system for twitter sentiment
analysis. Procedia Computer Science, 132, 937–946.
[12] Araque, O., Corcuera-Platas, I., Sánchez-Rada, J. F., and Iglesias, C. A.
(2017). Enhancing deep learning sentiment analysis with ensemble techni-
ques in social applications. Expert Systems with Applications, 77, 236–246.
[13] Hussein, D. M. E. D. M. (2018). A survey on sentiment analysis challenges.
Journal of King Saud University—Engineering Sciences, 30(4), 330–338.
[14] Heikal, M., Torki, M., and El-Makky, N. (2018). Sentiment analysis of Arabic
tweets using deep learning. Procedia Computer Science, 142, 114–122.
[15] Al-Thubaity, A., Alqahtani, Q., and Aljandal, A. (2018). Sentiment lexicon
for sentiment analysis of Saudi dialect tweets. Procedia Computer Science,
142, 301–307.
[16] Boudad, N., Faizi, R., Thami, R. O. H., and Chiheb, R. (2018). Sentiment
analysis in Arabic: a review of the literature. Ain Shams Engineering
Journal, 9(4), 2479–2490.
[17] Singh, P., Dwivedi, Y. K., Kahlon, K. S., Sawhney, R. S., Alalwan, A. A.,
and Rana, N. P. (2020). Smart monitoring and controlling of government
policies using social media and cloud computing. Information Systems
Frontiers, 22(2), 315–337.
[18] Anandarajan, M., Hill, C., and Nolan, T. (2019). Practical text analytics
Maximizing the Value of Text Data. (Advances in Analytics and Data
Science (vol. 2, pp. 45–59). Springer.
[19] Chakraborty, K., Bhattacharyya, S., Bag, R., and Hassanien, A. E. (2018,
February). Comparative sentiment analysis on a set of movie reviews using
deep learning approach. In International Conference on Advanced Machine
Learning Technologies and Applications (pp. 311–318). Springer, Cham.
[20] Pandey, S., Sagnika, S., and Mishra, B. S. P. (2018, April). A technique to
handle negation in sentiment analysis on movie reviews. In 2018
International Conference on Communication and Signal Processing
(ICCSP) (pp. 0737–0743). IEEE.
Chapter 8
Empirical mode fusion of MRI-PET images
using deep convolutional neural networks
N.V. Maheswar Reddy1, G. Suryanarayana1, J. Premavani1
and B. Tejaswi1
1 Electronics and Communications Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, India
In this chapter, we develop an image fusion method for magnetic resonance ima-
ging (MRI) and positron emission tomography (PET) images. This method
employs empirical mode decomposition (EMD) based on morphological filtering
(MF) in a deep learning environment. By applying our resolution enhancement
neural network (RENN) on PET source images, we obtain the lost high-frequency
information. The PET-RENN recovered HR images and MRI source images are
then subjected to bi-dimensional EMD to generate multiple intrinsic mode func-
tions (IMFs) and a residual component. Morphological operations are applied to the
intrinsic mode functions and residuals of MRI and PET images to obtain the fused
image. The fusion process involves a patch-deep fusion technique instead of a
pixel-deep fusion technique to reduce spatial artifacts introduced by pixel-wise
maps. The results of our method are evaluated on various datasets and compared
with the existing methods.
8.1 Introduction
Positron emission tomography (PET) produces an image with functional data that
depicts the metabolism of various tissues. However, PET images do not contain
structural information about tissues and have limited spatial resolution. On the
other hand, magnetic resonance imaging (MRI), a different non-invasive imaging
technique, offers strong spatial resolution information about the soft tissue struc-
ture. However, gray color information that indicates the metabolic function of
certain tissues is absent in MRI images [1]. The fusion of MRI and PET can deliver
complementary data useful for better clinical diagnosis [2].
Image fusion is the technique of combining two or more images together to
create a composite image that incorporates the data included in each original
image [3–7]. There are three types of techniques in image fusion, namely, spatial
domain fusion, transform domain fusion, and deep learning techniques [8].
Principal component analysis (PCA) and average fusion are simple spatial fusion
techniques. In these techniques, the output image is directly obtained by fusing
the input images. Due to this, spatial domain fusion techniques produce degra-
dation and distortion in the fused image. Hence, the fused images produced by
spatial domain fusion techniques are less efficient compared to transform domain
fusion techniques [8].
In transform domain techniques, the input images are first transformed from
the spatial domain technique to the frequency domain prior to fusion. Discrete and
stationary wavelet transforms are primarily employed in transformed domain
techniques. These techniques convert the input image sources into low–low, low–
high, high–low, and high–high frequency bands which are referred to as wavelet
coefficients. However, these methods suffer from translational invariance problems
leading to distorted edges in the fused image [9].
Deep learning techniques for image fusion have been popularized in recent
times due to their dominance over the existing spatial and transformed domain
techniques. Zhang et al. [10] proposed a convolution neural network for esti-
mating the features of input source images. In the obtained image, the input
source images are fused region by region. The hierarchical multi-scale feature
fusion network is initiated by Lang et al. [11]. They used this technique for
extracting multi features from input images [11]. In this chapter, we develop an
MRI-PET fusion model in a deep learning framework. The degradation in PET
low-resolution images is reduced by employing PET-RENN. The input image
sources are extracted as IMFs and residual components by applying EMD as
described in Figure 8.1. Morphological operations are applied to the IM func-
tions and residues. PETRNN is used to recover higher-resolution images from
lower-resolution of PET images [12].
8.2 Preliminaries
8.2.1 Positron emission tomography resolution
enhancement neural network (PET-RENN)
Figure 8.2 The PET-RENN technique converts a low-resolution (LR) input image G(x, y) of size (m/a, n/a) into a high-resolution (HR) output image I(x, y)
Due to the rapid progress in image processing technology, there has been an
increase in the demand for higher-resolution scenes and videos. As shown in
Figure 8.2, the PET-RENN technique produces a higher-resolution (HR) image from a
lower-resolution (LR) image. In our work, the PET-RENN technique is used to recover
higher-resolution images from lower-resolution PET image sources. Let G(x, y)
be the input image with size (m/a, n/a). When PET-RENN is applied to the input
image, it is converted to I(x, y) with size (m, n).
Zhang et al. [13] proposed the PET-RENN technique and explained multiple
approaches to it: construction-based methods, learning-based methods, and
interpolation-based methods. Learning-based methods generally yield more accurate
results, and deep learning-based PET-RENN techniques have become popular in
recent times. In these techniques, multiple convolutional layers accept the
lower-resolution input image sources and convert them into higher-resolution images.
G(x, y) → I(x, y)   (8.1)
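The chapter does not spell out the PET-RENN architecture layer by layer; purely as an illustration of a learning-based resolution-enhancement network, the PyTorch sketch below upsamples a low-resolution image and learns a high-frequency correction. The layer widths and the ×2 scale factor are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Illustrative resolution-enhancement CNN: upsample, then learn a high-frequency correction."""
    def __init__(self, scale=2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bicubic", align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2),
        )

    def forward(self, g):                    # g: low-resolution PET image, shape (N, 1, m/a, n/a)
        coarse = self.upsample(g)            # interpolation to the target grid
        return coarse + self.refine(coarse)  # learned detail added back -> I(x, y)

# Example: TinySR()(torch.randn(1, 1, 64, 64)).shape == (1, 1, 128, 128)
```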
As far as we know, among the BEMD methods currently in use, the EMD approach
of [8] is the fastest at decomposing a greyscale image. It estimates the envelopes
using statistical filters. Instead of computing the distance between neighbouring
maxima/minima, it uses the average maxima distance as the filter size [14]. However,
it is only intended to interpret fringe patterns in single-channel images. In this study,
we provide a multi-channel bidimensional EMD method (MF-MBEMD) based on a
modification of the improved fast empirical mode decomposition method (EFF-EMD).
Here, MF-MBEMD produces the envelope surfaces of a multi-channel (MC) image.
This allows for the
In this section, we discuss the proposed EMD fusion of MRI-PET images using
deep networks with the help of a block diagram, as outlined in Figure 8.3.
8.4.1 EMD
Let M(x, y) be the input image of MRI and G(x, y) be the input image of PET.
Figure 8.3 Block diagram of the proposed fusion method: the PET source image is enhanced by PET-RENN, both source images are decomposed by EMD, and the decomposed components are combined to form the fused image
But PET images suffer from low resolution. Hence, we apply the PET-RENN
technique to recover a high-resolution image I(x, y) from the lower-resolution PET
source. When we apply the EMD technique to the input images M(x, y) and I(x, y),
each input image splits into IMFs and a residual component:
EMD(I) → [I_IMF1, I_IMF2, ..., I_residue]   (8.7)
EMD(M) → [M_IMF1, M_IMF2, ..., M_residue]   (8.8)
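The exact morphological fusion operator is not given line by line in the chapter; the sketch below shows one plausible patch-deep rule under stated assumptions: each pair of matched IMFs (and the residues) from the MRI and enhanced PET images is compared patch by patch after a morphological closing, and the more active component is kept before summing the fused components.

```python
import numpy as np
from scipy.ndimage import grey_closing, uniform_filter

def fuse_component(c_mri, c_pet, patch=8):
    """Fuse one matched decomposition component (an IMF or the residue) patch by patch.

    The component with the larger morphologically smoothed activity wins in each patch,
    which avoids the speckle that purely pixel-wise decision maps tend to produce.
    This is an illustrative rule, not the exact operator used by the authors.
    """
    act_mri = uniform_filter(np.abs(grey_closing(c_mri, size=(3, 3))), size=patch)
    act_pet = uniform_filter(np.abs(grey_closing(c_pet, size=(3, 3))), size=patch)
    return np.where(act_mri >= act_pet, c_mri, c_pet)

def fuse_image(imfs_mri, imfs_pet, res_mri, res_pet):
    fused = [fuse_component(a, b) for a, b in zip(imfs_mri, imfs_pet)]
    fused.append(fuse_component(res_mri, res_pet))
    return np.sum(fused, axis=0)   # fused image = sum of fused IMFs plus the fused residue
```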
Figure 8.5 Fused results of various techniques on test set-1: (c), (d), and (e) are fused images of (a) MRI and (b) PET. (c) is the coupled dictionary learning (CDL) fusion result, (d) is the coupled feature learning (CFL) fusion result, and (e) is our method.
Figure 8.6 Fused results of various techniques on test set-2: (c), (d), and (e) are fused images of (a) MRI and (b) PET. (c) is the coupled dictionary learning (CDL) fusion result, (d) is the coupled feature learning (CFL) fusion result, and (e) is our method.
Figure 8.7 Fused results of various techniques on test set-3: (c), (d), and (e) are fused images of (a) MRI and (b) PET. (c) is the coupled dictionary learning (CDL) fusion result, (d) is the coupled feature learning (CFL) fusion result, and (e) is our method.
Figure 8.8 Fused results of various techniques on test set-4: (c), (d), and (e) are fused images of (a) MRI and (b) PET. (c) is the coupled dictionary learning (CDL) fusion result, (d) is the coupled feature learning (CFL) fusion result, and (e) is our method.
8.6 Conclusion
We introduced a unique EMD-based image fusion approach built on deep networks
for generating superior fused images. With multi-channel bidimensional
EMD, we generate multiple IMFs and a residual component from the
input source images. This enables us to extract salient information from PET
and MR images.
References
[1] Zhu, P., Liu, L., and Zhou, X. (2021). Infrared polarization and intensity
image fusion based on bivariate BEMD and sparse representation.
Multimedia Tools and Applications, 80(3), 4455–4471.
[2] Bevilacqua, M., Roumy, A., Guillemot, C., and Alberi-Morel, M. L. (2012).
Low-complexity single-image super-resolution based on nonnegative
neighbor embedding. In: Proceedings British Machine Vision Conference,
pp. 135.1–135.10.
[3] Pan, J. and Tang, Y.Y. (2016). A mean approximation based bidimensional
empirical mode decomposition with application to image fusion. Digital
Signal Processing, 50, 61–71.
[4] Li, H., He, X., Tao, D., Tang, Y., and Wang, R. (2018). Joint medical image
fusion, denoising and enhancement via discriminative low-rank sparse dic-
tionaries learning. Pattern Recognition, 79, 130–146.
[5] Ronneberger, O., Fischer, P., and Brox, T. (2015, October). U-net: con-
volutional networks for biomedical image segmentation. In International
Conference on Medical Image Computing and Computer-assisted
Intervention (pp. 234–241). Springer, Cham.
[6] Daneshvar, S. and Ghassemian, H. (2010). MRI and PET image fusion by
combining IHS and retina-inspired models. Information Fusion, 11(2), 114–123.
[7] Ma, J., Liang, P., Yu, W., et al. (2020). Infrared and visible image fusion via
detail preserving adversarial learning. Information Fusion, 54, 85–98.
[8] Ardeshir Goshtasby, A. and Nikolov, S. (2007). Guest editorial: Image
fusion: advances in the state of the art. Information Fusion, 8(2), 114–118.
[9] Ma, J., Ma, Y., and Li, C. (2019). Infrared and visible image fusion methods
and applications: a survey. Information Fusion, 45, 153–178.
[10] Liu, Y., Liu, S., and Wang, Z. (2015). A general framework for image fusion
based on multi-scale transform and sparse representation. Information
Fusion, 24, 147–164.
[11] Li, H., Qi, X., and Xie, W. (2020). Fast infrared and visible image fusion
with structural decomposition. Knowledge-Based Systems, 204, 106182.
[12] Yeh, M.H. (2012). The complex bidimensional empirical mode decom-
position. Signal Process 92(2), 523–541.
[13] Zhang, J., Chen, D., Liang, J., et al. (2014). Incorporating MRI structural
information into bioluminescence tomography: system, heterogeneous
reconstruction and in vivo quantification. Biomedical Optics Express, 5(6),
1861–1876.
[14] Zhang, Y., Brady, M., and Smith, S. (2001). Segmentation of brain MR images
through a hidden Markov random field model and the expectation-
maximization algorithm. IEEE Transactions on Medical Imaging, 20(1), 45–57.
Chapter 9
A convolutional neural network for
scoring of sleep stages from raw single-channel
EEG signals
A. Ravi Raja1, Sri Tellakula Ramya1, M. Rajalakshmi2 and
Duddukuru Sai Lokesh1
1 ECE Department, V R Siddhartha Engineering College, India
2 Department of Mechatronics Engineering, Thiagarajar College of Engineering, India
9.1 Introduction
For good health, sleep is a crucial factor in everyone's life. Sleep is a complex
biological process in which both body and mind are in an inactive, unresponsive
state. Healthy sleep improves physical health and makes a person more stable in
their mental state and mental processes. However, nowadays a large portion of the
population is unable to sleep regularly. Poor quality of sleep weakens the body and
9.3 Methodology
9.3.1 Sleep dataset
A multicenter cohort research dataset called SHHS [23] is used in this proposed
work. The American National Heart, Lung, and Blood Institute initiated this dataset
study to determine the cardiovascular consequences associated with sleep-disordered
breathing. The dataset consists of two different collections of polysomnography
records. Only the first collection, SHHS-1, is used in this proposed work because its
signals are sampled at 125–128 Hz. The SHHS-1 dataset includes around 5,800
polysomnographic records. Each polysomnographic record includes various channels,
such as the C4-A1 and C3-A2 EEG channels (two EEG channels), one ECG channel,
one EMG channel, two EOG channels, and other plethysmography channels.
These polysomnographic records were manually scored by field specialists
relying on the Rechtschaffen–Kales (R&K) rules. Each record in this dataset was
scored manually per 30 s epoch for sleep stages. There are several sleep stages
according to the R&K rules: the Wake stage, the N1, N2, N3, and N4 stages
(collectively the non-REM sleep stages), and the REM sleep stage. Detailed
information about manual sleep scoring is provided in [24].
9.3.2 Preprocessing
A significant "wake" phase, one before the patient falls asleep and another after
he or she awakes, is recorded in most polysomnographic data. These waking periods
are shortened so that the number of wake epochs before and after sleep does not
exceed that of the most commonly represented of the other sleep-stage classes.
Because the two accessible EEG signals are symmetrical and produce equivalent
results, only the EEG channel named C4-A1 is used in the following proposed work.
Stages N4 and N3 are consolidated into a single sleep stage N3, as indicated in the
AASM guidelines [25]. Even though they may be anomalies, some patients who have
no epoch associated with a particular sleep stage are omitted. Table 9.1 summarizes
the number of epochs (and their proportional importance) of every stage and the
total epochs. Classes are extremely unbalanced: stage N1 has a deficient
representation. The EEG readings are not otherwise preprocessed in any manner.
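A minimal sketch of the epoch preparation described above (the sampling rate and label merging follow the text; the function and label names are illustrative, not taken from the authors' code):

```python
import numpy as np

FS = 125          # SHHS-1 EEG sampling rate in Hz
EPOCH_SEC = 30    # length of one scoring window

def make_epochs(c4_a1_signal, stage_labels):
    """Slice a raw C4-A1 recording into 30 s epochs and merge stage N4 into N3.

    `stage_labels` holds one label per 30 s epoch ('Wake', 'N1', ..., 'REM');
    the label strings are illustrative, not the exact codes used in SHHS-1.
    """
    samples_per_epoch = FS * EPOCH_SEC
    n_epochs = len(c4_a1_signal) // samples_per_epoch
    x = np.asarray(c4_a1_signal[: n_epochs * samples_per_epoch]).reshape(n_epochs, samples_per_epoch)
    y = np.array(["N3" if s == "N4" else s for s in stage_labels[:n_epochs]])
    return x, y   # raw epochs are fed to the network without any further filtering
```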
Figure: the proposed CNN, with stacked convolution + ReLU and max-pooling (subsampling) blocks using filters of increasing size, followed by fully connected + ReLU layers and an output over the five sleep stages (Wake, N1, N2, N3, REM)
9.3.4 Optimization
As a cost function, multiclass cross-entropy was used, and mini-batch training was
employed to optimize the weight and bias parameters. Let w denote all the trainable
parameters and s the size of a mini-batch of training samples. Take ß = {y_k(0), k ∈ [1, s]}
to be a mini-batch of training samples, with {m_k, k ∈ [1, s]} the one-hot-encoded target
classes and {x_k, k ∈ [1, s]} the network outputs associated with the y_k(0) in ß. The
mini-batch cost C is written in (9.2):
C(w, ß) = −Σ_{k=1}^{s} m_k^T log x_k(w)   (9.2)
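To make (9.2) concrete, a NumPy illustration of the mini-batch cross-entropy cost is given below; it mirrors the formula only and is not the authors' TensorFlow implementation.

```python
import numpy as np

def minibatch_cost(logits, targets_onehot):
    """Multiclass cross-entropy over one mini-batch, mirroring (9.2).

    logits:         (s, 5) raw network outputs for the s epochs in the batch
    targets_onehot: (s, 5) one-hot sleep-stage labels m_k
    """
    z = logits - logits.max(axis=1, keepdims=True)            # for numerical stability
    x = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)      # softmax outputs x_k(w)
    return -np.sum(targets_onehot * np.log(x + 1e-12))        # C(w, B) = -sum_k m_k^T log x_k(w)
```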
these metrics are presented per class altogether. Additionally, we studied how to
visualize our trained neural network and learned about sleep phases during the
classification process. There are many different ways to visualize neural networks
that have been learned [27,28].
9.5.1 Pre-training
The initial step in training is to pre-train the representation-learning part of the model
with a class-balanced training set, to ensure that the proposed model does not
over-adapt to the most frequent sleep stages. The two CNNs are taken from the
proposed model and stacked with a softmax layer; this stacked softmax layer is used
to pre-train the two CNNs, and it is removed, along with its parameters, once
pre-training is complete. The class-balanced training set is produced by replicating
minority sleep stages in the original training dataset until every stage of
sleep has the same amount of data (in other words, oversampling).
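A small sketch of this oversampling step under stated assumptions (the epochs x and labels y are NumPy arrays; the sampling seed is arbitrary):

```python
import numpy as np

def oversample(x, y, rng=np.random.default_rng(0)):
    """Replicate epochs of minority sleep stages until every stage is equally represented.

    x: array of epochs, y: array of per-epoch stage labels (both NumPy arrays).
    """
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c in classes:
        idx = np.where(y == c)[0]
        extra = rng.choice(idx, size=target - idx.size, replace=True)  # duplicates for minority stages
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return x[keep], y[keep]
```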
9.5.3 Regularization
Further, to avoid overfitting issues, we used two regularization strategies. Dropout
[29,30] is a method that randomly sets input values to zero (i.e., drops units along
with their connections) with a predefined probability during training. As illustrated
in Figure 9.2, dropout layers with a probability of 0.5 were applied across the model.
Dropout was only
needed during training and was removed from the network at testing time so that
consistent output could be produced.
The proposed model was built with TensorFlow, Google's deep learning library
[31]. This library enables us to distribute computations, such as training and
validation activities, over several CPUs. It takes around two days to train the entire
model. Inference is performed at a rate of about 30 epochs per second.
9.6 Results
Table 9.2 shows the confusion-matrix derived from the test dataset. Table 9.3
displays the per-class recall, F1-score, and precision; a graphical representation is
shown in Figure 9.3. The sleep stage N1 is the most misclassified,
with only 25% of valid classifications. With 93% of valid classifications, sleep
stage Wake was the most accurately classified sleep stage.
N2, REM, and N3 are the following sleep stages, with 89%, 87%, and 80%,
respectively. The accuracy of overall multiclass classification is 87%, with a
kappa-coefficient of 0.81.
Sleep stage N1 is almost never confused with N3; it is correctly classified only 25%
of the time and is frequently confused with sleep stage N2 (37%) and sleep stage REM (24%).
Sleep stage REM is sometimes (4%) mistaken with sleep stage-N3 and rarely with
the other sleep stages. Sleep stage-N2, on the other hand, is frequently confused
with sleep stage-N3 (21%) and nearly never with other sleep stages.
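The per-class metrics and the kappa coefficient discussed here can be computed with scikit-learn as sketched below; the labels used are placeholders standing in for the manual and predicted test-set scores.

```python
import numpy as np
from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix

stages = np.array(["Wake", "N1", "N2", "N3", "REM"])
rng = np.random.default_rng(0)
y_true = rng.choice(stages, size=300)                    # placeholder manual scores
y_pred = np.where(rng.random(300) < 0.85, y_true,        # mostly-correct placeholder predictions
                  rng.choice(stages, size=300))

print(confusion_matrix(y_true, y_pred, labels=stages))   # rows: true stage, columns: predicted stage
print(classification_report(y_true, y_pred, digits=2))   # per-class precision, recall, F1-score
print("kappa:", cohen_kappa_score(y_true, y_pred))       # chance-corrected agreement
```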
Figure 9.3 Per-stage performance metrics for the Wake, N1, N2, N3, and REM sleep stages
9.7 Discussion
9.7.1 Major findings
This research shows that a CNN trained on raw samples from a single EEG channel
can categorize the sleep phases with performance metrics comparable to other
approaches. Training is completed from
beginning to end, without the need for any specialist expertise in the selection of
features or preprocessing of the signal. This is beneficial because the model may
learn the features which are most appropriate for task classification. We considered
applying a bandpass FIR filter to preprocess the signals, but it probably does not
help because the convolution layer could be capable of learning appropriate filters.
One more benefit is that this methodology is easy to adapt to different applications
or mediums.
Although training a giant CNN is significantly more challenging, the inference
is relatively inexpensive and may be performed on a portable device or a home
computer once the model has been trained. When it comes to the kind of errors that
the model produces, we have seen that they generally correspond to sleep phases
that are close together. N3 is frequently confused with sleep stage N2 but nearly
never with sleep stage N1. Likewise, while sleep stage N1 is characterized as a
stage with the least inter-human agreement, it might be mistaken as REM, N2, or as
Wake, all of which contain patterns comparable to sleep stage N1 but nearly never
with sleep stage N3.
Lastly, REM is more likely to be confused with sleep stage N2 than sleep stage
Wake. One probable explanation is that eye movement is a significant commonality
between Wake and REM, yet the C4-A1 EEG channel picks up relatively
little frontal eye-movement activity.
9.7.3 Comparison
Table 9.4 summarizes the performance metrics and characteristics found in recent
signal channel EEG sleep scoring studies. It is difficult to compare studies in the
sleep scoring research since they do not all apply the same database, scoring
methods, or number of patients, and they do not all balance the classes in the same
manner. The PhysioNet Sleep-EDFx database [34] retains several hours of wake
epochs before and after the night, so the Wake stage has a substantially larger
number of epochs than the other sleep phases. Some researchers [35]
reduce the number of wake epochs, whereas others include all wake epochs in the
evaluation of their performance metrics, which disproportionately benefits the
conclusion. To compare various studies objectively, we start with reported confu-
sion matrices, and if the Wake stage is the most represented class, we adjust
it so that it becomes the second most represented class. Sleep stages N4 and N3
are combined into a single sleep stage N3, in one study [36] when only a confusion
matrix of 6-class is provided.
Table 9.4 also lists some of the additional study features, including the channel
of the EEG signal, the database used, sleep scoring rules, and the methodology used
in their study. Although the expanded Sleep-edfx has been long accessible, several
recent types of research employ the sleep-edfx database. We got improved results
on the sleep-edfx database, which was unexpected. This is because human raters
are not flawless, and fewer technicians scored Sleep-EDF than expanded Sleep-
edfx, and methodologies evaluating Sleep-EDF can quickly learn the rater’s clas-
sification technique. Our algorithm, on the other hand, is examined on 1,700
records at test time scored by various approaches. This ensures that the system does
not become over-dependent on a small group of professionals' rating styles. The study by
Arnaud [37] provided support for our proposed study.
This approach demonstrated that the method is competitive in terms of performance
and that the network has learned to detect significant observed patterns. A
single-channel method for sleep scoring is desirable as it keeps the system light.
Multichannel CNN models may perform better than single-channel ones, which
also offers new possibilities. Our findings revealed that the proposed model was
capable of learning features for scoring sleep stages from various raw
single-channel EEGs without modifying the model's training algorithm and
Table 9.4 Summary of performance criteria of various methods using single-channel EEG signal
Reference Database Signal used Rules used Model Performance accuracy Kappa coefficient F1 score
Tsinalis Sleep-EDF Fpz-Cz R&K Convolutional neural network 0.75 0.65 0.75
Fraiwan Custom C3-A1 AASM RandomForest classifier 0.83 0.77 0.83
Hassan Sleep-EDF Pz-Oz R&K Empirical mode decomposition 0.83 0.76 0.83
Zhu Sleep-EDF Pz-Oz R&K Support vector machine 0.85 0.79 0.85
Suprtak MASS Fpz-Cz AASM CNN-LTSM 0.86 0.80 0.86
Hassan Sleep-EDF Pz-Oz R&K EMD-bootstrap 0.86 0.82 0.87
Proposed work SHHS-1 C4-A1 AASM Convolutional neural network 0.89 0.81 0.87
References
[12] A.R. Hassan and M.I.H. Bhuiyan, “A decision support system for automatic
sleep staging from EEG signals using tunable Q-factor wavelet transform
and spectral features”, J. Neurosci. Methods, 271, 107–118, 2016.
[13] R. Sharma, R.B. Pachori, and A. Upadhyay, “Automatic sleep stages clas-
sification based on iterative filtering of electroencephalogram signals”,
Neural Comput. Appl., 28, 1–20, 2017.
[14] Y.-L. Hsu, Y.-T. Yang, J.-S. Wang, and C.-Y. Hsu, “Automatic sleep stage
recurrent neural classifier using energy features of EEG signals”,
Neurocomputing, 104, 05–114, 2013.
[15] O. Tsinalis, P.M. Matthews, Y. Guo, and S. Zafeiriou, “Automatic sleep
stage scoring with single-channel EEG using convolutional neural net-
works”, 2016, arXivpree-prints.
[16] A. Supratak, H. Dong, C. Wu, and Y. Guo, “DeepSleepNet: a model for
automatic sleep stage scoring based on raw single-channel EEG”, 2017,
arXiv preprint arXiv:1703.04046.
[17] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with
deep convolutional neural networks”, Adv. Neural Inf. Process. Syst., 1,
1097–1105, 2012.
[18] R. Collobert and J. Weston, “A unified architecture for natural language
processing: deep neural networks with multitask learning”, in: Proceedings
of the 25th International Conference on Machine Learning, ICML, ACM,
New York, NY, USA, 2008.
[19] H. Cecotti and A. Graser, “Convolutional neural networks for p300 detection
with application to brain–computer interfaces”, IEEE Trans. Pattern Anal.
Mach. Intell., 33(3), 433–445, 2011.
[20] M. Hajinoroozi, Z. Mao, and Y. Huang, “Prediction of driver’s drowsy and
alert states from EEG signals with deep learning”, in: IEEE 6th International
Workshop on Computational Advances in Multi-Sensor Adaptive Processing
(CAMSAP), IEEE, pp. 493–496, 2015.
[21] A. Page, C. Shea, and T. Mohsenin, “Wearable seizure detection using
convolutional neural networks with transfer learning”, in: IEEE
International Symposium on Circuits and Systems (ISCAS), IEEE, pp. 1086–
1089, 2016.
[22] Z. Tang, C. Li, and S. Sun, “Single-trial EEG classification of motor imagery
using deep convolutional neural networks”, Optik 130, 11–18, 2017.
[23] S.F. Quan, B.V. Howard, C. Iber, et al., “The sleep heart health study:
design, rationale, and methods”, Sleep, 20 (12) 1077–1085, 1997.
[24] Sleep Data – National Sleep Research Resource – NSRR, https://sleepdata.
org/.
[25] R.B. Berry, R. Brooks, C.E. Gamaldo, S.M. Harding, C. Marcus, and B.
Vaughn, “AASM manual for the scoring of sleep and associated events”, J.
Clin. Sleep Med. 13(5), 665–666, 2012.
[26] D. Kingma and J. Ba, “Adam: a method for stochastic optimization”, 2014,
arXiv:1412.6980.
The use of artificial intelligence (AI) in healthcare has made great strides in the
past decade, and several promising new applications are proving useful in
medicine. The most important and significant of them is image analysis and
classification using deep learning, which has led to several intelligent disease
detection systems that assist doctors. These systems are a boon not only for
doctors, whose workload they reduce effectively and efficiently, but also for
patients, who receive accurate and fast results. It therefore becomes necessary to
understand the underlying concepts along with their limitations and future
prospects; this will help in designing and applying image analysis across a wide
range of medical specialties. We discuss the basic concepts of deep learning
neural networks and focus on applications in the specialties of radiology,
ophthalmology, and dermatology. In addition to a thorough literature survey, we
have built a representational neural network system for each of these specialties
and present the results, with details of the datasets and models used. We also
discuss the benefits for patients along with the limitations of such intelligent
systems. We obtained high performance metrics: an AUC of up to 90% in our
radiological system, and accuracies of 92% in ophthalmology and 98% in
dermatology. With enough data, highly efficient and effective disease detection
systems can be built to serve as aides to healthcare professionals for screening
and monitoring various diseases and disorders. More image datasets should be
made available in the public domain for further research, improved models, and
better performance metrics. Applications should also use parallel processing of
data to reduce the time taken. Healthcare professionals should be fully trained to
adopt intelligent decision-making systems that assist them in patient care.
1 Thiagarajar College of Engineering, Madurai, India
10.1 Introduction
The life expectancy of humans has more than doubled in the past 200 years. If we
consider recent time periods, global data shows an increase of about 6.6 years in
life expectancy from 2000 to 2019. While this is a great achievement, the result
of advances in medicine and healthcare, healthy life expectancy (HALE) is
rising at a lower rate (5.4 years) [1]. It becomes necessary to ably support the
already stretched healthcare community, to effectively serve a growing population,
and to ensure adequate healthcare for all. Intelligent systems are already proving
their efficacy, accuracy, and speed in several aspects of healthcare, from diag-
nostics to complex surgeries.
Artificial intelligence (AI), and especially deep learning (DL) models, are
making great strides in diagnostics by screening images. Figure 10.1 shows that the
usage of AI in healthcare is much higher than that of all other technologies. Several
studies have shown that, after suitable training, AI systems perform at least as well
as qualified professionals, and in some studies AI/DL systems even outperform the
experts [2].
Automation of disease detection started making steady progress in healthcare
with the introduction of machine learning (ML) models. As computing power and
data storage capabilities increased over the years, newer models and various deep
learning models have come into greater significance in healthcare.
Image capture devices have also become much better, with higher-resolution
images aiding disease detection. Databases that can store the large volumes of data
required for large numbers of high-quality images have also supported
the development of deep learning models.
We are now entering what can be described as a golden era of intelligent
systems aiding humans, especially in healthcare. Deep learning for medical image
analysis is a vast domain in itself.
Figure 10.1 Usage of AI for medicine in healthcare compared with other technologies: telemedicine, disease management technologies, electronic health record interoperability, Internet of things, blockchain, cloud computing, and others
In this chapter, the fundamentals of deep learning and the current specialties in
healthcare where DL applications are already performing successfully are presented.
We will also present exciting future trends in addition to the challenges and
limitations in biomedical image analysis using deep learning.
This chapter has three major sections:
1. Demystifying deep learning—a simple introduction to DL
2. Current trends and what we can expect in the future
3. Challenges and limitations in building biomedical DL systems
The structure of this chapter is shown in Figure 10.2.
Figure 10.2 Structure of this chapter: introduction, current trends in medical imaging, challenges, patient benefits, and conclusion
The goal of the first section is to explain the concepts of deep learning with
reference to AI and ML. It also highlights the differences between the three tech-
niques (AI, ML, and DL).
The current trends section is further subdivided into overview, radiology,
ophthalmology, and dermatology. These three specialties are chosen for this sec-
tion, based on their current success in DL applications for image analysis.
In radiology, AI plays a major role in disease diagnosis from X-rays, mam-
mograms, and CT/MRI images. X-rays are easy to obtain and involve minimal
radiation exposure. X-rays are mainly used for imaging bones and the lungs.
Recently, X-rays for assessing the severity of COVID-19 with lung involvement
were widely used globally. Several studies conducted for these applications have
found AI diagnostic and decision support systems to perform at least as well as
doctors and trained specialists.
In ophthalmology, AI image analysis mainly refers to using retinal
fundus images (RFI) and optical coherence tomography (OCT) to detect various
diseases: not just ophthalmological diseases such as diabetic retinopathy and glaucoma,
but even neurological diseases. Recent studies show good results in the
early detection of Alzheimer’s disease just by the AI image analysis of the retinal
fundus. Papilledema and hence any swelling of the brain can also be detected. In
addition to this, the direct visualization of the microvasculature in the retinal fundus
is now proving to be useful in predicting and detecting systemic diseases like
chronic kidney failure and cardiovascular diseases.
In dermatology, AI has achieved great success in analyzing skin photographs
and diagnosing diseases, including skin cancers, dermatitis, psoriasis, and
onychomycosis. Research is still ongoing to enable patients to simply upload an image
and get an instant and reliable diagnosis.
The third section presents the challenges and future work needed before the
widespread use of AI in medical diagnostics. The challenges presented include the
normalization of images from various sources, the need for a large database of
images for training, and the need to ensure patient safety and to address legal and
ethical issues.
10.2 Demystifying DL
As this chapter will deal with AI, ML, and DL, it is necessary to first define these
three terms. AI is the superset which encompasses both ML and DL (Figure 10.3).
Although DL can be considered a subset of ML, it differs from conventional
ML algorithms or techniques in that DL uses a large volume of data to learn
insights from the data by itself. These learned patterns are then used to make predictions on
similar or unseen data.
Earlier AI and ML applications usually described systems that followed a fixed set of rules,
predefined by an expert in that field, to make predictions. These were
not considered a data-driven approach, but simply automation based on a few
predefined sets of instructions.
Deep learning networks are inspired by the human brain, and the architecture of
deep learning systems closely resembles the structure of the brain. The basic
computational unit of a neural network (NN) is called a perceptron, which closely
resembles a human neuron. Just as electrical pulses travel through neurons, the
perceptron uses signals to produce suitable outputs.
Similar to neurons combining to form the human neural network, perceptrons
combine to form an intelligent system. The NNs used for DL comprise an input
layer, an output layer, and several hidden layers, as shown in Figure 10.4.
Figure 10.5 shows a schematic diagram of a deep learning system which can be
customized according to the application. Further study and a detailed explanation
of the concepts of deep learning can be found in [5].
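To make the layered structure described above concrete, the following is a minimal sketch of such a network in PyTorch, with one input layer, two hidden layers of perceptron-like units, and an output layer; the layer sizes, activation choice, and use of PyTorch are illustrative assumptions rather than details from the text.

import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    """A small feed-forward network: input layer -> hidden layers -> output layer."""
    def __init__(self, n_inputs=64, n_hidden=32, n_classes=3):   # sizes are illustrative
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),   # input layer to first hidden layer
            nn.ReLU(),                       # non-linear "firing" of the perceptron units
            nn.Linear(n_hidden, n_hidden),   # second hidden layer
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),  # output layer, one score per class
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleMLP()
scores = model(torch.randn(8, 64))   # a batch of 8 feature vectors
print(scores.shape)                  # torch.Size([8, 3])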
Figure 10.3 Relationship between AI, ML, and DL: artificial intelligence (AI) incorporates human behavior and intelligence into machines or systems; machine learning (ML) comprises methods that learn from data or past experience, automating analytical model building; deep learning (DL) performs computation through multi-layer neural networks and processing
Figure 10.5 Schematic of a deep learning workflow. Step 1: data understanding and preprocessing (real-world data, data annotation, preprocessing and augmentation, visualization of simple tasks). Step 2: DL model building and training (learning type: discriminative, generative, or hybrid; tasks: prediction, detection, classification, etc.; DL methods: MLP, CNN, RNN, GAN, AE, DBN, DTL, AE+CNN, etc.). Step 3: validation and interpretation (performance analysis, visualization and testing, model interpretation and conclusion drawing)
10.3.2 Radiology
Radiology is the foremost of all specialties in healthcare to primarily use image
analysis. From simple X-rays to mammograms, CT/MRI, and PET scans, the
images in diagnostic radiology are used to non-invasively visualize the inner organs
and bones in our body. Radiological visualization is used by almost all other spe-
cialties in healthcare. These image-based diagnoses play a pivotal role not only in
disease detection but also guide subsequent treatment plans.
Radiology was one of the first few specialties in medicine to use digitized
images and adapt AI/ML methods, and more recently computer vision (CV) tech-
niques using advanced neural networks. A recent study shows how AI applications
are distributed across tasks in diagnostic radiology: perception (70%) and reasoning
(17%) are the primary functionalities of AI tools (Figure 10.6).
Figure 10.6 Distribution of AI applications across tasks in diagnostic radiology: perception 70%, reasoning 17%, reporting 1%, with the remainder shared among processing, acquisition, and administration (7%, 3%, and 2%)
Most of the existing AI applications in radiology are for CT, MRI, and X-ray
modalities (29%, 28%, and 17%, respectively) [6].
Most of the current applications are for any one of these modalities and focus
on any one anatomical part. Very few applications work for multiple modalities and
multiple anatomical regions. The AI applications to analyze images of the brain
have the highest share of about 27%, followed by the chest and lungs at 12% each.
Mammograms to detect cancer have also achieved good results in screening
programs.
Several monotonous and repetitive tasks performed by radiologists, such as
segmentation, are successfully carried out by intelligent systems in a much shorter
time. This has saved many man-hours for doctors and enabled quicker results
for patients. In some applications, smaller lesions and finer features are
detected better by the DL system than by human diagnosticians, leading to high
accuracy in disease detection.
Table 10.1 Literature survey—AI in radiology

[7] Dataset: own data, 5,232 images. Diseases detected: bacterial pneumonia, viral pneumonia, normal. Algorithms used: InceptionNet V3. Significant results: best model InceptionNet V3; pneumonia/normal: accuracy 92.8%, sensitivity 93.2%, specificity 90.1%, AUC 96.8%; bacterial/viral: accuracy 90.7%, sensitivity 88.6%, specificity 90.9%, AUC 94.0%. Limitations/future work: images from different devices (different manufacturers) for training and testing, to make the system universally useful.

[7,8] Dataset: 5,232 images. Disease detected: pneumonia. Algorithms used: Xception, VGG16. Significant results: best model VGG16; accuracy 87%, sensitivity 82%, specificity 91%. Limitations/future work: N/A.

[7,9] Dataset: 5,856 images. Disease detected: pneumonia. Algorithms used: VGG16, VGG19, DenseNet201, Inception_ResNet_V2, Inception_V3, ResNet50, MobileNet_V2, Xception. Significant results: best model ResNet50; accuracy 96.61%, sensitivity 94.92%, specificity 98.43%, precision 98.49%, F1 score 96.67%. Limitations/future work: more datasets and advanced feature extraction techniques may be used, such as You-Only-Look-Once (YOLO) and U-Net.

[10,11] Dataset: 273 images. Disease detected: COVID-19. Algorithms used: Inception V3 combined with MLP. Significant results: best model InceptionNet V3 combined with MLP; sensitivity 93.61%, specificity 94.56%, precision 94.85%, accuracy 94.08%, F1 score 93.2%, kappa value 93.5%. Limitations/future work: in the FM-HCF-DLF model, other classifiers can be tried (instead of MLP).

[12,13] Dataset: LIDC-IDRI database, 3,500 images. Diseases detected: pneumonia, lung cancer. Algorithms used: AlexNet, VGG16, VGG19, ResNet50, SoftMax, MAN-SVM. Significant results: best model MAN-SVM; accuracy 97.27%, sensitivity 98.09%, specificity 95.63%, precision 97.80%, F1 score 97.95%. Limitations/future work: EFT implementation for local binary pattern (LBP) based feature extraction.

[13,14] Dataset: 112,120 images. Diseases detected: atelectasis, cardiomegaly, consolidation, edema, effusion, emphysema, fibrosis, hernia, infiltration, mass, nodule, pleural thickening, pneumonia, pneumothorax. Algorithms used: CheXNeXt (121-layer DenseNet). Significant results: mass detection sensitivity 75.4%, specificity 91.1%; nodule detection sensitivity 69.0%, specificity 90.0%; mean accuracy 82.8%. Limitations/future work: both CheXNeXt and the radiologists did not consider patient history or review previous visits; if considered, this is known to improve the diagnostic performance of radiologists.

[14] Dataset: own data (ChestX-ray8), 108,948 images. Diseases detected: atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax. Algorithms used: AlexNet, GoogLeNet, VGGNet-16, ResNet50. Significant results: best model ResNet50; accuracy for atelectasis 70.69%, cardiomegaly 81.41%, effusion 73.62%, infiltration 61.28%, mass 56.09%, nodule 71.64%, pneumonia 63.33%, pneumothorax 78.91%. Limitations/future work: the dataset can be extended to cover more disease classes and to integrate other clinical information.

[11,15] Dataset: Kaggle, 1,215 images. Diseases detected: COVID-19, bacterial pneumonia, viral pneumonia. Algorithms used: ResNet50, ResNet101. Significant results: best model ResNet101; accuracy 98.93%, sensitivity 98.93%, specificity 98.66%, precision 96.39%, F1 score 98.15%. Limitations/future work: the system could be extended to detect other viruses (MERS, SARS, AIDS, and H1N1).

[16] Dataset: Radiology Assistant, Kaggle, 380 images. Disease detected: COVID-19. Algorithms used: SVM classifier (with linear, quadratic, cubic, and Gaussian kernels). Significant results: best model SVM (linear kernel); accuracy 94.74%, sensitivity 91.0%, specificity 98.89%, F1 score 94.79%, AUC 0.999. Limitations/future work: other lung diseases can be considered.

[17,18] Dataset: 5,606 images. Diseases detected: atelectasis, pneumonia, hernia, edema, emphysema, cardiomegaly, fibrosis, pneumothorax, consolidation, pleural thickening, mass, effusion, infiltration, nodule. Algorithms used: VDSNet, vanilla gray, vanilla RGB, hybrid CNN and VGG, modified capsule network. Significant results: best model VDSNet; recall 0.63, precision 0.69, Fβ (0.5) score 0.68, validation accuracy 73%. Limitations/future work: image augmentation for increasing the accuracy.
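Many of the studies summarized in Table 10.1 fine-tune ImageNet-pretrained backbones such as ResNet50 or Inception V3 on chest X-rays. The following is a hedged sketch of that general transfer-learning pattern using torchvision; the two-class setup, frozen backbone, and hyper-parameters are illustrative assumptions, not a reproduction of any cited study.

import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet50 (torchvision >= 0.13 weights API) and replace
# its classifier head with a new layer for the target classes (e.g., normal vs. pneumonia).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                      # freeze the pretrained feature extractor
model.fc = nn.Linear(model.fc.in_features, 2)        # new trainable classification head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

def train_step(images, labels):
    """One optimization step on a mini-batch of chest X-ray tensors (assumed preprocessed)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()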
10.3.3 Ophthalmology
Ophthalmology was a natural forerunner in adopting AI screening tools for
image analysis, mainly because it relies on several images for disease detection
and monitoring. Retinal imaging, which includes retinal fundus imaging (RFI)
and optical coherence tomography (OCT) is used for diagnosing several dis-
eases of the eye, brain, and even systemic diseases like diabetes and chronic
kidney disease.
Diabetic retinopathy (DR) is caused by damage to the retina which in turn is
caused by diabetes mellitus. It can be diagnosed and assessed using retinal fundus
images. Early diagnosis and intervention can save vision. Similarly, age-related
macular degeneration (AMD) is also avoidable if diagnosed early. Again the
diagnosis is based on retinal images.
(Figure: training and validation accuracy of a representational model over 14 training epochs.)
(Figure: ROC curves for a representational chest X-ray model, with per-class AUC values: atelectasis 0.78, cardiomegaly 0.90, consolidation 0.79, edema 0.88, effusion 0.87, emphysema 0.88, fibrosis 0.79, hernia 0.82, infiltration 0.71, mass 0.82, nodule 0.73, pleural thickening 0.77, pneumonia 0.74, pneumothorax 0.86.)
10.3.4 Dermatology
Dermatology has hugely successful applications of artificial intelligence for a wide
range of diagnoses, from common skin conditions to screening for skin cancer.
Almost all of these applications are based on image recognition models and are also
used to assess and manage skin/hair/nail conditions. Google is now introducing an
AI tool which can analyze images captured with a smartphone camera. Patients
themselves or non-specialist doctors can use this tool to identify and diagnose skin
conditions. This is very useful for telehealth applications too. AI systems can prove
invaluable in the early detection of skin cancer, thereby saving lives [31].
Detection, grading, and monitoring are the main uses of AI systems, mainly in
melanoma, psoriasis, dermatitis, and onychomycosis. They are now also used for
acne grading and for monitoring ulcers by automatic border detection and area
calculation.
Table 10.2 Literature survey—AI in ophthalmology

[20] Dataset: Cirrus HD-OCT and Cirrus SD-OCT images, 1,208 images. Diseases detected: glaucoma, myopia. Algorithms used: gNet3D. Significant results: best model gNet3D; AUC 0.88. Limitations/future work: SD-OCT scans with low SS are included; risk factors like pseudo-exfoliation, pigment dispersion, and secondary mechanisms are not considered.

[21] Dataset: 3D OCT-2000 (Topcon) images, 357 images. Disease detected: glaucoma. Algorithms used: CNN, random forest. Significant results: best model RF; AUC 0.963. Limitations/future work: N/A.

[22] Dataset: 3D OCT-1000 and 3D OCT-2000 images, 71 images. Disease detected: age-related macular degeneration. Algorithms used: AMDnet, CNN, VGG16, SVM. Significant results: best model AMDnet; AUC 0.89. Limitations/future work: the model's generalization to patients with early or intermediate AMD is not known.

[23] Dataset: SD-OCT images, 1,621 images. Disease detected: age-related macular degeneration. Algorithms used: CNN, transfer learning. Significant results: AMD detection, best model CNN: sensitivity 100%, specificity 91.8%, accuracy 99%; exudative changes detection, best model transfer learning: sensitivity 98.4%, specificity 88.3%, accuracy 93.9%. Limitations/future work: patients who had other associated diseases were excluded; unclear if the results can be used in general.

[24] Dataset: SS-OCT (DRI OCT Triton) images, 260 images. Disease detected: multiple sclerosis. Algorithms used: SVM (linear, polynomial, radial basis, sigmoid), decision tree, random forest. Significant results: best model decision tree; wide protocol: accuracy 95.73%, AUC 0.998; macular protocol: accuracy 97.24%, AUC 0.995. Limitations/future work: the MS cohort should be modified to consider patients with at least one year of disease duration, as opposed to the average duration of 7.12 years.

[25] Dataset: SD-OCT images, 6,921 images. Disease detected: glaucomatous optic neuropathy. Algorithms used: ResNet 3D deep-learning system, ResNet 2D deep-learning system. Significant results: best model ResNet 3D; AUC 0.969, sensitivity 89%, specificity 96%, accuracy 91%. Limitations/future work: performance in external validations was reduced compared to the primary validation; only gradable images and cases of glaucomatous optic neuropathy with corresponding visual field defects were included.

[26] Dataset: Cirrus SD-OCT images, 20,000 images. Disease detected: age-related macular degeneration. Algorithms used: ReLayNet (for segmentation), Inception-v3, InceptionResNet50. Significant results: best model InceptionResNet50; accuracy 86–89%. Limitations/future work: no explicit definitions of features were given, so the algorithm may use features previously not recognized or ignored by humans; the images were from a single clinical site.

[27] Dataset: Zeiss PlexElite 9000 images, 463 volumes. Disease detected: diabetic retinopathy. Algorithms used: VGG19, ResNet50, DenseNet. Significant results: best model VGG19; sensitivity 93.32%, specificity 87.74%, accuracy 90.71%. Limitations/future work: only images with a signal strength of 7 or above were considered, which may sometimes be infeasible in patients with pathology.

[28] Dataset: Zeiss Cirrus HD-OCT 4000 and Optovue RTVue-XR Avanti images, 35,900 images. Disease detected: age-related macular degeneration (dry, inactive wet, active wet). Algorithms used: VGG16, InceptionV3, ResNet50. Significant results: best model InceptionV3; accuracy 92.67%; sensitivity 85.64% (dry), 97.11% (inactive wet), 88.53% (active wet); specificity 99.57% (dry), 91.82% (inactive wet), 99.05% (active wet). Limitations/future work: N/A.

[29] Dataset: Cirrus OCT (Zeiss) images, 8,529 volumes. Disease detected: age-related macular degeneration. Algorithms used: logistic regression. Significant results: best model logistic regression; 0.5–1.5 mm area AUC 0.66, 0–0.5 mm area AUC 0.65. Limitations/future work: OCT angiography to detect subclinical MNV was not included, which could be significant in assessing progression risk with drusen.
(Figure: training and validation accuracy of a representational model over 12 training epochs.)
Table 10.3 Literature survey—AI in dermatology

[32] Dataset: DermNet, 2,475 images. Disease detected: melanoma. Algorithms used: DT, RF, GBT, CNN. Significant results: best model CNN; accuracy 88.83%, precision 91.07%, recall 87.68%, F1 score 89.32%. Limitations/future work: tested only on one dataset, of limited size.

[33] Dataset: HAM10000, 10,015 images. Diseases detected: actinic keratoses, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, vascular lesions. Algorithms used: CNN, RF, DT, LR, LDA, SVM, KNN, NB, Inception V3. Significant results: best model CNN; accuracy 94%, precision 88%, recall 85%, F1 score 86%. Limitations/future work: the model can be improved by hyper-parameter fine-tuning.

[34] Dataset: ISIC, size N/A. Disease detected: skin cancer. Algorithms used: CNN, GAN, KNN, SVM. Significant results: best model CNN; accuracy 92%, precision 92%, recall 92%, F1 score 92%. Limitations/future work: N/A.

[4,35] Dataset: 120 images. Disease detected: melanoma. Algorithms used: KNN. Significant results: best model KNN; accuracy 98%. Limitations/future work: ensemble learning methods or evolutionary algorithms can be considered for faster and more accurate results.

[36] Dataset: available on request, 120 images. Diseases detected: herpes, dermatitis, psoriasis. Algorithms used: SVM, GLCM. Significant results: best model SVM; accuracy 85% (herpes), 90% (dermatitis), 95% (psoriasis). Limitations/future work: very limited dataset, with only 20 images for each class.

[37] Dataset: own data, 80 images. Diseases detected: melanoma, eczema, psoriasis. Algorithms used: SVM, AlexNet. Significant results: best model SVM; accuracy 100% (melanoma), 100% (eczema), 100% (psoriasis). Limitations/future work: very limited dataset, with only 20 images for each class; overfitting is likely the reason for such high accuracies.

[38] Dataset: ISIC, 640 images. Disease detected: melanoma. Algorithms used: KNN, SVM, CNN, majority voting. Significant results: best model CNN; accuracy 85.5%. Limitations/future work: semi-supervised learning could be used to overcome the lack of enough labeled training data.

[39] Dataset: ISBI 2016 Challenge dataset for Skin Lesion Analysis, 1,279 images. Disease detected: melanoma. Algorithms used: VGG16 ConvNet (i) trained from scratch, (ii) pre-trained on a larger dataset, (iii) fine-tuning the ConvNets. Significant results: best model VGG16 ConvNet with fine-tuning; accuracy 81.33%, sensitivity 0.7866, precision 0.7974, loss 0.4337 (on test data). Limitations/future work: a larger dataset can be used to avoid overfitting; additional regularization and fine-tuning of hyper-parameters can be done.

[40] Dataset: HAM10000, 10,015 images. Disease detected: skin cancer. Algorithms used: AlexNet, ResNet, VGG-16, DenseNet, MobileNet, DCNN. Significant results: best model DCNN; accuracy 93.16% (train), 91.43% (test), precision 96.57%, recall 93.66%, F1 score 95.09%. Limitations/future work: a user-friendly CAD system can be built.

[41] Dataset: subset of DermNet, size N/A. Disease detected: N/A. Algorithms used: Inception V2, Inception V3, MobileNet, ResNet, Xception. Significant results: best model Inception V3; precision 78%, recall 79%, F1 score 78%. Limitations/future work: N/A.
limitations have become obsolete. But several other limitations do exist for bio-
medical imaging and analysis systems. We will see a few in this section.
The first challenge in medical image processing with artificial intelligence is the
availability of data. While certain fields and subdomains, like ophthalmology and
diabetic retinopathy, have large volumes of data available in the public domain, rarer
diseases and other fields have very limited datasets. So, most of the literature is based
(Figure: training and validation accuracy of a representational model over 25 training epochs.)
on a few sets of images. More availability of diverse data would ensure more versatile
and robust models which can work with different inputs [43]. Freely available datasets
in the public domain are needed for further progress in this field.
Ethical and legal requirements for collecting and using data have to be strictly followed
in line with international standards. All images must be de-identified and obtained with
consent, and patient privacy has to be preserved properly.
Another concern in using artificial intelligence for disease detection is the lack
of explainability of the models, i.e., we do not know on what features the models
base their decisions. The upcoming explainable artificial intelligence (XAI) or
explainable machine learning (XML) may solve this problem to a certain extent, as
it helps us to understand how the intelligent systems process the data and on what
they base their decisions [44]. This eliminates the black-box approach which is
currently prevalent, where we input the data and receive the decision as the output.
Decision systems also have to take into account other data about the patient in
addition to the image being analyzed: intelligent decision systems must consider
age, previous medical history, and other co-morbidities alongside the images in the
decision-making process.
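As a simple illustration of the kind of insight XAI methods aim to provide, the sketch below computes a plain input-gradient saliency map for an image classifier; it is a generic example under assumed model and image tensors, not a specific method from the cited literature.

import torch

def saliency_map(model, image, target_class):
    """Return |d score / d pixel| for one image, as a rough visual explanation.

    `model` is any classifier returning class scores and `image` is a
    (1, C, H, W) tensor; both are assumptions for illustration only.
    """
    model.eval()
    image = image.clone().requires_grad_(True)
    scores = model(image)
    scores[0, target_class].backward()           # gradient of the chosen class score
    saliency, _ = image.grad.abs().max(dim=1)    # strongest gradient across colour channels
    return saliency.squeeze(0)                   # (H, W) map to overlay on the input image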
Very often the collected data has an inbuilt bias. This can also affect the
training of the model and hence performance. This can be avoided by carefully
planning and monitoring the data collection process.
Each country or region has regulatory bodies for approving medical devices.
Intelligent systems used in disease detection and decision making also
have to undergo stringent checks and tests, and approval has to be sought from the
appropriate regulatory bodies before they are used for patient benefit.
Training medical experts to use AI systems efficiently will lead to them adopting
intelligent systems quickly and easily in their regular practice. A recent survey shows
that less than 50% of medical experts in radiology, ophthalmology, and dermatology have
at least average knowledge of AI applications in their specialty [45] (Figure 10.19).
Figure 10.19 Self-assessed knowledge of AI applications among medical experts in ophthalmology, radiology, and dermatology, rated from very poor to excellent
10.6 Conclusions
References
[1] https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates/ghe-life-expectancy-and-healthy-life-expectancy, retrieved on 18.12.2022.
[2] Pham, TC., Luong, CM., Hoang, VD. et al. AI outperformed every derma-
tologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN
architecture with custom mini-batch logic and loss function. Sci Rep, 11,
17485 (2021).
[3] Kumar, Y., Koul, A., Singla, R., and Ijaz, M. F. (2022). Artificial intelligence in
disease diagnosis: a systematic literature review, synthesizing framework and
future research agenda. J Ambient Intell Humanized Comput, 14, 1–28.
[4] Savalia, S. and Emamian, V. (2018). Cardiac arrhythmia classification by
multi-layer perceptron and convolution neural networks. Bioengineering, 5
(2), 35. https://doi.org/10.3390/bioengineering5020035
[21] An, G., Omodaka, K., Hashimoto, K., et al. (2019). Glaucoma diagnosis
with machine learning based on optical coherence tomography and color
fundus images. J Healthcare Eng, 2019.
[22] Russakoff, D. B., Lamin, A., Oakley, J. D., Dubis, A. M., and Sivaprasad, S.
(2019). Deep learning for prediction of AMD progression: a pilot study.
Invest Ophthalmol Visual Sci, 60(2), 712–722.
[23] Motozawa, N., An, G., Takagi, S., et al. (2019). Optical coherence
tomography-based deep-learning models for classifying normal and age-
related macular degeneration and exudative and non-exudative age-related
macular degeneration changes. Ophthalmol Therapy, 8(4), 527–539.
[24] Perez del Palomar, A., Cegonino, J., Montolio, A., et al. (2019). Swept
source optical coherence tomography to early detect multiple sclerosis dis-
ease. The use of machine learning techniques. PLoS One, 14(5), e0216410.
[25] Ran, A. R., Cheung, C. Y., Wang, X., et al. (2019). Detection of glaucoma-
tous optic neuropathy with spectral-domain optical coherence tomography: a
retrospective training and validation deep-learning analysis. Lancet Digital
Health, 1(4), e172–e182.
[26] Saha, S., Nassisi, M., Wang, M., Lindenberg, S., Sadda, S., and Hu, Z. J.
(2019). Automated detection and classification of early AMD biomarkers
using deep learning. Sci Rep, 9(1), 1–9.
[27] Heisler, M., Karst, S., Lo, J., et al. (2020). Ensemble deep learning for dia-
betic retinopathy detection using optical coherence tomography angio-
graphy. Transl Vis Sci Technol, 9(2), 20–20.
[28] Hwang, D. K., Hsu, C. C., Chang, K. J., et al. (2019). Artificial intelligence-
based decision-making for age-related macular degeneration. Theranostics,
9(1), 232.
[29] Waldstein, S. M., Vogl, W. D., Bogunovic, H., Sadeghipour, A., Riedl, S.,
and Schmidt-Erfurth, U. (2020). Characterization of drusen and hyperre-
flective foci as biomarkers for disease progression in age-related macular
degeneration using artificial intelligence in optical coherence tomography.
JAMA Ophthalmol, 138(7), 740–747.
[30] Kermany, D., Zhang, K., and Goldbaum, M. (2018), Labeled Optical
Coherence Tomography (OCT) and Chest X-Ray Images for Classification,
Mendeley Data, V2, doi: 10.17632/rscbjbr9sj.2
[31] Liopyris, K., Gregoriou, S., Dias, J. et al. (2022). Artificial intelligence in
dermatology: challenges and perspectives. Dermatol Ther (Heidelb) 12,
2637–2651. https://doi.org/10.1007/s13555-022-00833-8
[32] Allugunti, V. R. (2022). A machine learning model for skin disease classi-
fication using convolution neural network. Int J Comput Program Database
Manag, 3(1), 141–147.
[33] Shetty, B., Fernandes, R., Rodrigues, A. P., Chengoden, R., Bhattacharya, S.,
and Lakshmanna, K. (2022). Skin lesion classification of dermoscopic ima-
ges using machine learning and convolutional neural network. Sci Rep, 12
(1), 1–11.
All over the world, people suffer from a variety of diseases. To detect
these diseases, several medical imaging procedures are used in which images of
different parts of the body are captured through advanced sensors and well-designed
machines. These medical imaging procedures raise patients' expectations of
better healthcare services from medical experts. To date, various image
processing algorithms such as neural networks (NN), convolutional neural networks
(CNN), and deep learning have been used for image analysis, image representation, and
image segmentation. Yet these approaches do not give promising results in some
applications of the healthcare sector. So, this chapter gives an overview of state-of-
the-art image processing algorithms and highlights their limitations. Most deep
learning implementations focus on images from digital histopathology,
computerized tomography, mammography, and X-rays. This work offers a thorough
analysis of the literature on the classification, detection, and segmentation of medical
image data. This review aids researchers in considering the necessary adjustments to
deep learning-based medical image analysis. Further, the applications of
medical image processing using artificial intelligence (AI), machine learning (ML),
and deep learning in the healthcare sector are discussed in this chapter.
11.1 Introduction
Medical image processing plays an important role in identifying a variety of dis-
eases. Earlier, the datasets available for analyzing medical images
were very small. Nowadays, large datasets are available for interpreting medical
images. To analyze these large image datasets, highly experienced medical
experts or radiologists are required, and the number of patients far outnumbers the
number of available medical experts. Further, the analysis done by medical experts
is prone to human error. In order to avoid this problem, various machine learning
algorithms are used to automate the process
of medical image analysis [1]. Various image feature extraction and feature
1 School of Electronics and Electrical Engineering, Lovely Professional University, India
selection methods are used for analyzing medical images, where the system is
trained on the data. Nowadays, neural networks (NN), convolutional neural
networks (CNN), and deep learning methods have a remarkable impact in this field.
These methods not only improve the analysis of medical images
but also use artificial intelligence to automate the detection of various diseases [2].
With the advent of machine learning algorithms, medical images can be analyzed
more accurately than with the existing algorithms.
Zhu et al. [3] used a memristive pulse coupled neural network (M-PCNN) for
analyzing medical images. Their results showed that the network can
be further used for denoising medical images as well as for extracting image
features. Tassadaq Hussain [4] proposed an architecture for analyzing medical
images or videos. Rajalakshmi et al. [5] proposed a model for the retina which is used
for detecting the light signal through the optic nerve. Li et al. [6] exploited deep neural
networks and hybrid deep learning models for predicting the age of humans by using
3D MRI brain images. Maier et al. [7] give an overview of analyzing medical images
using deep learning algorithms. Selvikvag et al. [8] used machine learning algorithms
such as artificial neural networks and deep neural networks on MRI images. Fourcade
et al. [9] analyzed medical images using deep learning algorithms for improving visual
diagnosis in the health sector. The authors also argued that these novel techniques are
not going to replace the expertise of medical experts, but may automate the
process of diagnosing various diseases. Litjens et al. [10] used machine learning
algorithms for analyzing cardiovascular images. Zhang et al. [11] proposed a synergic
deep learning model using deep convolutional neural networks (DCNN) for classifying
the medical images on four datasets. Further, Kelvin et al. [12] discussed several
challenges that are associated with a diagnosis of cardiovascular diseases by using deep
learning algorithms. Various authors [13–17] used deep learning algorithms for image
segmentation, image classification, and pattern recognition, as well as detecting several
diseases by finding meaningful interpretations of medical images.
Thus, it is concluded that machine learning algorithms, deep learning algorithms,
and artificial intelligence play a significant role in medical image processing
and analysis. Machine learning algorithms not only extract hidden
information from medical images but also help doctors predict accurate
information about diseases. The genetic variations in subjects are also analyzed
with the help of machine learning algorithms. It is also observed that machine
learning algorithms process medical images in raw form, and it takes more time to
tune the features, although they show significantly better accuracy in detecting
diseases than conventional algorithms. Deep learning algorithms
show promising results and superior performance in the automated detection of
diseases in comparison to machine learning algorithms.
Figure 11.1 Categories of machine learning algorithms: supervised learning (support vector machine, decision trees, neural networks, k-nearest neighbors), unsupervised learning (agglomerative clustering, K-means clustering, and density-based spatial clustering of applications with noise (DBSCAN)), and reinforcement learning (Deep Q Network (DQN) and deep deterministic policy gradient (DDPG))
Machine learning algorithms are divided into three parts, namely supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 11.1.
network. Any complex data can be analyzed by adding more layers to the deep neural
network [22]. It shows superior performance in various medical image analysis
applications, such as identifying cancer in the blood [23–25]. It has the following
limitations: the position and orientation of an object are not encoded by the CNN,
and it is not capable of spatial invariance with respect to the input data.
(Figure: steps involved in building a machine learning model: collecting data, choosing a model, parameter tuning, and making predictions.)
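A compact sketch of the generic workflow outlined above (collecting data, choosing a model, parameter tuning, and making predictions) is given below using scikit-learn; the synthetic features standing in for extracted image features, the random forest model, and the search grid are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# 1. Collecting data: synthetic features stand in for features extracted from medical images.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Choosing a model and 3. parameter tuning via cross-validated grid search.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=3,
)
search.fit(X_train, y_train)

# 4. Making predictions with the tuned model on unseen data.
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))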
11.5.1 Histopathology
Histopathology is the study of human tissues under a microscope using glass
slides to determine various diseases, including kidney cancer, lung cancer, breast
cancer, and others. In histopathology, staining is utilized to visualize a particular
area of the tissue [26].
Deep learning is rapidly emerging and improving the analysis of histopathology images. The
challenges in analyzing multi-gigabyte whole slide imaging (WSI) images for
developing deep learning models were discussed by Dimitriou et al. [27]. In their
discussion of many public “Grand Challenges,” Serag et al. [28] highlight deep
learning algorithm innovations in computational pathology.
lesions. CT scans also identify pulmonary nodules [29]. To make an early diagnosis
of lung cancer, malignant pulmonary nodules must be identified [30,31].
Li et al. [32] proposed deep CNN for identifying semisolid, solid, and ground-
glass opacity nodules. Balagourouchetty et al. [33] suggested a GoogLeNet-based
ensemble FCNet classifier for classifying liver lesions.
Three modifications are made to the fundamental GoogLeNet architecture for
feature extraction. To detect and classify the lung nodules, Masood et al. [34]
presented a multidimensional region-based fully convolutional network (mRFCN),
which exhibits 97.91% classification accuracy. Using supervised MSS U-Net and
3DU-Net, Zhao and Zeng (2019) [35] suggested a deep-learning approach to
autonomously segment kidneys and kidney cancers from CT images. Further, Fan
et al. [36] and Li et al. [37] used deep learning-based methods for COVID-19
detection from CT images.
11.5.3 Mammography
Mammography (MG) is the most popular and reliable method for finding breast
cancer. MG is used to see the structure of the breasts in order to find breast illnesses
[38]. A small fraction of the actual breast image is made up of cancers, making it
challenging to identify breast cancer on mammography screenings. There are three
processes in the analysis of breast lesions from MG: detection, segmentation, and
classification [39]. Active research areas in MG include the early detection and auto-
matic classification of masses. The diagnosis and classification of breast cancer have
been significantly improved during the past ten years using deep learning algorithms.
Fonseca et al. [40] proposed a breast composition categorization model by
using the CNN method. Wang et al. [41] introduced a novel CNN model to identify
Breast Arterial Calcifications (BACs) in mammography images. Without involving
humans, a CAD system was developed by Ribli et al. [42] for identifying lesions.
Wu et al. [43] also developed a deep CNN model for the classification of breast
cancer. A deep CNN-based AI system was created by Conant et al. [44] for
detecting calcified lesions.
11.5.4 X-rays
The diagnosis of lung and heart conditions such as hypercardiac inflation, atelec-
tasis, pleural effusion, and pneumothorax, as well as tuberculosis frequently
involves the use of chest radiography. Compared to other imaging techniques, X-
ray pictures are more accessible, less expensive, and dose-effective, making them
an effective tool for mass screening [45].
The first deep CNN-based TB screening system was developed by
Hwang et al. [46] in 2016. Rajaraman et al. [47] proposed modality-specific
ensemble learning for the detection of abnormalities in chest X-rays (CXRs),
with the abnormal regions in the CXR images visualized using class-selective
relevance mapping (CRM). For the purpose of detecting COVID-19 in CXR images,
Loey et al. [48] suggested a GAN with deep transfer learning. More CXR
images were created using the GAN network as the available COVID-19 dataset was not
sufficiently large.
11.6 Conclusion
Deep learning and machine learning algorithms have shown promising results in
analyzing medical images compared to conventional algorithms.
This chapter discusses several supervised, unsupervised, and reinforcement learning
algorithms and gives a broad overview of deep learning-based medical
image analysis. In 10–20 years, it is anticipated that most daily tasks will be
automated with the use of deep learning algorithms. The replacement of humans in
the upcoming years will be the next step, especially in diagnosing medical images.
For radiologists of the future, deep learning algorithms can support clinical choices.
Deep learning algorithms enable untrained radiologists to make decisions more
easily by automating their workflow. By automatically recognizing and categorizing
lesions, deep learning algorithms are designed to help doctors diagnose patients
more accurately. By processing medical image analysis more quickly and efficiently,
deep learning algorithms can assist doctors in reducing medical errors and
improving patient care. As healthcare data is quite complex and nonstationary,
it is important to select the appropriate deep learning algorithm to deal with the
challenges of medical image processing. Thus, it is concluded that there are
numerous opportunities to exploit various machine learning and deep learning
algorithms to enhance the use of medical images in the healthcare industry.
Conflict of interest
None.
References
[1] Zhou Z., Rahman Siddiquee M.M., Tajbakhsh N., and Liang J. ‘UNet++: a
nested U-Net architecture for medical image segmentation’. In Proceedings
of the Deep Learning in Medical Image Analysis and Multimodal Learning
for Clinical Decision Support—DLMIA 2018, Granada, Spain, 2018.
Springer International Publishing: New York, NY, 2018; 11045, pp. 3–11.
[2] Litjens G., Kooi T, Bejnordi B. E., et al. ‘A survey on deep learning in
medical image analysis’. Medical Image Analysis. 2017; 42:60–88.
[3] Song Z., Lidan W., and Shukai D. ‘Memristive pulse coupled neural network with
applications in medical image processing’. Neurocomputing. 2017; 27:149–157.
[4] Hussain T. ‘ViPS: a novel visual processing system architecture for medical
imaging’. Biomedical Signal Processing and Control. 2017; 38:293–301.
[5] Rajalakshmi T. and Prince S. ‘Retinal model-based visual perception:
applied for medical image processing’. Biologically Inspired Cognitive
Architectures. 2016; 18:95–104.
Impact of machine learning and deep learning in medical image analysis 197
[6] Li Y., Zhang H., Bermudez C., Chen Y., Landman B.A., and Vorobeychik
Y. ‘Anatomical context protects deep learning from adversarial perturba-
tions in medical imaging’. Neurocomputing. 2020; 379:370–378.
[7] Maier A., Syben C., Lasser T., and Riess C. ‘A gentle introduction to deep
learning in medical image processing’. Zeitschrift für Medizinische Physik.
2019; 29(2):86–101.
[8] Lundervold A.S. and Lundervold A. ‘An overview of deep learning in
medical imaging focusing on MRI’. Zeitschrift fur Medizinische Physik.
2019; 29(2):102–127.
[9] Fourcade A. and Khonsari R.H. ‘Deep learning in medical image analysis: a
third eye for doctors’. Journal of Stomatology, Oral and Maxillofacial
Surgery. 2019; 120(4):279–288.
[10] Litjens G., Ciompi F., Wolterink J.M., et al. State-of-the-art deep learning in
cardiovascular image analysis. JACC: Cardiovascular Imaging. 2019; 12
(8):1549–1565.
[11] Zhang J., Xie Y., Wu Q., and Xia Y. ‘Medical image classification using
synergic deep learning’. Medical Image Analysis. 2019; 54:10–19.
[12] Wong K.K.L., Fortino G., and Abbott D. ‘Deep learning-based cardiovas-
cular image diagnosis: a promising challenge’. Future Generation Computer
Systems. 2020; 110:802–811.
[13] Sudheer KE. and Shoba Bindu C. ‘Medical image analysis using deep
learning: a systematic literature review. In Emerging Technologies in
Computer Engineering: Microservices in Big Data Analytics. ICETCE 2019.
Communications in Computer and Information Science Springer: Singapore,
2019, p. 985.
[14] Ker J., Wang L., Rao J., and Lim T. ‘Deep learning applications in medical
image analysis’. IEEE Access. 2018; 6:9375–9389.
[15] Zheng Y., Liu D., Georgescu B., Nguyen H., and Comaniciu D. ‘3D deep
learning for efficient and robust landmark detection in volumetric data’. In.
LNCS, Springer, Cham, 2015; 9349:565–572.
[16] Suzuki K. ‘Overview of deep learning in medical imaging’. Radiological
Physics and Technology. 2017; 10:257–273.
[17] Suk H.I. and Shen D. ‘Deep learning-based feature representation for AD/
MCI classification’. LNCS Springer, Heidelberg. Medical Image Computing
and Computer Assisted Intervention. 2013; 16:583–590.
[18] Esteva A., Kuprel B., Novoa R.A., et al. ‘Dermatologist-level classification
of skin cancer with deep neural networks’. Nature. 2017; 542:115–118.
[19] Cicero M., Bilbily A., Colak E., et al. Training and validating a deep con-
volutional neural network for computer-aided detection and classification of
abnormalities on frontal chest radiographs. Investigative Radiology. 2017;
52:281–287.
[20] Zeiler M.D. and Fergus R. ‘Visualizing and understanding convolutional
networks’. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.),
Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer
Science, 8689; 2014 Springer, Cham.
198 Deep learning in medical image processing and analysis
[36] Fan D.-P., Zhou T., Ji G.P., et al. ‘Inf-Net: automatic COVID-19 lung
infection segmentation from CT scans’. IEEE Transactions on Medical
Imaging. 2020; 39(8):2626–2637.
[37] Li L., Qin L., Xu Z., et al. ‘Artificial intelligence distinguishes COVID-19
from community acquired pneumonia on chest CT’. Radiology. 2020; 296
(2):E65–E71.
[38] Gardezi S.J.S., Elazab A., Lei B., and Wang T. ‘Breast cancer detection and
diagnosis using mammographic data: systematic review’. Journal of Medical
Internet Research. 2019; 21(7):1–22.
[39] Shen L., Margolies L.R., Rothstein J.H., Fluder E., McBride R., and Sieh W.
‘Deep learning to improve breast cancer detection on screening mammo-
graphy’. Scientific Reports. 2019; 9(1):1–13.
[40] Fonseca P., Mendoza J., Wainer J., et al. ‘Automatic breast density classi-
fication using a convolutional neural network architecture search procedure’.
In Proceedings of Medical Imaging 2015: Computer Aided Diagnosis, 2015;
p. 941428.
[41] Wang J., Ding H., Bidgoli F.A., et al. ‘Detecting cardiovascular disease from
mammograms with deep learning’. IEEE Transactions on Medical Imaging.
2017; 36(5):1172–1181.
[42] Ribli D., Horvath A., Unger Z., Pollner P., and Csabai I. ‘Detecting and
classifying lesions in mammograms with deep learning’. Scientific Reports.
2018; 8(1):4165.
[43] Wu N., Phang J., Park J., et al. ‘Deep neural networks improve radiologists’
performance in breast cancer screening’. IEEE Transactions on Medical
Imaging. 2020; 39:1184–1194.
[44] Conant E.F., Toledano A.Y., Periaswamy S., et al. ‘Improving accuracy and
efficiency with concurrent use of artificial intelligence for digital breast
tomosynthesis’. Radiology: Artificial Intelligence. 2019; 1(4):e180096.
[45] Candemir S., Rajaraman S., Thoma G., and Antani S. ‘Deep learning for
grading cardiomegaly severity in chest x-rays: an investigation’. In
Proceedings of IEEE Life Sciences Conference (LSC). 2018, pp. 109–113.
[46] Hwang S., Kim H.-E., Jeong J., and Kim H.-J. ‘A novel approach for
tuberculosis screening based on deep convolutional neural networks’. In
Proceedings of Medical Imaging 2016: Computer Diagnosis. 2016; 9785,
p. 97852W.
[47] Rajaraman S. and Antani S.K. ‘Modality-specific deep learning model
ensembles toward improving TB detection in chest radiographs’. IEEE
Access. 2020; 8:27318–27326.
[48] Loey M., Smarandache F., and Khalifa N.E.M. ‘Within the lack of chest
COVID-19 X-ray dataset: a novel detection model based on GAN and deep
transfer learning’. Symmetry. 2020; 12(4):651.
[49] Waheed A., Goyal M., Gupta D., Khanna A., Al-Turjman F., and Pinheiro P.
R. ‘CovidGAN: data augmentation using auxiliary classifier GAN for
improved Covid-19 detection’. IEEE Access. 2020; 8:91916–91923.
Chapter 12
Systematic review of deep learning techniques for
high-dimensional medical image fusion
Nigama Vykari Vajjula1, Vinukonda Pavani2, Kirti Rawal3
and Deepika Ghai3
In recent years, the research on medical image processing techniques plays a major role
in providing better healthcare services. Medical image fusion is an efficient approach
for detecting various diseases in different types of images by combining them to make
a fused image in real-time. The fusion of two or more imaging modalities is more
beneficial for interpreting the resulting image than just one image, particularly in
medicine. The fusion of two images refers to the process of combining the outputs
generated from multiple sensors to extract more useful information. The application of
deep learning techniques has continuously proved to be more efficient than conventional
techniques due to the ability of neural networks to learn and improve over time.
Deep learning techniques are used not only because of reduced acquisition time but also
to extract more features for the fused image. So, in this chapter, a review of image fusion
techniques proposed in recent years for high-dimensional imaging modalities like MRI
(magnetic resonance imaging), PET (positron emission tomography), SPECT (single
photon emission-computed tomography), and CT (computed tomography) scans is presented.
Further, a comparative analysis of deep learning algorithms based on convolutional
neural networks (CNNs), generative models, and multi-focus and multi-modal fusion
techniques, along with their experimental results, is discussed in this chapter. Afterward,
this chapter gives an overview of the recent advancements in the healthcare sector, the
possible future scope, and aspects for improvement in image fusion technology.
12.1 Introduction
Raw image data in most cases carry limited information. For example, when the focus
location differs, objects closer to or farther from the focal plane appear blurred.
In medical diagnosis, this makes it confusing and difficult for the doctor to
identify the problem and provide better care. Image fusion technology is increasingly
1 DRDO, New Delhi, India
2 Department of Biomedical Engineering, Manipal Hospital, India
3 Lovely Professional University, India
being applied in diagnosing diseases as well as analyzing patient history. There are
different types of classification schemes that work toward fighting these anomalies
and getting as much data from the image as possible [1].
It is evident from recent studies that image fusion techniques like multi-level,
multi-focus, multimodal, pixel-level, and others can aid medical practitioners to
arrive at a better, unbiased decision based on the quantitative assessment provided
by these methods. Image fusion can be studied primarily in four categories such as
(i) signal level, (ii) pixel level, (iii) feature level, and (iv) decision level which will
be further explored in the upcoming sections as shown in Figure 12.1.
High-dimensional imaging modalities like CT, MRI, PET, and SPECT are
prevalent imaging techniques which are used in medical diagnosis where the
information is captured from several angles. In clinical settings, there are many
problems in the comparison and synthesis of image formats such as CT with PET,
MRI with PET, and CT with MRI. So, in order to produce more fruitful information
for medical diagnosis, it is necessary to combine images from multiple sources.
However, it is very difficult for a single modality to show clear views of the organs in our
body for identifying life-threatening diseases like cancer. Tumors in the brain can be detected by
fusing MRI and PET images. Further, abdomen-related problems can be identified
by fusing SPECT and CT scans, and the fusion of ultrasound images with MRI
gives vascular blood flow analysis [2]. This procedure is termed multimodal
image fusion and will be discussed further in this chapter.
Figure 12.1 Classification of image fusion techniques: spatial-domain methods (simple average, maximum, minimum, max–min, simple block replace, weighted averaging, hue intensity saturation, Brovey transform method, principal component analysis, guided filtering), Laplacian pyramid decomposition-based image fusion, and discrete transform-based image fusion (wavelet transform, Kekre's wavelet transform, Kekre's hybrid wavelet transform, stationary wavelet transform, combination of curvelet and stationary wavelet transform)
(Figure: further grouping of image fusion approaches: pixel-level methods such as the Brovey transform, PCA, wavelet transform, and intensity-hue-saturation transform; region-based fusion using segmentation and K-means clustering; fusion based on support vector machines and on the information level in the regions of the images; and similarity matching to content image retrieval.)
limitation is that they rely heavily on the accurate assessment of weights for different
pixels. If the estimation is not accurate, then it limits the performance of fusion.
Image fusion can be done by taking the average of the pixel intensity values of
the source images. These are also called averaging methods and do not require
prior knowledge about the images. There are also techniques based on prior
information, which can be more beneficial in terms of medical diagnosis.
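A minimal sketch of this averaging idea, assuming two co-registered grayscale source images of equal size stored as NumPy arrays, is shown below.

import numpy as np

def average_fusion(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Fuse two registered grayscale images by averaging their pixel intensities."""
    assert img_a.shape == img_b.shape, "source images must be co-registered and equally sized"
    return (img_a.astype(np.float32) + img_b.astype(np.float32)) / 2.0

# Example with random stand-in data; real inputs would be, e.g., registered MRI and PET slices.
fused = average_fusion(np.random.rand(256, 256), np.random.rand(256, 256))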
While dealing with the fusion of medical images, radiologists must be aware of
the input images, such as PET with CT or MRI with PET. Pixel-based techniques
also use fuzzy logic to handle imprecise information in the images received from
the radiologists. The models that can be built using fuzzy logic are described in
detail in [5]. The fuzzy inference system (FIS) is one of the multimodal image fusion
techniques used in medical diagnosis. By selecting the right parameters to compute these
models, good results can be obtained at a lower computational cost [3]. The main
disadvantages of these approaches are the requirement of large amounts of data for
processing and a decrease in the contrast of the fused image.
transformation process of medical image fusion, there are three steps involved,
which are clearly described in [3]. The Fourier transform and wavelet transforms are
among the well-known techniques used for medical image processing. The wavelet
transform captures time-domain information that cannot be obtained from the Fourier
transform [6].
Another significant transformation method is the contourlet transform, which
has better efficiency and directionality [7]. The contourlet transform differs
from other transforms by accepting input at every scale. It also achieves high levels
of efficiency in image representation, which in turn produces redundancy. Despite
its drawbacks, the contourlet transform is popular due to its fixed nature, which is the
main feature in improving the efficiency of image fusion.
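A common wavelet-domain fusion rule averages the approximation coefficients and keeps the detail coefficient with the larger magnitude. The sketch below shows a single-level version with PyWavelets, assuming two registered grayscale arrays; the wavelet choice and fusion rule are illustrative assumptions, not a method prescribed by the cited works.

import numpy as np
import pywt

def wavelet_fusion(img_a: np.ndarray, img_b: np.ndarray, wavelet: str = "db1") -> np.ndarray:
    """Single-level DWT fusion: average the approximations, take max-magnitude details."""
    cA_a, details_a = pywt.dwt2(img_a, wavelet)
    cA_b, details_b = pywt.dwt2(img_b, wavelet)

    cA_fused = (cA_a + cA_b) / 2.0                               # low-frequency content: average
    details_fused = tuple(                                       # high-frequency content: max |coefficient|
        np.where(np.abs(d_a) >= np.abs(d_b), d_a, d_b)
        for d_a, d_b in zip(details_a, details_b)
    )
    return pywt.idwt2((cA_fused, details_fused), wavelet)        # back to the image domain

fused = wavelet_fusion(np.random.rand(256, 256), np.random.rand(256, 256))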
on the quality of the training and testing images which varies with different ima-
ging conditions.
The traditional fusion methods [1,3,4] make use of mathematical transformations
with manually designed fusion rules in the spatial and transform domains. The
drawbacks of these techniques have become very apparent, and there was a need to
introduce deep learning algorithms that adopt learned transformations for
feature extraction and feature classification. Deep learning is a way to improve
medical image fusion by taking advantage of higher-level feature measurements and
well-designed loss functions to obtain more targeted features. Numerous
methods have been proposed over time which either address the problems of previous
methods or introduce entirely new ones. Some methods are better than others because they
can be used for batch processing (processing multiple images at once) and they produce
images with better detail and clarity. In addition, this improves the proficiency of
detecting diseases and reduces the time to recovery with the suggested treatments.
The initial steps of developing a deep learning model involve pre-processing a large number of images and dividing them into training and testing data sets. Afterward, the model for fusing the images and its optimized parameters are created. The final step is to test the model by inputting several sets of images and batch-processing multiple image groups. The two most prominent families of methods in recent years for achieving effective medical image fusion are CNN- and generative adversarial network (GAN)-based techniques. Here, we focus on deep learning methods that are particularly useful in medical diagnosis and imaging modalities.
IFCNN is another general image fusion framework that comprises three main
components: (i) feature extraction, (ii) feature fusion, and (iii) image reconstruction
[17]. For training the model, a dedicated image dataset was generated, and a perceptual loss was introduced so that the generated fused images more closely resemble the ground-truth fused images.
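The hedged Keras sketch below mirrors this three-part extract–fuse–reconstruct structure. It is not the published IFCNN code; the layer sizes, the element-wise maximum fusion rule, and the MSE loss are assumptions made for illustration (a perceptual loss could be substituted).

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fusion_net(shape=(256, 256, 1)):
    # Shared feature-extraction branch applied to each source image
    extractor = tf.keras.Sequential([
        layers.Conv2D(16, 3, padding='same', activation='relu'),
        layers.Conv2D(16, 3, padding='same', activation='relu')])
    in_a = layers.Input(shape)
    in_b = layers.Input(shape)
    feat_a, feat_b = extractor(in_a), extractor(in_b)
    fused = layers.Maximum()([feat_a, feat_b])          # feature fusion rule
    x = layers.Conv2D(16, 3, padding='same', activation='relu')(fused)
    out = layers.Conv2D(1, 3, padding='same')(x)        # image reconstruction
    return Model([in_a, in_b], out)

model = build_fusion_net()
model.compile(optimizer='adam', loss='mse')  # a perceptual loss could replace MSE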
information. The proposed method claims to address the weakness of the manually designed feature fusion of traditional methods by processing the intermediate layers so as to avoid information loss.
GFPPC-GAN [22] introduces a GAN that fuses generative facial prior (GFP) and PC images, employing an adversarial learning process between the PC image and the fused image to improve the quality of the information present in the image. Although GANs can perform exceptionally well in medical image fusion, the pixel intensity level in the functional image is far greater than the structural information in the image, so most GAN models introduce a new challenge to GAN-based medical image fusion: feature imbalance can occur frequently [23–27].
The optimization pipeline for deep learning in image fusion includes noise reduction, image registration, and other pre-processing techniques applied to a variety of images. The large number of images in the dataset is then divided into training and testing sets as required by the application, and optimization techniques are used for classifying the images. During model learning, the model parameters are learned from images with assigned labels (training); in the later step, testing is performed to predict the output for unseen inputs, and the final iteration on the test subjects yields the fused image.
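A minimal sketch of the dataset partitioning and label-assignment step described above, assuming the images have already been pre-processed into arrays; the array shapes, split ratio, and variable names are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split

# `images` stands in for pre-processed (registered, denoised) slices and
# `labels` for the class assigned to each image during annotation.
images = np.random.rand(500, 128, 128, 1)
labels = np.random.randint(0, 2, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)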
(Figure: deep learning-based fusion pipeline—each source image from the image dataset is pre-processed and then passed to the image fusion stage.)
12.4.1 Evaluation
Operational efficiency is the most significant factor in measuring the performance of deep learning-based fusion. Experimental results on public clinical diagnostic medical image datasets show that GAN-based algorithms preserve detail very well and can remove artifacts, which leads to superior performance compared with other methods. Both GAN- and CNN-based methods are reported to be highly efficient owing to common characteristics such as simple network architectures and small numbers of model parameters. A simple network structure, more appropriate task-specific constraints, and suitable optimization methods can be designed to achieve good accuracy and efficiency. The maturity of these algorithms allows researchers to analyze the properties of the image fusion task before increasing the size of the neural network [37–43].
12.5 Conclusion
Medical image fusion plays an essential role in providing better healthcare services.
Despite advances in multi-focus image fusion methods, existing classification approaches have failed to accurately position all images. It is concluded from the literature that deep learning techniques give superior performance in fusing medical images, and this chapter has provided insights into each of those techniques. Medical image fusion is a
technique that can be used for the diagnosis and assessment of medical conditions. In
this chapter, we present a summary of the major modalities that are used for medical
imaging fusion, their applications in diagnosis, assessment, and treatment, as well as a
brief overview of the fusion techniques and evaluations based on the observed data.
References
[1] Deepak Kumar S. and Parsai M.P. ‘Different image fusion techniques–a
critical review’. International Journal of Modern Engineering Research
(IJMER). 2012; 2(5): 4298–4301.
[2] Benjamin Reena J. and Jayasree T. ‘An efficient MRI-PET medical image
fusion using non-subsampled shearlet transform’. In Proceedings of the
IEEE International Conference on Intelligent Techniques in Control,
Optimization and Signal Processing (INCOS), 2019. pp. 1–5.
[3] Galande A. and Patil R. ‘The art of medical image fusion: a survey’. In
Proceedings of the 2013 International Conference on Advances in
Computing, Communications and Informatics (ICACCI), 2013. pp. 400–405.
[4] Chetan Solanki K. and Narendra Patel M. ‘Pixel based and wavelet based
image fusion methods with their comparative study’. In Proceedings of the
National Conference on Recent Trends in Engineering & Technology, 2011.
[5] Irshad H., Kamran M., Siddiqui A.B., and Hussain A. ‘Image fusion using
computational intelligence: a survey’. In Proceedings of the Second
International Conference on Environmental and Computer Science, ICECS ’09,
2009. pp. 128–132.
[6] Guihong Q., Dali Z., and Pingfan Y. ‘Medical image fusion by wavelet
transform modulus maxima’. Optics Express. 2001; 9: 184–190.
[7] Bing H., Feng Y., Mengxiao Y., Xiaoying M., and Cheng Z. ‘A review of
multimodal medical image fusion techniques’. Computational and
Mathematical Methods in Machine Learning. 2020; 2020: 1–16.
[8] Liu Y., Chen X., Cheng J., and Peng H. ‘A medical image fusion method based
on convolutional neural networks’. In Proceedings of the 20th International
Conference on Information Fusion (Fusion). IEEE, 2017, pp. 1–7.
[9] Liu Y., Chen X., Ward R.K., and Wang Z.J. ‘Image fusion with convolu-
tional sparse representation’. IEEE Signal Processing Letters. 2016; 23(12):
1882–1886.
[10] Liu Y., Chen X., Ward R.K., and Wang Z.J. ‘Medical image fusion via
convolutional sparsity based morphological component analysis’. IEEE
Signal Processing Letters. 2019; 26(3): 485–489.
[11] Liu Y., Liu S., and Wang Z. ‘A general framework for image fusion based on
multi-scale transform and sparse representation’. Information Fusion. 2015;
24: 147–164.
[29] Saravanan S. and Juliet S. ‘Deep medical image reconstruction with auto-
encoders using Deep Boltzmann Machine Training’. EAI Endorsed
Transactions on Pervasive Health and Technology. 2020; 6(24): 1–9.
[30] Ganasala P. and Kumar V. ‘CT and MR image fusion scheme in non-
subsampled contourlet transform domain’. Journal of Digital Imaging. 2014;
27(3): 407–418.
[31] Gomathi P.S. and Bhuvanesh K. ‘Multimodal medical image fusion in non-
subsampled contourlet transform domain’. Circuits and Systems. 2016; 7(8):
1598–1610.
[32] Gong J., Wang B., Qiao L., Xu J., and Zhang Z. ‘Image fusion method based
on improved NSCT transform and PCNN model’. In Proceedings of the 9th
International Symposium on Computational Intelligence and Design
(ISCID). IEEE, 2016. pp. 28–31.
[33] James A.P. and Dasarathy B.V. ‘Medical image fusion: a survey of the state
of the art’. Information Fusion. 2014; 19: 4–19.
[34] Kaur H., Koundal D., and Kadyan V. ‘Image fusion techniques: a survey’.
Archives of Computational Methods in Engineering. 2021; 28: 1–23.
[35] Keith A. and Johnson J.A.B. Whole brain atlas. http://www.med.harvard.
edu/aanlib/. Last accessed on 10 April 2021.
[36] Li B., Peng H., and Wang J. ‘A novel fusion method based on dynamic
threshold neural p systems and nonsubsampled contourlet transform for
multi-modality medical images’. Signal Processing. 2021; 178: 107793.
[37] Mankar R. and Daimiwal N. ‘Multimodal medical image fusion under non-
subsampled contourlet transform domain’. In Proceedings of the
International Conference on Communications and Signal Processing
(ICCSP). IEEE, 2015. pp. 0592–0596.
[38] Nazrudeen M., Rajalakshmi M.M., and Suresh Kumar M.S. ‘Medical image
fusion using non-subsampled contourlet transform’. International Journal of
Engineering Research (IJERT). 2014; 3(3): 1248–1252.
[39] Polinati S. and Dhuli R. ‘A review on multi-model medical image fusion’. In
Proceedings of the International Conference on Communication and Signal
Processing (ICCSP). IEEE, 2019. pp. 0554–0558.
[40] Polinati S. and Dhuli R. ‘Multimodal medical image fusion using empirical
wavelet decomposition and local energy maxima’. Optik. 2020; 205: 163947.
[41] Tan W., Thiton W., Xiang P., and Zhou H. ‘Multi-modal brain image fusion
based on multi-level edge-preserving filtering’. Biomedical Signal
Processing and Control. 2021; 64: 102280.
[42] Tian Y., Li Y., and Ye F. ‘Multimodal medical image fusion based on
nonsubsampled contourlet transform using improved PCNN’. In
Proceedings of the 13th International Conference on Signal Processing
(ICSP). IEEE, 2016. pp. 799–804.
[43] Tirupal T., Mohan B.C., and Kumar S.S. ‘Multimodal medical image fusion
techniques-a review’. Current Signal Transduction Therapy. 2020; 15(1): 1–22.
Chapter 13
Qualitative perception of a deep learning
model in connection with malaria disease
classification
R. Saranya1, U. Neeraja1, R. Saraswathi Meena1 and
T. Chandrakumar1
A grayscale image is represented as a 2D array and an RGB image as a 3D array. There are different types of layers in a CNN: the convolutional layer, pooling layer, flatten layer, and fully connected (dense) layer.
Filters: the number of filters, also known as kernels; this determines the depth of the output feature map.
Kernel size: This specifies the height and width of the kernel (convolution) window.
It takes an integer or a tuple of two integers like (3, 3). The window is typically a
square with equal height and breadth. A square window’s size can be provided as
an integer, such as 3 for a window with the dimensions (3, 3).
Strides: The number of pixels that we move the filter over the input image. For the
steps along the height and breadth, this requires a tuple. The default setting is (1, 1).
Padding: there are two choices, valid or same. Valid means no padding, while same pads with zeros so that the feature map has the same size as the input when the stride is 1.
Padding (pooling): padding may also be applied to the feature map to control the size of the pooled feature map.
Pool_size: this specifies the size of the pooling window; by default it is (2, 2). If the feature map contains multiple channels, pooling is applied to each channel independently.
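These hyperparameters map directly onto the arguments of the Keras convolution and pooling layers; the particular values below are illustrative rather than taken from the chapter.

from tensorflow.keras import layers

conv = layers.Conv2D(filters=32,          # depth of the output feature map
                     kernel_size=(3, 3),  # height and width of the window
                     strides=(1, 1),      # step along height and width
                     padding='same',      # 'valid' would mean no padding
                     activation='relu')
pool = layers.MaxPool2D(pool_size=(2, 2), padding='valid')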
The spatial separable convolution has a major flaw: not every kernel can be “split” into two smaller kernels. This is especially problematic during training, because the network can adopt only the small fraction of kernels that are separable into two smaller kernels out of all those it could otherwise have used.
Separable convolutions are therefore most beneficial for models that may encounter an overfitting problem and for layers with larger kernels.
13.4 Implementation
This part implements the whole categorization procedure for malaria cells. The first
stage is to gather all the data needed for the procedure. The data in this case was
gathered from a variety of sources, including Kaggle and several medical sites
(National Library of Medicine). The training images folder and the testing images folder make up the dataset. A further folder called “single prediction” is created to forecast the class of an image using the model learned from the data in the training and testing folders. Two subfolders, namely parasitized and uninfected, may be found in each of the training and testing folders. The photos show red blood cells at the microscopic level. Images of cells afflicted by malaria are found in the folder “Parasitized”; these pictures show how the patient has been impacted by malaria. The other folder contains human red blood cells that have not been infected with the malarial illness.
Hence, two CNNs are built for the dataset: one has convolution layers, a max pooling layer, a flatten layer, and a fully connected layer, while the other has pointwise and depthwise (separable) convolution layers, a max pooling layer, a flatten layer, and a fully connected layer. This research compares the two alternative architectural designs on a crucial image categorization problem. To begin putting this method into practice, an appropriate integrated development environment (IDE) is selected. Python 3.7.12 was used in conjunction with Jupyter Notebook version 6.1.4 to develop this approach. Sequential is imported from keras.models for a simple stack of layers in which each layer has precisely one input tensor and one output tensor, together with all the packages needed to implement the architecture from keras.layers, such as Dense, Conv2D, MaxPool2D, Flatten, and SeparableConv2D.
The dataset is loaded using the “ImageDataGenerator” function, whose task is to import the dataset in its current state. The imported images are resized to 224 × 224 pixels. The next step is to construct the CNN. In this architecture, the convolution layers contain 64, 128, 256, and 512 filters. The input shape of the first convolution layer was (224, 224, 3), since the photos in the folders are red, green, and blue (RGB) images; each max pooling layer uses a 2 × 2 window, and the CNN concludes with one flatten layer. The output of the flatten layer is fed to the fully connected layer. Two neurons connect the flattened features to the output layer, where the sigmoid activation function is utilized, whereas the ReLU activation function is used for the previous layers.
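A hedged Keras sketch of the standard CNN as described (64–512 filters, 224 × 224 × 3 input, 2 × 2 max pooling, ReLU hidden activations, and a two-neuron sigmoid output); the exact layer ordering and the dataset directory path are assumptions made for illustration.

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The directory path below is a placeholder for the local cell-image folders.
datagen = ImageDataGenerator(rescale=1.0 / 255)
# train_gen = datagen.flow_from_directory('dataset/training_set',
#                                         target_size=(224, 224),
#                                         class_mode='categorical')

model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPool2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.Flatten(),
    layers.Dense(2, activation='sigmoid'),  # parasitized vs. uninfected
])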
The second CNN was built in a similar manner to the first. With the exception of the convolution layer used for the input, all convolution layers of the prior design were replaced with separable convolution layers. The number of filters in each layer is the same as previously employed. In this design, the sigmoid function was used as the activation function for the output, while the ReLU function was employed for the remaining layers. The number of parameters in the normal CNN was 1,788,610, whereas in the separable CNN it was 384,194, as shown in Figure 13.4.
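The separable variant can be sketched by keeping the input convolution and swapping the remaining convolutions for depthwise-separable ones; again, this is an illustrative reconstruction rather than the authors' exact code.

from tensorflow.keras import layers, models

sep_model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPool2D((2, 2)),
    layers.SeparableConv2D(128, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.SeparableConv2D(256, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.SeparableConv2D(512, (3, 3), activation='relu'),
    layers.MaxPool2D((2, 2)),
    layers.Flatten(),
    layers.Dense(2, activation='sigmoid'),
])
sep_model.summary()  # reports far fewer parameters than the plain-convolution model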
After construction, the architecture was compiled using the metric “binary accuracy,” the optimizer “adam,” and the loss function mean squared error (MSE). MSE is computed as the average of the squared differences between the actual and predicted values, as shown in Figure 13.5.
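A brief sketch of the compilation step as described, assuming that model refers to either of the networks sketched above; the MSE definition is given in the comment.

# MSE = (1/n) * sum over i of (y_i - y_hat_i)**2
model.compile(optimizer='adam', loss='mse', metrics=['binary_accuracy'])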
Adam is an optimization strategy that can be used to iteratively update network weights based on training data, as opposed to the standard stochastic gradient descent procedure.
(Figure: architecture of the standard CNN—four convolution layers with pooling layers perform feature extraction, followed by a flatten layer, a fully connected layer, and an output layer predicting infected or uninfected.)
13.5 Result
Malaria, a potentially fatal blood illness, is spread by mosquitoes; fever, exhaustion, nausea, and headaches are typical symptoms. Test data are used to evaluate the models, as shown in Figure 13.6. In contrast to the standard network with ordinary convolution layers, which trains and validates with a greater loss, the pointwise and depthwise (separable) network achieved a much lower loss, as shown in Figure 13.7.
13.6 Conclusion
From this work, we may infer that malaria is a troublesome sickness for people of all ages. Although it is uncommon in temperate regions, malaria remains common in tropical and subtropical countries. This research concludes that a microscopic red blood cell picture can be classified as an uninfected or parasitized cell, thereby providing a judgment as to whether the person was afflicted by malaria. The key message is that the separable CNN performs better than the conventional convolutional neural network that was built.
Chapter 14
Deep learning classifier and CNN layer-automated perimetry
Glaucoma is an eye condition that, in its later stages, can cause blindness. It is
caused by a damaged optic nerve and has few early signs. A glaucomatous eye can
be diagnosed using perimetry, tonometry, and ophthalmoscopy. The fundamental
criterion for pre-primary glaucoma (PPG) is a glaucomatous appearance of the eye or fundus image in the presence of an apparently normal visual field (VF). The most common way of defining an abnormal VF using conventional automated perimetry is Anderson and Patella's criterion. This study describes a generic deep learning technique for analyzing fundus images for glaucoma. Unlike previous studies, the research design examines various conditions across several samples and architectural designs. The results show that the model matches or improves upon previously reported approaches. The suggested prediction models exhibit precision, sensitivity, and specificity in distinguishing glaucomatous eyes from healthy eyes, and clinicians can utilize the prediction results to make more informed recommendations. Various learning models may be combined to enhance the precision of the predictions. The CNN model includes decision rules for making predictions, which can be used to explain the reasons for specific predictions.
14.1 Introduction
Glaucoma is frequently associated with an increase in pressure within the eye. Glaucoma tends to run in families, and it is typically not diagnosed until late adulthood. The retina, which delivers visual data to the brain, can be harmed by increased eye pressure. Within a few years, glaucoma can cause irreparable vision loss or even total blindness if the disease progresses. The majority of glaucoma patients do not experience early pain or symptoms. Regular visits to an ophthalmologist are necessary so that glaucoma can be diagnosed and treated before an irreversible
visual loss occurs. A person’s vision cannot be restored once it is lost. However,
reducing his eye pressure will help him maintain his vision. The majority of glau-
coma patients who adhere to their medication regimen and get routine eye exams
are able to maintain their vision. Every human has both an optic disk and a cup, but
glaucomatous eyes have an abnormally broad cup compared to the optic disk.
Generally, glaucoma is diagnosed by an ophthalmologist analyzing the patient’s
photos and identifying any irregularities. Due to image noise and other factors that
make precise analysis difficult, this technique is very time-consuming and not
always accurate. In addition, if a machine is taught to conduct analysis, it even-
tually gets more efficient than human analysis.
Of all the body's senses, the eyes are among the most heavily used, and visual processing occupies a considerable portion of the brain. Glaucoma, which is frequently due to a rise in intraocular pressure, is a major cause of permanent loss of sight globally. Early detection of glaucoma is challenging, but the disease is treatable [1]. Globally, glaucoma is the leading cause of permanent blindness and has a progressive effect on the optic nerve [2]. Diagnosis of glaucoma is based on the patient's medical history, intraocular pressure, the thickness of the retinal nerve fiber layer, and changes to the structure of the optic disk, especially its diameter, size, and area. In 2013, there were 64.3 million cases of glaucoma among those aged 40–80 years, according to a survey [3]. Currently, it is detected using four tests: (1) identification of high intraocular pressure, (2) evaluation of optic disk damage using the cup-to-disc ratio, (3) estimation of choroidal thickness, and (4) identification of typical visual field abnormalities.
Glaucoma can be diagnosed by combining structural and functional diagnostic techniques, such as non-invasive imaging and visual field evaluation [4]. Deep learning algorithms have enhanced computer vision in recent years and are now a part of everyday life [5], and machine learning methods are well suited to glaucoma diagnosis. Structural and functional approaches are the two most widely used techniques for diagnosing glaucoma, and glaucoma is commonly diagnosed from digitally acquired fundus images. In recent studies, researchers have proposed schemes for computerized ophthalmic diagnosis and classification by extracting features from optic cup segmentation [6]. For a computer-aided system, segmenting the blind spot (optic disk) and optic nerve head regions is a difficult process; a combination of image enhancement techniques and domain knowledge is required to identify the most discriminative attributes. Methods for analyzing fundus images are typically based on edge detection of the vascular structures and the optic disk, and nerve fiber layer damage is detected using the textural characteristics of digital fundus images [7].
The purpose of this project is to develop a computerized approach for detecting glaucoma by analyzing image samples. The framework includes the collection of a fundus image dataset, pre-processing to decrease image noise, feature extraction, and the grouping of images as glaucomatous or not. A convolutional neural network architecture is responsible for learning from the inputs. Various performance measures, together with receiver operating characteristic (ROC) curves and the area under the curve (AUC), are frequently applied as evaluation criteria for diagnostic systems. A database containing retinal fundus images from patients at a medical center will be utilized to evaluate the suggested framework.
14.3 Methodology
This section describes the proposed deep CNN procedures for identifying and classifying low-tension eye problems that damage the optic nerve. Current approaches to ocular glaucoma detection using AI algorithms offer limited filtering options and are laborious to implement; image classification using deep neural networks has been offered as a viable alternative. A deep CNN was trained for this research with a focus on classification, and image collections are used to investigate the condition of the ocular fundus.
The proposed study builds on prior work by creating and deploying a deep CNN to detect and categorize glaucomatous eye disease. The network is composed of various CNN layer components and is implemented in accordance with the seven phases of layering shown in Figure 14.1. The categorization label associated with the images is
generated at the conclusion of the layer split-up analysis to help with the prediction.
The subsequent CNN network uses this categorization as an input to determine
ocular pictures (normal or glaucoma).
Input: a set of n eye images with two different class labels (a ∈ n).
Output: classification of each image and identification of glaucoma for each image sequence.
1. Glaucoma detection estimation—CNN layers
2. Pre-process the input (n images)
3. Partition the input into training and testing sets
4. Eye disease classification (layer split analysis with accuracy)
5. if the finding is normal
6. stop
7. else report eye illness
8. end if
(Figure: confusion matrix of the Conv2D/MaxPooling2D model—of the images with true label 0, 25 were predicted as 0 and 49 as 1; of the images with true label 1, 56 were predicted as 0 and 141 as 1.)
Many settings must be chosen for the network, such as the number of layers of neurons, different learning rates, and so on. As indicated in Figure 14.2, for network selection in this study we employed four distinct CNN layer approaches. Figure 14.1 illustrates how the proposed DCNN operates. This part contains the implementation of the CNN layers, as described below.
In this part of the architecture, four CNN layers are employed to classify the severity of glaucoma; the network is termed a deep CNN (DCNN). The network detects glaucoma-affected pictures using the images classified by the DCNN. The illness degree of glaucoma is classified into two phases, normal and glaucoma, with an early stage defining the start of the disease.
Consequently, four distinct CNN layer architectures were developed, one for each tier of glaucoma detection in the deep classification-net phase. The design of the DCNN is depicted in Figure 14.3; we utilized four distinct layers, each configured by filter size, kernel shape, input shape, and activation. How the dimensionality varies across the layers of the deep convolutional neural network is shown in Figure 14.2.
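A minimal Keras sketch of a four-convolution-layer binary classifier of the kind described; the filter counts, kernel sizes, input size, and dense layer are assumptions, since the chapter does not list the exact values.

from tensorflow.keras import layers, models

dcnn = models.Sequential()
dcnn.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
dcnn.add(layers.MaxPooling2D((2, 2)))
for filters in (64, 128, 256):               # three further conv/pool blocks
    dcnn.add(layers.Conv2D(filters, (3, 3), activation='relu'))
    dcnn.add(layers.MaxPooling2D((2, 2)))
dcnn.add(layers.Flatten())
dcnn.add(layers.Dense(64, activation='relu'))
dcnn.add(layers.Dense(1, activation='sigmoid'))  # normal vs. glaucoma
dcnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])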
14.4.1 Pre-processing
The regional visual glaucoma image collection eliminates image noise with adap-
tive histogram equalization. Images of localized retinal glaucoma are gathered from
a variety of commercial and public medical centers. The collection includes 3,060 fundus images in total. Each image falls into one of two categories, namely normal and glaucoma; 54% of photos belong to one class, while 46% belong to the other. The distribution of the dataset's training, validation, and testing subsets is presented in Table 14.1. For evaluation, 710 and 820 pictures of the supplied dataset are assigned to the normal and glaucoma categories, respectively. The collection contains test, training, and validation datasets. Three professional clinical assistants were tasked with distinguishing between the two stages of glaucoma eye illness outlined in Tables 14.1–14.3; where the experts disagreed, a majority vote was used to label the photos.
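A hedged OpenCV sketch of the adaptive histogram equalization step; converting to grayscale and the CLAHE parameter values are assumptions chosen for illustration.

import cv2

def preprocess_fundus(path):
    """Reduce noise and illumination variation with CLAHE before classification."""
    img = cv2.imread(path)                        # BGR fundus image from disk
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(gray)                 # contrast-limited adaptive HE
    return cv2.resize(equalized, (224, 224))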
No. Types Sensitivity (%) Specificity (%) Accuracy (%) Precision (%)
1 Glaucoma 72.63 69.23 72.85 66.65
2 Normal 92.00 95.62 88.90 93.45
Average 82.31 82.42 80.87 80.03
(Figures: training accuracy and loss curves versus epoch, over 30 epochs, for the layer level 1, 2, and 3 analyses.)
● The layer level 1 analysis graph yielded a poor level of accuracy, i.e., 36.45%.
● The layer level 2 analysis graph yielded a moderate level of accuracy, between 70% and 80%.
● The layer level 3 analysis graph yielded the highest accuracy, i.e., 80.87%.
14.5 Conclusion
References
[1] Abbas, Q. (2017). Glaucoma-deep: detection of glaucoma eye disease on
retinal fundus images using deep learning. International Journal of
Advanced Computer Science and Applications, 8(6), 41–45.
[2] Shaikh, Y., Yu, F., and Coleman, A. L. (2014). Burden of undetected and
untreated glaucoma in the United States. American Journal of
Ophthalmology, 158(6), 1121–1129.
[3] Tham, Y. C., Li, X., Wong, T. Y., Quigley, H. A., Aung, T., and Cheng, C.
Y. (2014). Global prevalence of glaucoma and projections of glaucoma
burden through 2040: a systematic review and meta-analysis.
Ophthalmology, 121(11), 2081–2090.
[4] Taketani, Y., Murata, H., Fujino, Y., Mayama, C., and Asaoka, R. (2015).
How many visual fields are required to precisely predict future test results in
glaucoma patients when using different trend analyses?. Investigative
Ophthalmology & Visual Science, 56(6), 4076–4082.
[5] Aamir, M., Irfan, M., Ali, T., et al. (2020). An adoptive threshold-based
multi-level deep convolutional neural network for glaucoma eye disease
detection and classification. Diagnostics, 10(8), 602.
Deep learning classifier and CNN layer-automated perimetry 235
[6] Raghavendra, U., Fujita, H., Bhandary, S. V., Gudigar, A., Tan, J. H., and
Acharya, U. R. (2018). Deep convolution neural network for accurate diagnosis
of glaucoma using digital fundus images. Information Sciences, 441, 41–49.
[7] Mookiah, M. R. K., Acharya, U. R., Lim, C. M., Petznick, A., and Suri, J. S.
(2012). Data mining technique for automated diagnosis of glaucoma using
higher order spectra and wavelet energy features. Knowledge-Based
Systems, 33, 73–82.
[8] Dua, S., Acharya, U. R., Chowriappa, P., and Sree, S. V. (2011). Wavelet-
based energy features for glaucomatous image classification. IEEE
Transactions on Information Technology in Biomedicine, 16(1), 80–87.
[9] Yadav, D., Sarathi, M. P., and Dutta, M. K. (2014, August). Classification of
glaucoma based on texture features using neural networks. In 2014 Seventh
International Conference on Contemporary Computing (IC3) (pp. 109–112).
IEEE.
[10] Chen, X., Xu, Y., Wong, D. W. K., Wong, T. Y., and Liu, J. (2015, August).
Glaucoma detection based on deep convolutional neural network. In 2015
37th Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (EMBC) (pp. 715–718). IEEE.
[11] Devalla, S. K., Chin, K. S., Mari, J. M., et al. (2018). A deep learning
approach to digitally stain optical coherence tomography images of the optic
nerve head. Investigative Ophthalmology & Visual Science, 59(1), 63–74.
[12] Acharya, U. R., Ng, E. Y. K., Eugene, L. W. J., et al. (2015). Decision
support system for the glaucoma using Gabor transformation. Biomedical
Signal Processing and Control, 15, 18–26.
[13] Zilly, J., Buhmann, J. M., and Mahapatra, D. (2017). Glaucoma detection
using entropy sampling and ensemble learning for automatic optic cup and
disc segmentation. Computerized Medical Imaging and Graphics, 55, 28–41.
[14] Chai, Y., Liu, H., and Xu, J. (2020). A new convolutional neural network
model for peripapillary atrophy area segmentation from retinal fundus ima-
ges. Applied Soft Computing, 86, 105890.
[15] Li, L., Xu, M., Wang, X., Jiang, L., and Liu, H. (2019). Attention based
glaucoma detection: a large-scale database and CNN model. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(pp. 10571–10580).
[16] Liu, H., Li, L., Wormstone, I. M., et al. (2019). Development and validation
of a deep learning system to detect glaucomatous optic neuropathy using
fundus photographs. JAMA Ophthalmology, 137(12), 1353–1360.
[17] Bajwa, M. N., Malik, M. I., Siddiqui, S. A., et al. (2019). Two-stage fra-
mework for optic disc localization and glaucoma classification in retinal
fundus images using deep learning. BMC Medical Informatics and Decision
Making, 19(1), 1–16.
Chapter 15
Deep learning applications in
ophthalmology—computer-aided diagnosis
M. Suguna1 and Priya Thiagarajan1
Artificial intelligence (AI) is proving to be a fast, versatile, and accurate tool to aid
and support healthcare professionals in diagnosing and screening for a multitude of
diseases and disorders. Several specialties have successfully incorporated AI into
their healthcare services. The eye care specialty of ophthalmology has several
successful applications of AI in disease detection. The applications of AI to analyze
images, mainly the retinal fundus image (RFI) in ophthalmology, are proving to be
very effective tools, not only for ophthalmologists but also for other specialists
including neurologists, nephrologists, and cardiologists. The diseases that are
diagnosable using AI are discussed in detail as an essential guide for AI designers
working in the medical imaging domain in ophthalmology. The challenges and
future trends including the use of multi-disease detection systems and smartphone
RFI cameras are studied. This would be a game changer in screening programs and
rural health centers and remote locations. Intelligent systems work as an effective
and efficient tool to analyze RFI and assist healthcare specialists in diagnosing,
triaging, and screening for a variety of diseases. More testing and better models
need to be introduced to enhance the performance metrics further. More medical
image datasets need to be available in the public domain to encourage further
research. Though intelligent systems can never replace healthcare specialists, they
can potentially be life-saving and cost-effective, especially in rural and remote
locations.
15.1 Introduction
Ophthalmology is the field of medicine which has made significant strides in
employing Artificial Intelligence (AI) to analyze images to detect diseases and
disorders. In a first of its kind, the United States Food and Drug Administration (US
FDA) has approved a device that uses AI to detect diabetic retinopathy (DR) in
adult diabetics [1].
1
Department of Computer Science and Engineering, Thiagarajar College of Engineering, India
(Figure 15.1: organization of the chapter on AI disease detection in ophthalmology—introduction, ophthalmology, neuro-ophthalmology, systemic diseases, challenges, future trends, and conclusion.)
The eye, often referred to as the window of the soul, is now providing us a
window to our systemic health too.
This chapter deals with the medical applications of AI, more specifically deep
learning (DL) and neural networks (NN) for image analysis in ophthalmology. It is
divided into the following sections (Figure 15.1):
● Ophthalmology
● Neuro-ophthalmology
● Systemic disease detection in ophthalmology
● Challenges
● Future trends
The main image which opens up several diagnostic avenues in ophthalmology
is the retinal fundus image (RFI).
The Ophthalmology section starts with a brief description of the human eye,
the location, and also the parts of the retinal fundus. The process of retinal fundus
image capture with retinal fundus cameras is also described.
Then we present a brief introduction to ocular diseases and evidence for the
successful use of AI for the detection and screening of the following diseases:
● Diabetic retinopathy (DR)
● Age-related macular degeneration (ARMD or AMD)
● Glaucoma
● Cataract
In the Neuro-ophthalmology section, we discuss the following diseases and the
current use of DL for the detection of the following diseases from retinal images:
● Papilledema/pseudopapilledema
● Alzheimer’s disease (AD)
In the Systemic disease detection in ophthalmology section, we discuss how the
same retinal fundus images can be used to detect and monitor even renal diseases
like chronic kidney disease (CKD) and cardiovascular diseases (CVD) by just
visualizing the microvascular structures in the retinal fundus.
Also, several epidemiologic studies suggest that DR and diabetic nephropathy
(DN) usually progress in parallel and share a close relationship. Monitoring DR
gives a good indication of the status of DN too. So ophthalmic imaging is also
found to play a major role in screening for and early detection of systemic illnesses.
The Challenges in using intelligent systems for image analysis and classifi-
cation are also discussed briefly.
In the last section of this chapter, Future trends, we present two new areas
which have exciting applications and show good results in recent studies, especially
in screening programs:
● Smartphone capture of retinal fundus images (with a lens assembly)
● Multi-disease detection using a single retinal fundus image
15.2 Ophthalmology
Ophthalmology is a specialty in medicine that deals with the diseases and disorders
of the eye. Ophthalmologists are doctors who have specialized in ophthalmology.
Ophthalmology is one of the main specialties to apply AI in healthcare. With the
first US FDA-approved AI medical device, ophthalmology can be considered a
pioneer in AI disease detection research [1].
Here, we will focus on the applications of AI in image analysis and classification
for disease detection. The following two images are mainly used in ophthalmology:
● Retinal fundus image (RFI)
● Optical coherence tomography (OCT)
Though OCT is proving to be very useful in studying various layers of the
retina, we consider only retinal fundus imaging in this chapter. Retinal fundus
imaging is widely used and cost-effective, thus making it more suitable for use in
remote and rural health centers.
The retina in our eye is very important for vision. The lens focuses light from
images on the retina. This is converted by the retina into neural signals and sent to
the brain through the optic nerve. Basically, the retina consists of light-sensitive or
photoreceptor cells, which detect characteristics of the light such as color and
intensity. This information is used by the brain to visualize the whole image.
The photoreceptors in the retina are of two types: rods and cones. The rods are
responsible for scotopic vision (low light conditions). They have low spatial acuity.
Cones are responsible for photopic vision (higher levels of light). They provide
color vision and have high spatial acuity.
The rods are mainly concentrated in the outer regions of the retina. They are
useful for peripheral vision. Cones are mainly concentrated on the central region of
the retina and are responsible for our color vision in bright light.
There are three types of cones based on the wavelengths to which they are
sensitive. They are long, middle, and short wavelength-sensitive cones. The brain
perceives the images based on all the information collected and transmitted by
these rods and cones.
The inner surface of the eyeball which includes the retina, the optic disk, and
the macula is called the retinal fundus. A normal retinal fundus is shown in
Figure 15.2. This portion of the inner eye is what is visible to the healthcare pro-
fessional by looking through the pupil.
The retinal fundus or the ocular fundus can be seen using an ophthalmoscope
or photographed using a fundus camera. A fundus camera is a specialized camera
with a low-power microscope.
The retina, the retinal blood vessels, and the optic nerve head or the optic disk
can be visualized by fundus examination (Figure 15.3). The retinal fundus camera
is a medical imaging device. It usually has a different set of specialized lenses and a
multi-focal microscope attached to a digital camera. The digitized images can also
be displayed on a monitor in addition to recording (Figure 15.4).
AI is proving to be a big boon to ophthalmologists and patients in screening,
diagnosing, assessing, and staging various diseases of the eye. This has reduced
waiting times for patients and unnecessary referrals to ophthalmologists. Intelligent
systems in rural health centers, general practitioners’ offices, and emergency
departments can help with quicker diagnosis and expedite the treatment of vision
and even life-threatening diseases.
(Figure: cross-section of the eye showing the optic nerve, choroid, fovea, macula, sclera, and retina.)
The retinal fundus image reveals several diseases of the eye. An ophthalmol-
ogist viewing the retinal fundus or the captured image of the retinal fundus can
diagnose several diseases or disorders of the eye. With a suitable number of
training images labeled by an ophthalmologist, intelligent systems can be trained to
analyze the retinal fundus image and capture the characteristics to help in the
decision process to diagnose the disease.
Figure 15.5 (a) Normal RFI showing the macula, optic nerve, and retinal blood vessels and (b) RFI in diabetic retinopathy with neovascularization, microaneurysms, edema, cotton wool spots, and exudates. Source: [3].
15.2.3 Glaucoma
Glaucoma is usually caused by an abnormal fluid buildup and hence increased
intraocular pressure in the eye. This causes damage to the optic nerve which may
lead to visual losses. The excess fluid may be caused by any abnormality in the
drainage system of the eye. It can cause hazy or blurred vision, eye pain, eye
redness, and colored bright circles around light. A healthy optic disk and a glau-
comatous disk are shown in Figure 15.7.
Figure 15.6 Normal retina in comparison with early and late AMD. Early AMD
with extra-cellular drusen deposits around the macula. Late AMD
with hyperpigmentation around the drusen. Source: [5].
Figure 15.7 Healthy optic disk and glaucomatous optic disk with cupping
(increase in optic cup size and cup–disk ratio). Source: [6].
15.2.4 Cataract
Globally, cataract is a leading cause of blindness. It can be treated and blindness
prevented by timely diagnosis and surgical intervention. A cataract is defined as
opacity in any part of the lens in the eye. This opacity is usually caused by protein
breakdown in the lens. When the lens has increased opacity, focusing images on the
retina is not done efficiently, and this may lead to blurry vision and loss of sight.
The progression of this disease is slow. Early diagnosis and timely surgical inter-
vention can save vision.
RFI in various stages of cataracts is shown in Figure 15.8. Cataract-related AI
systems are still under development [8]. Studies are going on for disease detection
and also for calculating pre-cataract surgery intraocular lens power. In addition to
retinal fundus images, slit lamp images are also used with AI for cataract detection.
Table 15.1 lists the existing literature on the use of AI in ophthalmology. The
dataset(s) used and the successful models along with significant results are also
tabulated.
Figure 15.8 Comparison of normal RFI with various stages of cataract images
showing blurriness due to lens opacity. Source: [7].
15.3 Neuro-ophthalmology
15.3.1 Papilledema
Papilledema is caused by an increase in the intracranial pressure of the brain. This
causes the swelling of the optic nerve which is visible as a swelling of the optic disk in
Figure 15.9 RFI in papilledema showing various grades of optic disk swelling
(A - mild, B - moderate, C&D - severe). Source: [17].
retinal fundus images (Figure 15.9). This is a dangerous condition and if left undiag-
nosed and untreated, can lead to blindness or in some cases may even lead to death.
Symptoms may include blurry vision, loss of vision, headaches, nausea, and vomiting.
The increase in intracranial pressure may be caused by space-occupying
lesions or infections or hydrocephalus and sometimes idiopathic intracranial
hypertension [18]. The treatment for papilledema is to treat the underlying cause
which will bring down the intracranial pressure to normal levels.
Swelling of the optic disk due to non-brain-related conditions is termed pseudopapilledema; though it is not as dangerous as papilledema, it still needs further evaluation. A timely and accurate diagnosis helps identify papilledema earlier and
avoids unnecessary referrals and further invasive tests.
useful. RFI in Alzheimer’s disease is shown in Figure 15.10. Research is in its early
stages but the results obtained are promising.
Table 15.2 lists the existing literature for AI disease detection in neurology
using RFI, along with significant results.
Figure 15.10 RFI in Alzheimer’s disease shows reduced retinal vascular fractal
dimension and increased retinal vascular tortuosity. Source: [19].
Table 15.3 AI for disease detection in nephrology and cardiology (using RFI)
Several challenges are present in building and using deep learning decision-making
systems in healthcare applications. A few are discussed below.
● Availability of data—Deep learning systems need large volumes of data for
effective training and testing. Not much data is available, especially in the
public domain. Privacy and legal issues need to be addressed and data made
available for researchers. This would be hugely beneficial for future research.
Bias-free data, covering all edge cases will lead to highly accurate disease
detection systems.
● Training and education of healthcare professionals—Continuous training
and education of healthcare professionals in using intelligent systems will help
them integrate them quickly and efficiently in their healthcare practice.
● Collaborative research—Collaborative research ventures between system
designers and medical experts will help in the creation of newer models
catering to the needs of doctors and patients.
Figure 15.12 Smartphone-based RFI capture using lens assembly. Source: [39].
diagnosis made using the camera was in substantial agreement with the clinical
diagnosis for DR and in moderate agreement for DME [41].
Gupta et al. (2022) have proposed a DIY low-cost smartphone-enabled camera
which can be assembled locally to provide images which can then be analyzed
using CNN-based deep learning models. They achieved high accuracy with a
hybrid ML classifier [42].
Nakahara et al. (2022) studied a deep learning algorithm for glaucoma
screening and concluded it had a high diagnostic ability, especially if the disease
was advanced [43].
Mrad et al. (2022) have used data from retinal fundus images acquired from
smartphone cameras and achieved high accuracy in detecting glaucoma. This can
be a cost-effective and efficient solution for screening programs and telemedicine
programs using retinal fundus images [13].
The concept of using smartphone-based cameras for image capture is a sig-
nificant development in screening programs and also a huge boon for remote and
rural areas. A centrally located intelligent system can then be used with these
images for assessing, triaging, and assisting medical experts.
15.8 Conclusion
AI systems are proving to be a big boon for doctors and patients alike. It is a very
useful tool for medical experts as it can save them a lot of time by triaging the
patient’s needs and also alerting the doctors if immediate medical care is indicated
by the AI findings.
The RFI is a non-invasive, cost-effective imaging tool which finds applications
in disease detection and monitoring systems in several specialties. Research to find
new applications of AI for retinal diseases and to improve the performance of
current intelligent systems is ongoing and has seen a huge surge since 2017
(Figure 15.13) [48]. Collaboration between the medical experts to provide domain
knowledge and the AI experts will lead to the development of better systems.
Figure 15.13 Science citation index (SCI) papers published between 2012 and
2021 on AI to study various retinal diseases. Source: [48].
Further work suggested includes trying models other than CNN-based var-
iants to see if performance can be enhanced with the usage of fewer resources.
Also, curation of more public datasets, especially for rarer diseases and condi-
tions is essential for further research. The smartphone-based RFI capture needs to
be studied further as it can revolutionize screening programs at higher perfor-
mance and lower costs. High-performance metrics and reliability will also
improve the confidence of the doctors and patients in AI-based healthcare
systems.
References
[1] https://www.fda.gov/news-events/press-announcements/fda-permits-market-
ing-artificial-intelligence-based-device-detect-certain-diabetes-related-eye
retrieved on 10.01.2023.
[2] https://ophthalmology.med.ubc.ca/patient-care/ophthalmic-photography/
color-fundus-photography/ retrieved on 10.01.2023.
[17] Mollan, S., Markey, K., Benzimra, J., et al. (2014). A practical approach to,
diagnosis, assessment and management of idiopathic intracranial hyperten-
sion. Practical Neurology, 14, 380–390. 10.1136/practneurol-2014-000821.
[18] Guarnizo, A., Albreiki, D., Cruz, J. P., Létourneau-Guillon, L., Iancu, D.,
and Torres, C. (2022). Papilledema: a review of the pathophysiology, ima-
ging findings, and mimics. Canadian Association of Radiologists Journal,
73(3), 557–567. doi:10.1177/08465371211061660.
[19] Liao, H., Zhu, Z., and Peng, Y. (2018). Potential utility of retinal imaging for
Alzheimer’s disease: a review. Frontiers in Aging Neuroscience, 10, 188.
10.3389/fnagi.2018.00188.
[20] Saba, T., Akbar, S., Kolivand, H., and Ali Bahaj, S. (2021). Automatic
detection of papilledema through fundus retinal images using deep learning.
Microscopy Research and Technique, 84(12), 3066–3077.
[21] Avramidis, K., Rostami, M., Chang, M., and Narayanan, S. (2022, October).
Automating detection of Papilledema in pediatric fundus images with
explainable machine learning. In 2022 IEEE International Conference on
Image Processing (ICIP) (pp. 3973–3977). IEEE.
[22] Milea, D., Najjar, R. P., Jiang, Z., et al. (2020). Artificial intelligence to
detect papilledema from ocular fundus photographs. New England Journal
of Medicine, 382, 1687–1695. doi:10.1056/NEJMoa1917130.
[23] Vasseneix, C., Najjar, R. P., Xu, X., et al. (2021). Accuracy of a deep
learning system for classification of papilledema severity on ocular fundus
photographs. Neurology, 97(4), e369–e377.
[24] Cheung, C. Y., Ran, A. R., Wang, S., et al. (2022). A deep learning model
for detection of Alzheimer’s disease based on retinal photographs: a retro-
spective, multicentre case-control study. The Lancet Digital Health, 4(11),
e806–e815.
[25] Leong, Y. Y., Vasseneix, C., Finkelstein, M. T., Milea, D., and Najjar, R. P.
(2022). Artificial intelligence meets neuro-ophthalmology. Asia-Pacific
Journal of Ophthalmology (Phila), 11(2), 111–125. doi:10.1097/
APO.0000000000000512. PMID: 35533331.
[26] Mortensen, P. W., Wong, T. Y., Milea, D., and Lee, A. G. (2022). The eye
is a window to systemic and neuro-ophthalmic diseases. Asia-Pacific
Journal of Ophthalmology (Phila), 11(2), 91–93. doi:10.1097/APO.00000
00000000531. PMID: 35533329.
[27] Ahsan, M., Alam, M., Khanam, A., et al. (2019). Ocular fundus abnormal-
ities in pre-dialytic chronic kidney disease patients. Journal of Biosciences
and Medicines, 7, 20–35. doi:10.4236/jbm.2019.711003.
[28] Mitani, A., Hammel, N., and Liu, Y. (2021). Retinal detection of kidney
disease and diabetes. Nature Biomedical Engineering, 5, 487–489. https://
doi.org/10.1038/ s41551-021-00747-4.
[29] Farrah, T. E., Dhillon, B., Keane, P. A., Webb, D. J., and Dhaun, N. (2020).
The eye, the kidney, and cardiovascular disease: old concepts, better tools,
and new horizons. Kidney International, 98(2), 323–342.
Deep learning applications in ophthalmology 257
[30] Gupta, K. and Reddy, S. (2021). Heart, eye, and artificial intelligence: a
review. Cardiology Research, 12(3), 132–139. doi:10.14740/cr1179.
[31] https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-
(cvds) retrieved on 10.01.2023
[32] Zhang, K., Liu, X., Xu, J., et al. (2021). Deep-learning models for the
detection and incidence prediction of chronic kidney disease and type 2
diabetes from retinal fundus images. Nature Biomedical Engineering, 5(6),
533–545.
[33] Kang, E. Y. C., Hsieh, Y. T., Li, C. H., et al. (2020). Deep learning–based
detection of early renal function impairment using retinal fundus images:
model development and validation. JMIR Medical Informatics, 8(11),
e23472.
[34] Sabanayagam, C., Xu, D., Ting, D. S., et al. (2020). A deep learning algo-
rithm to detect chronic kidney disease from retinal photographs in
community-based populations. The Lancet Digital Health, 2(6), e295–e302.
[35] Ma, Y., Xiong, J., Zhu, Y., et al. (2021). Development and validation of a
deep learning algorithm using fundus photographs to predict 10-year risk of
ischemic cardiovascular diseases among Chinese population. medRxiv.
[36] Rim, T. H., Lee, C. J., Tham, Y. C., et al. (2021). Deep-learning-based
cardiovascular risk stratification using coronary artery calcium scores
predicted from retinal photographs. The Lancet Digital Health, 3(5),
e306–e316.
[37] Chang, J., Ko, A., Park, S. M., et al. (2020). Association of cardiovascular
mortality and deep learning-funduscopic atherosclerosis score derived from
retinal fundus images. American Journal of Ophthalmology, 217, 121–130.
[38] Huang, F., Lian, J., Ng, K. S., Shih, K., and Vardhanabhuti, V. (2022).
Predicting CT-based coronary artery disease using vascular biomarkers
derived from fundus photographs with a graph convolutional neural network.
Diagnostics, 12(6), 1390.
[39] Karakaya, M. and Hacisoftaoglu, R. (2020). Comparison of smartphone-
based retinal imaging systems for diabetic retinopathy detection using deep
learning. BMC Bioinformatics, 21, 259. 10.1186/s12859-020-03587-2.
[40] Chalam, K. V., Chamchikh, J., and Gasparian, S. (2022). Optics and utility
of low-cost smartphone-based portable digital fundus camera system for
screening of retinal diseases. Diagnostics, 12(6), 1499.
[41] Shah, D., Dewan, L., Singh, A., et al. (2021). Utility of a smartphone
assisted direct ophthalmoscope camera for a general practitioner in screening
of diabetic retinopathy at a primary health care center. Indian Journal of
Ophthalmology, 69(11), 3144.
[42] Gupta, S., Thakur, S., and Gupta, A. (2022). Optimized hybrid machine
learning approach for smartphone based diabetic retinopathy detection.
Multimedia Tools and Applications, 81(10), 14475–14501.
[43] Nakahara, K., Asaoka, R., Tanito, M., et al. (2022). Deep learning-assisted
(automatic) diagnosis of glaucoma using a smartphone. British Journal of
Ophthalmology, 106(4), 587–592.
[44] Li, B., Chen, H., Zhang, B., et al. (2022). Development and evaluation of a
deep learning model for the detection of multiple fundus diseases based on
colour fundus photography. British Journal of Ophthalmology, 106(8),
1079–1086.
[45] Müller, D., Soto-Rey, I., and Kramer, F. (2021). Multi-disease detection in
retinal imaging based on ensembling heterogeneous deep learning models.
In German Medical Data Sciences 2021: Digital Medicine: Recognize–
Understand–Heal (pp. 23–31). IOS Press.
[46] Cen, L. P., Ji, J., Lin, J. W., et al. (2021). Automatic detection of 39 fundus
diseases and conditions in retinal photographs using deep neural networks.
Nature Communications, 12(1), 1–13.
[47] Guo, C., Yu, M., and Li, J. (2021). Prediction of different eye diseases based
on fundus photography via deep transfer learning. Journal of Clinical
Medicine, 10(23), 5481.
[48] Zhao, J., Lu, Y., Qian, Y., Luo, Y., and Yang, W. (2022). Emerging trends
and research Foci in artificial intelligence for retinal diseases: bibliometric
and visualization study. Journal of Medical Internet Research, 24(6),
e37532. doi:10.2196/37532. PMID: 35700021; PMCID: PMC9240965.
Chapter 16
Brain tumor analyses adopting a deep learning
classifier based on glioma, meningioma, and
pituitary parameters
Dhinakaran Sakthipriya1, Thangavel Chandrakumar1,
S. Hirthick1, M. Shyam Sundar1 and M. Saravana Kumar1
Brain tumors are one of the major causes of death; for this reason, early discovery of a brain tumor is crucial for enabling therapy, and a tumor may be visualized using a variety of procedures, magnetic resonance imaging being one such method. In recent years, approaches such as deep learning, neural networks, and machine learning have been used to handle a number of classification-related challenges in medical imaging. In this study, a convolutional neural network (CNN) applied to magnetic resonance images was used to classify three separate types of brain tumor: glioma, meningioma, and pituitary tumors. The data set includes 3,064 contrast-enhanced T1 scans from 233 individuals. The proposed model is compared with other models to demonstrate that our technique is superior, and outcomes before and after data preparation and augmentation were investigated.
16.1 Introduction
Our brain is composed of billions of cells and is one of the body's most complex organs. Brain tumors occur when cells in or near the brain multiply uncontrollably. This population of uncontrollably dividing cells can impair the functioning of healthy brain cells. Brain tumors can be classified as benign (low grade) or malignant (high grade) depending on their location, form, and texture [1–4]. Early cancer detection and automated tumor classification are required for clinicians to construct cancer treatment plans [5].
Imaging modalities like computed tomography (CT) and magnetic resonance imaging (MRI) can help find brain cancers. MRI is one of the most popular modalities because it can produce high-quality images in two dimensions (2D) and three dimensions (3D) without causing the patient any pain or exposing them to radiation [6]. Moreover, MRI is
1 Thiagarajar College of Engineering, India
regarded as the most effective and extensively used method for the identification
and categorization of brain tumors [7] due to its ability to produce high-quality
images of brain tissue. However, it requires a great deal of time and effort for
specialists to manually examine several MR pictures simultaneously in order to
discover problems. Recent years have seen a rise in the importance of Artificial Intelligence (AI) technology as a means of addressing this burden. Computer-aided diagnostic (CAD) technologies are increasingly used in concert with advan-
ces in AI technology. Several diseases, including brain tumors and cancer, can be
identified with speed and precision using CAD technology. The first phase of a
typical CAD system is to detect and segment lesions from images, the second is to
analyze these segmented tumors with numerical parameters to extract their fea-
tures, and the third is to use the proper machine learning (ML) approach to predict
abnormality categorization [8].
Applications for smart systems based on ML have recently been employed
in many additional industries. For these systems to work effectively, useful char-
acteristics must be found or extracted. Deep learning is a very effective subcategory of machine learning algorithms. Its architecture comprises a number of nonlinear
layers, each of which collects characteristics with greater skill by using the result of
the prior layer as input [9]. The most modern machine learning technology, con-
volutional neural network (CNN) algorithms, is used to diagnose diseases from
MRI scans. They have also been employed in many other areas of medicine,
including image processing [10–12]. CNN is commonly used to categorize and
grade medical pictures because preprocessing and feature extraction are not
necessary before the training phase. ML- and DL-based techniques for brain tumor identification can be broken down into two main stages: first classifying MR pictures as normal or abnormal, and then recognizing aberrant brain MR images in accordance with the various types of brain malignancies [13,14].
In this regard, some contemporary works are summarized here. One study evaluated three distinct CNN deep learning architectures (GoogleNet, AlexNet, and VGGNet) for classifying several tumor kinds (pituitary gland, glioma, and meningioma tumors) using brain MRI data sets; using the VGG16 architecture, the authors were able to attain 98.69% accuracy [15]. Another work presented a capsule network (CapsNet) for categorizing brain tumors; to improve accuracy performance, the authors additionally compiled CapsNet feature maps from several convolution layers and were able to classify 86.50% of the data correctly [16]. A CNN variant called AlexNet was used to develop a method for diagnosing glioma brain tumors; using whole-brain MR imaging, the authors achieved a respectable 91.16% accuracy [17]. Another group proposed a technique based on a deep CNN (DCNN) for finding and categorizing brain tumors, with Fuzzy C-Means (FCM) suggested for brain segmentation; the application's accuracy rate was 97.5% according to the final data [18]. An approach that uses both DWT and DL techniques was also proposed; in addition, the fuzzy k-means approach and principal component analysis (PCA) were used to segment the brain tumor in an effort to streamline the analysis, and the authors ultimately achieved a 96.97% rate of accuracy [19]. An approach for classifying brain tumors was also developed using a CNN architecture and the gray-level co-occurrence matrix (GLCM); the authors examined each picture from four different angles (0, 45, 90, and 135 degrees) and extracted four features: energy, correlation, contrast, and homogeneity. A total of 82.27% of the study's classifications were correct [20].
The objective of this project is to create a computer-aided method for detecting
tumors by analyzing materials. In this framework, brain tumor images are col-
lected, pre-processed to reduce noise, subjected to a feature routine, and then
categorized according to tumor grade. A CNN architecture will be in charge of
taking in data for training purposes. True positive rate is one of many performance metrics, along with receiver operating characteristic curves and the area under the curve, used to assess diagnostic systems. To test our proposed architecture, we will use a database
of retinal fundus images collected from patients at a medical facility.
Siar and Teshnehlab (2019) [21] analyzed a CNN trained to recognize tumors using images from brain MRI scans, with the CNN operating directly on the images. In terms of categorization, the softmax fully connected layer achieved a remarkable 98% accuracy; it is worth noting that the radial basis function classifier reached a 97.34% success rate, while the decision tree (DT) classifier managed only 94.24%. Accuracy, together with sensitivity, specificity, and precision benchmarks, is used to measure the efficacy of the networks. As noted by Komarasamy and Archana (2023) [22], a variety of specialists have created a number of efficient methods for classifying and identifying brain tumors, yet existing methods still face numerous challenges regarding detection time, accuracy, and tumor size. Early diagnosis of a brain tumor increases treatment choices and patient survival rates, but it is difficult and time-consuming to manually segregate brain tumors from a large volume of MRI data for diagnosis.
Correctly diagnosing a brain tumor is vital for improving treatment outcomes and
patient survival rates (Kumar, 2023) [23]. However, manually analyzing the numerous
MRI images generated in a medical facility may be challenging (Alyami et al., 2023)
[24]. To classify brain tumors from brain MRI images, the authors of this research use
a deep convolutional network and the salp swarm method to create a powerful deep
learning-based system. The Kaggle dataset on brain tumors is used for all tests.
Preprocessing and data augmentation procedures are developed, such as ideas for
skewed data, to improve the classification success rate (Asad et al., 2023) [25]. A series of cascading U-Nets was designed to identify tumors, and a DCNN was also created for patch-based segmentation of tumor cells. Prior to segmentation, this model was
utilized to pinpoint the location of brain tumors. The “BraTS-2017” challenge data-
base, consisting of 285 trained participants, 146 testing subjects, and 46 validation
subjects, was used as the dataset for the proposed model.
Ramtekkar et al. (2023) [26] proposed a fresh, upgraded, and accurate method
for detecting brain tumors. The system uses a number of methods, such as pre-
processing, segmentation, feature extraction, optimization, and detection. A filter
made up of Gaussian, mean, and median filters is used in the preprocessing system.
The threshold and histogram techniques are used for image segmentation.
Extraction of features is performed using a co-occurrence matrix of gray-level
occurrences (Saladi et al., 2023) [27]. Brain tumor detection remains a difficult task
in medical image processing. The purpose of this research is to describe a more
precise and accurate method for detecting brain cancers in neonatal brains. In
certain ways, the brain of an infant differs from that of an adult, and adequate
preprocessing techniques are advantageous for avoiding errors in results.
The extraction of pertinent characteristics is an essential first step in order to
accomplish appropriate categorization (Doshi et al., 2023) [28]. In order to refine
the segmentation process, this research makes use of the probabilistic FCM
approach. This research provides a framework for lowering the dimensionality of
the MRI brain picture and allows for the differentiation of the regions of interest for
the brain’s MRI scan to be disclosed (Panigrahi & Subasi, 2023) [29]. Early iden-
tification of brain tumors is the essential need for the treatment of the patient. Brain
tumor manual detection is a highly dangerous and intrusive procedure. As a result,
improvements in medical imaging methods, such as magnetic resonance imaging,
have emerged as a key tool in the early diagnosis of brain cancers.
Chen (2022) [30] analyses brain disorders, such as brain tumors, which are serious
health issues for humans. As a result, finding brain tumors is now a difficult and
demanding process. In this research, a pre-trained ResNeXt50 (32×4d) and an
interpretable approach are suggested to use past knowledge of MRI pictures for brain
tumor identification. Youssef et al. (2022) [31] developed an ensemble classifier
model for the early identification of many types of patient infections associated with
brain tumors that combine data augmentation with the VGG16 deep-learning feature
extraction model. On a dataset with four different classifications (glioma tumor,
meningioma tumor, no tumor, and pituitary tumor), we do the BT classification using
the suggested model. This will determine the kind of tumor if it is present in the MRI.
The proposed approach yields a 96.8% accuracy for our model (ZainEldin et al., 2022)
[32]. It takes a while to identify a brain tumor, and the radiologist’s skills and expertise
are crucial. As the number of patients has expanded, the amount of data that must be
processed has greatly increased, making outdated techniques both expensive and use-
less [40] (Kandimalla et al., 2023) [33]. The major goal is to provide a feasible method
for using MRIs to identify brain tumors so that choices about the patients’ situations
may be made quickly, effectively, and precisely. On the Kaggle dataset, collected from
BRATS 2015 for brain tumor diagnosis using MRI scans, including 3,700 MRI brain
pictures, with 3,300 revealing tumors, our proposed technique is tested.
16.3 Methodology
The DCNN approaches recommended for finding and categorizing various forms of
tumors that create difficulties for the brain are described in the required methodolo-
gical parts. Deep neural networks have been proposed as a workable solution for image
categorization. For this study, a CNN that specializes in classification was trained.
[Figure: architecture of the proposed CNN — input images pass through stacked convolution and max-pooling stages (feature maps of 100×100×128, 50×50×64, and 25×25×32, flattened to 20,000 units), followed by dense layers of 128 and 4 units that output the classes glioma tumor, meningioma tumor, no tumor, and pituitary tumor.]
A dataset for brain tumors would likely include medical imaging such as MRI
along with patient information such as age, sex, and medical history. The data may
also include labels or annotations indicating the location and type of tumor present
in the images. The dataset could be used for tasks such as training machine learning
models to detect and classify brain tumors, or for research on the characteristics of
different types of brain tumors. Preprocessing of brain tumor images typically
includes steps such as image registration, intensity normalization, and noise
reduction. Image registration aligns multiple images of the same patient acquired at
different times or with different modalities to a common coordinate system. The
CNN would be trained using this data to learn to recognize the features of a brain
tumor. A testing set would also consist of medical imaging data, but this data would
not be used during the training process. A CNN with seven layers could be used for
brain tumor detection. A large dataset with labeled brain tumors would be needed.
Once trained, the network could be used to identify brain tumors in new images.
Performance analysis in brain tumors typically involves evaluating various treat-
ment options and determining which ones are most effective at treating the specific
type of brain tumor. Factors that are commonly considered in performance analysis
include overall survival rates, progression-free survival rates, and the side effects
associated with each treatment. Additionally, imaging techniques such as MRI are
often used to evaluate the size and progression of the tumor over time.
By developing and applying a DCNN to identify and classify various forms of
brain tumors, the suggested study advances previous research. It is made up of
different CNN layer components. It is carried out in line with the seven layering
processes. Particularly, the naming and classification of brain tumors. The recom-
mended method represents a positive advancement in the field of medical analysis.
Additionally, radiologists are predicted to gain from this applied research activity.
Obtaining a second opinion will help radiologists determine the kind, severity, and
size of tumors much more quickly and easily. When brain tumors are found early,
professionals can create more efficient treatment plans that will benefit the
patient’s health. At the end of the layer split-up analysis, a categorization label for
the picture is created to aid with prediction.
Input: The first step in the algorithm is to collect a large dataset of brain MRI
images. This dataset should include both normal and abnormal images, such as
those with brain tumors.
Outputs: Classification of each image and identification of Brain
Tumor for each image sequence.
1. Brain Tumor detection estimation – CNN Layer
2. Pre-Process = The next step is to preprocess the images by removing noise and enhancing the quality of the images. This can be done using techniques such as image denoising and image enhancement (a minimal code sketch of steps 2 and 3 is given after this list).
3. Partition input into sets for training and testing.
4. Brain Tumor diseases (Layer Split Analysis with Accuracy)
5. if finding ordinary
6. stop
7. else Brain Tumor
8. end if
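A minimal, hedged sketch of steps 2 and 3 above is given below, assuming grayscale MRI slices stored in one folder per class; the folder layout, denoising parameters, and image size are illustrative assumptions rather than the chapter's exact settings.

import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]  # assumed folder names

def load_and_preprocess(root="brain_mri"):
    images, labels = [], []
    for idx, name in enumerate(CLASSES):
        for path in glob.glob(f"{root}/{name}/*.png"):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            img = cv2.fastNlMeansDenoising(img, h=10)          # noise removal
            img = cv2.equalizeHist(img)                        # simple enhancement
            img = cv2.resize(img, (128, 128)).astype("float32") / 255.0
            images.append(img[..., None])                      # add channel axis
            labels.append(idx)
    return np.array(images), np.array(labels)

X, y = load_and_preprocess()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)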
[Figure 16.2: workflow of the proposed approach — data split, the 7-layer CNN model, and reflection of the performance analysis.]
The proposed approach is implemented with distinct CNN layer approaches. Figure 16.2 illustrates how the proposed DCNN operates. This part contains the implementation of the CNN's layers, as described below.
Four CNNs are utilized in this section of the ML-CNN architecture to classify the level of brain tumor illness; this stage goes by the moniker classification-net CNN (CN-CNN). This network employs the classed images from the DN-CNN network to identify images affected by brain tumors (pituitary tumor, meningioma tumor, and glioma tumor) across four categories of images. The progression of a brain tumor is broken down into four stages: advanced, early, moderate, and normal. Early refers to the disease's onset, moderate to its medium value, advanced to the peak value, and normal to the no-tumor value. We constructed one CNN method for each stage of brain tumor identification in the classification-net phase, using a total of four CNN architectures in this section.
CN-internal CNN’s structure. We employed 7 layers, 40 epochs in size, and a
learning rate of 0.001. In this table, the input picture has a size of 128128 and a
filter size of 33, there are 6 filters, and the first convolutional layer’s stride is 1. The
second convolutional layer has a smaller size (64 64), but the stride and number of
266 Deep learning in medical image processing and analysis
filters remain the same (16 filters). The size is 3,232 with 25 filters in the third
convolutional layer, and the filter size and stride are also constant.
16.4.1 Preprocessing
As such, preprocessing serves primarily to enhance the input image and prepare it for a highly efficient human or machine vision system. Preprocessing also aids in increasing the signal-to-noise ratio (SNR), removing noisy artifacts, smoothing the image from the inside out, and preserving the image's edges, which is very important when dealing with human subjects. The raw image can be seen more clearly by increasing the SNR. To improve the SNR values and, by extension, the clarity of the raw images, it is usual practice to employ contrast-enhancement-assisted modified sigmoid processes (Tables 16.1 and 16.2).
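As a rough illustration of such a modified-sigmoid contrast step, a minimal sketch follows; the gain and cutoff values are illustrative assumptions rather than the chapter's settings.

import numpy as np

def sigmoid_contrast(image, gain=10.0, cutoff=0.5):
    # Normalize to [0, 1], then map through a modified sigmoid to stretch contrast.
    img = image.astype("float32")
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return 1.0 / (1.0 + np.exp(gain * (cutoff - img)))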
No. Types Sensitivity (%) Specificity (%) Accuracy (%) Precision (%)
1 Pituitary 93.03 88.48 93.25 94.08
2 Glioma 88.56 83.61 88.95 89.30
3 Meningioma 82.00 85.62 78.90 83.45
4 No tumor 79.67 82.59 76.54 84.28
Average 85.81 85.07 84.39 67.78
[Figure: confusion matrix of the classifier's predictions (true label vs. predicted label), with cell counts of 14, 85, 91, and 810 across the False/True classes.]
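The per-class measures reported in the table above can be derived from confusion-matrix counts such as those in the figure; a minimal sketch is given below, using placeholder counts rather than the chapter's exact values.

def per_class_metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)                  # true-positive rate (recall)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    return sensitivity, specificity, accuracy, precision

print(per_class_metrics(tp=93, fp=6, fn=7, tn=94))   # placeholder counts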
16.5 Conclusion
Deep learning is a branch of machine learning that involves training artificial
neural networks to perform tasks such as image or speech recognition. In the
medical field, deep learning algorithms have been used to assist in the detection and
diagnosis of brain tumors. These algorithms can analyze medical images, such as
MRI, and identify regions of the brain that may contain a tumor. However, it’s
important to note that deep learning should be used in conjunction with a radi-
ologist’s expertise and other medical diagnostic tools to make a definitive
diagnosis. A brain tumor is an abnormal growth of cells within the brain or the
skull. Symptoms of a brain tumor can include headaches, seizures, vision or
speech problems, and changes in personality or cognitive function. Treatment
options for a brain tumor can include surgery, radiation therapy, and che-
motherapy, and the choice of treatment depends on the type and location of the
tumor, as well as the patient’s overall health. A conclusion of a brain tumor
using a CNN would involve analyzing medical imaging data, such as MRI or CT
scans, using the CNN to identify any potential tumors. The CNN would be
trained on a dataset of labeled images to learn the features that indicate a tumor.
Once the CNN has been trained, it can then be used to analyze new images and
make predictions about the presence of a tumor. The accuracy of the predictions
will depend on the quality of the training dataset and the specific architecture of
the CNN.
References
[1] Mohsen, H., El-Dahshan, E. S. A., El-Horbaty, E. S. M., and Salem, A. B. M.
(2018). Classification using deep learning neural networks for brain tumors.
Future Computing and Informatics Journal, 3(1), 68–71.
[2] Khambhata, K. G. and Panchal, S. R. (2016). Multiclass classification of
brain tumor in MR images. International Journal of Innovative Research in
Computer and Communication Engineering, 4(5), 8982–8992.
[3] Das, V. and Rajan, J. (2016). Techniques for MRI brain tumor detection: a
survey. International Journal of Research in Computer Applications &
Information Technology, 4(3), 53e6.
[4] Litjens, G., Kooi, T., Bejnordi, B. E., et al. (2017). A survey on deep
learning in medical image analysis. Medical Image Analysis, 42, 60–88.
[5] Pereira, S., Meier, R., Alves, V., Reyes, M., and Silva, C. A. (2018).
Automatic brain tumor grading from MRI data using convolutional neural
networks and quality assessment. In Understanding and Interpreting
Machine Learning in Medical Image Computing Applications: First
International Workshops, MLCN 2018, DLF 2018, and iMIMIC 2018, Held
in Conjunction with MICCAI 2018, Granada, Spain, September 16–20,
2018, Proceedings 1 (pp. 106–114). Springer International Publishing.
[6] Le, Q. V. (2015). A tutorial on deep learning. Part 1: nonlinear classifiers
and the backpropagation algorithm. Google Brain, Google Inc. Retrieved
from https://cs.stanford.edu/~quocle/tutorial1.pdf
[7] Kumar, S., Dabas, C., and Godara, S. (2017). Classification of brain MRI tumor
images: a hybrid approach. Procedia Computer Science, 122, 510–517.
[8] Vidyarthi, A. and Mittal, N. (2015, December). Performance analysis of
Gabor-Wavelet based features in classification of high grade malignant brain
tumors. In 2015 39th National Systems Conference (NSC) (pp. 1–6). IEEE.
[9] Deng, L. and Yu, D. (2014). Deep learning: methods and applications.
Foundations and Trends in Signal Processing, 7(3–4), 197–387.
[10] Zikic, D., Glocker, B., Konukoglu, E., et al. (2012, October). Decision for-
ests for tissue-specific segmentation of high-grade gliomas in multi-channel
MR. In MICCAI (3) (pp. 369–376).
[11] Pereira, S., Pinto, A., Alves, V., and Silva, C. A. (2016). Brain tumor seg-
mentation using convolutional neural networks in MRI images. IEEE
Transactions on Medical Imaging, 35(5), 1240–1251.
[12] Alam, M. S., Rahman, M. M., Hossain, M. A., et al. (2019). Automatic
human brain tumor detection in MRI image using template-based K means
and improved fuzzy C means clustering algorithm. Big Data and Cognitive
Computing, 3(2), 27.
[13] Tharani, S. and Yamini, C. (2016). Classification using convolutional neural
network for heart and diabetics datasets. International Journal of Advanced
Research in Computer and Communication Engineering, 5(12), 417–422.
[14] Ravì, D., Wong, C., Deligianni, F., et al. (2016). Deep learning for health
informatics. IEEE Journal of Biomedical and Health Informatics, 21(1), 4–21.
[15] Rehman, A., Naz, S., Razzak, M. I., Akram, F., and Imran, M. (2020). A deep
learning-based framework for automatic brain tumors classification using
transfer learning. Circuits, Systems, and Signal Processing, 39, 757–775.
[16] Afshar, P., Mohammadi, A., and Plataniotis, K. N. (2018, October). Brain tumor
type classification via capsule networks. In 2018 25th IEEE International
Conference on Image Processing (ICIP) (pp. 3129–3133). IEEE.
[17] Khawaldeh, S., Pervaiz, U., Rafiq, A., and Alkhawaldeh, R. S. (2017).
Noninvasive grading of glioma tumor using magnetic resonance imaging
with convolutional neural networks. Applied Sciences, 8(1), 27.
[18] Abiwinanda, N., Hanif, M., Hesaputra, S. T., Handayani, A., and Mengko, T. R.
(2019). Brain tumor classification using convolutional neural network. In World
Congress on Medical Physics and Biomedical Engineering 2018: June 3–8,
2018, Prague, Czech Republic (Vol. 1) (pp. 183–189). Singapore: Springer.
[19] Anaraki, A. K., Ayati, M., and Kazemi, F. (2019). Magnetic resonance
imaging-based brain tumor grades classification and grading via convolu-
tional neural networks and genetic algorithms. Biocybernetics and
Biomedical Engineering, 39(1), 63–74.
[20] Widhiarso, W., Yohannes, Y., and Prakarsah, C. (2018). Brain tumor clas-
sification using gray level co-occurrence matrix and convolutional neural
network. IJEIS (Indonesian Journal of Electronics and Instrumentation
Systems), 8(2), 179–190.
[21] Siar, M. and Teshnehlab, M. (2019, October). Brain tumor detection using
deep neural network and machine learning algorithm. In 2019 9th
International Conference on Computer and Knowledge Engineering
(ICCKE) (pp. 363–368). IEEE.
[22] Komarasamy, G. and Archana, K. V. (2023). A novel deep learning-based
brain tumor detection using the Bagging ensemble with K-nearest neighbor.
Journal of Intelligent Systems, 32.
[23] Kumar, K. S., Bansal, A., and Singh, N. P. (2023, January). Brain tumor
classification using deep learning techniques. In Machine Learning, Image
Processing, Network Security and Data Sciences: 4th International
Conference, MIND 2022, Virtual Event, January 19–20, 2023, Proceedings,
Part II (pp. 68–81). Cham: Springer Nature Switzerland.
[24] Alyami, J., Rehman, A., Almutairi, F., et al. (2023). Tumor localization and
classification from MRI of brain using deep convolution neural network and
salp swarm algorithm. Cognitive Computation. https://doi.org/10.1007/
s12559-022-10096-2.
[25] Asad, R., Imran, A., Li, J., Almuhaimeed, A., and Alzahrani, A. (2023).
Computer-aided early melanoma brain-tumor detection using deep-learning
approach. Biomedicines, 11(1), 184.
[26] Ramtekkar, P. K., Pandey, A., and Pawar, M. K. (2023). Innovative brain
tumor detection using optimized deep learning techniques. International
Chapter 17
Deep learning method on X-ray image super-resolution
17.1 Introduction
High-quality magnetic resonance (MR) images are difficult to capture due to prolonged scan time, low spatial coverage, and low signal-to-noise ratio. Super-resolution
(SR) helps to resolve this by converting low-resolution MRI images to high-quality
MRI images. SR is a process of merging low-resolution images to achieve high-
resolution images. SR is categorized into two types, namely, multi-image SR
(MISR) and single image SR (SISR). MISR reconstructs a high-resolution image
from multiple degraded images. However, MISR is rarely employed in practice,
due to the unavailability of multiple frames of a scene. On the contrary, a high-
resolution image is intended to be produced using SISR from a single low-
resolution image.
SISR is categorized into non-learning-based methods and learning-based
methods. Interpolation and wavelet methods fall under the category of non-learning
techniques. Interpolation methods re-sample an image to suit transmission channel
requirements and reconstruct the final image.
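As a brief illustration of such interpolation-based re-sampling, the following is a minimal sketch assuming OpenCV is available; the file name and scale factor are illustrative assumptions.

import cv2

lr = cv2.imread("lr_xray.png", cv2.IMREAD_GRAYSCALE)   # hypothetical low-resolution image
h, w = lr.shape
scale = 2
up_nearest = cv2.resize(lr, (w * scale, h * scale), interpolation=cv2.INTER_NEAREST)
up_bilinear = cv2.resize(lr, (w * scale, h * scale), interpolation=cv2.INTER_LINEAR)
up_bicubic = cv2.resize(lr, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)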
The commonly used techniques for interpolation are nearest neighbor, bi-
cubic and bi-linear up-scaling. Bi-linear and bi-cubic interpolations calculate
1 Electronics and Communications Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, India
17.2 Preliminaries
17.2.1 Encoder–decoder residual network
The encoder–decoder structure enhances the context information of the input shallow features. We used a coarse-to-fine method in the network to recover missed data and eliminate noise: the coarse-to-fine method first rebuilds the coarse information from small features and then recreates the finer details step by step. In order to reduce the impact of noise, batch normalization is employed in the down-scaling/up-scaling convolution layers.
We introduced an encoder–decoder residual network (EDRN) for restoring missed data and decreasing noise. The encoder–decoder was developed to capture connections among large-range pixels, allowing the structure to encode the data with additional context. The EDRN is divided into four sections: the network of feature
encoder (FE), the network of large-scale residual restoration (L-SRR), the network
of middle-scale residual restoration (M-SRR), and the network of small-scale
residual restoration (S-SRR). A fully convolutional network (FCN) has been proposed for image semantic segmentation and object recognition.
After removing the fully connected layers, FCN is made up of convolution
and de-convolution processes, which are commonly referred to as encoder and
decoder. Convolution is always followed by pooling in FCNs, whereas
de-convolution is always followed by un-pooling, although image restoration
operations in FCNs result in the loss of image data.
The M-SRR and L-SRR algorithms were employed to improve the quality of blurred images. Because of its light-weight structure, the SRCNN model has become a framework for image super-resolution. However, it is acknowledged that deeper and more complicated networks can lead to higher SR performance, raising the complexity of network training drastically. Our network's decoder structure is made up of L-SRR, M-SRR, and S-SRR.
A convolution layer with stride 2, rectified linear units (ReLU), and batch normalization (BN) are the three operations that make up the down-scaling process K_{e1}, which reduces the spatial dimension of the input; P_1 denotes the first down-scaled features. Using the second skip connection and the second down-scaling process,
P_2 = K_{e2}(P_1),  (17.3)
where K_{e1} and K_{e2} are similar, reduce the spatial dimension of the input features by half, and K_{e2} extracts 256 features. P_2 stands for the innermost skip connection.
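A hedged Keras sketch of this down-scaling process (a stride-2 convolution followed by batch normalization and ReLU, halving the spatial dimensions) is given below; the kernel size and filter counts are illustrative assumptions.

from tensorflow.keras import layers

def downscale_block(x, filters):
    x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)   # halves H and W
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

# P1 = downscale_block(P0, 128)   # first down-scaling process, K_e1
# P2 = downscale_block(P1, 256)   # second down-scaling process, K_e2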
The output of the L-SRR, P_{L,mc}, can be written as
P_{L,mc} = K_{L,mc}(K_{L,4}(...(K_{L,1}(P_2))...)) + P_2,  (17.4)
where K_{L,mc} stands for the last convolution layer of the L-SRR and K_{L,1}, K_{L,2}, K_{L,3}, and K_{L,4} stand for the RIRBs. We merge the first down-scaled features with the coarse large-scale residual features and then send the resulting data to the M-SRR for refinement.
The large-scale features are present in the input to the M-SRR, whose objective is to recover missed data and reduce noise at a finer level. Additionally, the finer features are added to the first down-scaled features:
P_M = K_{M,mc}(K_{M,2}(K_{M,1}(K_{jn1}(P_{L,mc})))) + P_1,  (17.5)
where K_{M,mc} stands for the final convolution layer of the M-SRR, K_{M,1} and K_{M,2} for the RIRBs, and K_{jn1} for the de-convolution layer with stride 1, followed by a ReLU layer and a BN layer. Both K_{jn1} and the M-SRR's convolution layer extract 128 features. P_M stands for the features recovered from large- and medium-scale data loss.
Zhang et al. suggested RIRBs, which are composed of several Residual Channel-
wise Attention Blocks (RCAB). Unlike commonly used residual blocks, RCAB
incorporates an adaptive channel-wise attention mechanism to detect the channel-
wise relevance. As a result, RCAB re-scales the extracted residual features based
on channel relevance rather than evaluating all features equally. We inherit this by
introducing RIRB. To maintain shallow information flow, our RIRB layers have
multiple RCABs, one convolution layer, and one skip connection.
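A minimal Keras sketch of one residual channel-wise attention block (RCAB) of the kind referenced here is shown below: a small residual branch whose features are re-scaled by channel-attention weights before being added back to the input. The reduction ratio of 16 is an assumption.

from tensorflow.keras import layers

def rcab(x, filters, reduction=16):
    res = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    res = layers.Conv2D(filters, 3, padding="same")(res)
    # Channel attention: global pooling -> bottleneck -> sigmoid gates per channel.
    w = layers.GlobalAveragePooling2D()(res)
    w = layers.Dense(filters // reduction, activation="relu")(w)
    w = layers.Dense(filters, activation="sigmoid")(w)
    w = layers.Reshape((1, 1, filters))(w)
    res = layers.Multiply()([res, w])             # re-scale residual features
    return layers.Add()([x, res])                 # skip connection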
17.5.1 EDRN
P_{LR} stands for the input image and P_{SR} for the corresponding output image. For both lost information and interfering noise, we use a scale of 3. First, we extract low-level features; the M-SRR's layer extracts 128 features, and P_M stands for the features recovered from large- and medium-scale data loss. In the S-SRR network, to recover lost data and eliminate noise at the most granular level, we use an RIRB and a convolution layer:
P_S = K_{S,mc}(K_{S,1}(K_{jn2}(P_M))) + P_0,  (17.6)
where K_{S,mc} stands for the S-SRR's final convolution layer, K_{S,1} for the RIRB, and K_{jn2} for the de-convolution layer. The K_{S,1} convolution layer and K_{jn2} extract 64 features. When mapping onto the RGB color space, P_S stands for the qualities that have been restored for each of the three lost-information scales. To map the returned features to a super-resolved high-resolution image, we use a convolution layer:
P_{SR} = F_{EDRN}(P_{LR}),
where F_{EDRN} refers to the EDRN's whole architecture.
The RIRB’s jth result PR, j can be written as
PR; j ¼ KR;j ðPR;j1 Þ ¼ PR;j ðKR;j1 ð ðKR;1 ðPR;0 ÞÞ ÞÞ; (17.7)
where KR,j stands for the jth RCAB and PR,0 for the input of the RIRB. As a result,
the output of the RIRB can be written as
PR ¼ KR;mc ðKR;J ð ðKR;1 ðPR;0 ÞÞ ÞÞ þ PR;0 ; (17.8)
where K_{R,mc} stands for the final convolution layer of the RIRB. The skip connection keeps the previous network's information and improves network resilience.
[Figure: EDRN architecture — LR images pass through an initial convolution, stacked CONV+BN down-scaling and DECONV+BN up-scaling stages (with feature maps such as 64×128×128 and 128×64×64), and a final convolution that produces the SR images.]
once and three times would be 1 dB and 0.03 dB higher, respectively, when the
coarse-to-fine approach is not used, as indicated in the last lines. The comparison
shows that using two down-scaling/up-scaling processes is the best and highlights the
efficiency of the encoder–decoder structure.
Figure 17.4 Low-resolution X-ray images (top), in which the image information is not clearly discernible; the bottom row shows the corresponding high-resolution images (a), (b), (c), (d), and (e)
Figure 17.5 High-resolution, low-resolution, and predicted images. The high- and low-resolution images are taken from the datasets; the predicted images are obtained from the experimental results.
RCAN [9] for scaling 2, but similarly with EDSR [7] and D-DBPN [8]. Our EDRN has only 74 convolution layers overall, whereas RCAN [9], with more than 400 convolution layers, stacks 10 residual groups made up of 20 residual channel-wise attention blocks each. Our results cannot match the performance of these compared models.
RCAN [9], D-DBPN [8], and EDSR [7] were used for scaling 3 and 4. The usefulness and reliability of our EDRN can be further illustrated by comparing it to traditional SISR. First, due to the vast dataset and lack of noise influence, BN is not appropriate for traditional single-image SR. Second, the relationship between large-range pixels is captured by the encoder–decoder structure. When scaling 3 and 4, the input itself has a big receptive field, thus the down-scaling operation would lose a lot of details, making it more challenging to recover finer lost information. Third, our EDRN has a quicker execution time compared to EDSR [7], D-DBPN [8], and RCAN [9]. As the strictly fair comparison demonstrated, our EDRN can nevertheless produce comparable results even when using certain incorrect components and a smaller network.
17.7 Conclusion
We presented an EDRN for actual single-image super-resolution in this chapter.
Because of the bigger receptive field, the encoder–decoder structure may extract
features with more context information. The coarse-to-fine structure can gradually
restore lost information while reducing noise impacts. We also discussed how to use normalization: batch normalization applied to the down-scaling/up-scaling convolution layers can minimize the effect of noise. Our EDRN can effectively recover a high-resolution image from a distorted input image and is capable of restoring high-frequency details.
References
[1] M.H. Yeh. The complex bidimensional empirical mode decomposition. Signal Processing, 92(2), 523–541, 2012.
[2] H. Li, X. Qi, and W. Xie. Fast infrared and visible image fusion with
structural decomposition. Knowledge-Based Systems, 204, 106182, 2020.
[3] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual
networks for single image super-resolution. In 2017 IEEE Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1132–
1140, July 2017.
[4] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using
very deep convolutional networks. In 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 1646–1654, June 2016.
[5] D. Kingma and J. Ba. Adam: a method for stochastic optimization. In
International Conference on Learning Representations (ICLR) 2015,
December 2015.
[6] X. Mao, C. Shen, and Y. Yang. Image restoration using very deep con-
volutional encoder-decoder networks with symmetric skip connections. In
Advances in Neural Information Processing Systems, vol. 29, pp. 2802–
2810, 2016. Curran Associates, Inc.
1 Thiagarajar College of Engineering, India
18.1 Introduction
The skin is the largest organ in the human body, and skin cancer is the most pre-
valent worldwide health issue. The skin regulates body temperature in the human
body. In general, the skin connects to other organs such as muscles, tissues, and
bones, and protects us from harmful heat, ultraviolet rays, and infections. The
nature of skin cancer varies depending on the weather, making it utterly unpre-
dictable. Diepgen and Mahler (2002) have described the epidemiology of skin
cancer briefly. The best and most efficient strategy to improve the survival rate of
those who have been diagnosed with melanoma is to identify and treat it in its
earliest stages. The advancement of dermoscopy techniques can significantly
improve the accuracy of melanoma diagnosis, thereby increasing the rate of sur-
vival for patients. Dermoscopy is a methodology for monitoring the skin carefully.
It is a method that employs polarized light to render the contact region translucent
and display the subsurface skin structure. Manual interpretation of dermoscopy
images is an arduous and longtime process to implement. Even though melanoma is
early diagnosis can result in a relatively high chance of survival. Skin cancer has
become the most prevalent medical condition globally. In general, the skin of the
human body connects to other organs such as muscles, tissues, and bones, and
protects us from harmful heat, ultraviolet rays, and infections. The nature of skin
cancer varies depending on the weather, making it utterly unpredictable. Diepgen
and Mahler (2002) have described the epidemiology of skin cancer briefly.
automates the detection of skin diseases to enable early diagnosis. This model
encompasses three key phases: data collection and augmentation, design and
development of the model architecture, and ultimately accurate disease prediction.
She has used machine learning techniques such as SVM and CNN and has aug-
mented the model with image processing tools for a more accurate prediction,
resulting in 85% accuracy for the developed model. Li et al. (2016) have used novel
data synthesis techniques to merge the individual images of skin lesions with the
full-body images and have used deep learning techniques like CNN to build a model
that uses the synthesized images as the data for the model to detect malignant skin
cancer with greater accuracy than the traditional detection and tracking methods.
Monika et al. (2020) have used ML techniques and image processing tools to
classify skin cancer into various types of skin-related cancers and have used der-
moscopic images as the input for the pre-processing stage; they have removed the
unwanted hair particles that are present in the skin lesions using the dull razor
method and have performed image smoothing using the median filter as well as the
gaussian filter are both employed to remove the noise. Nawaz et al. (2022) have
developed a fully automated method for the earlier detection of skin cancer using
the techniques of RCNN and FKM (fuzzy k-means clustering) and have evaluated
the developed method using three standard datasets, namely, PH2, International
Skin Imaging Collaboration dataset (2017), and International Symposium on
Biomedical Imaging dataset (2016), achieving an accuracy of 95.6%, 95.4%, and
93.1%, respectively. Hasan et al. (2019) have used ML and image processing to
design an artificial skin cancer system; they have used feature extraction methods
to extract the affected skin cells features from the skin images and segmented using
the DL techniques have achieved 89.5% accuracy and 93.7% training accuracy for
the publicly available dataset.
Hossin et al. (2020) have used multilayered CNN techniques in conjunction
with regularization techniques such as batch normalization and dropout to classify
dermoscopic images for the earlier detection of skin cancer, which helps to reduce
the medical cost, which may be high if the cancer is detected at a later stage. Ansari and Sarode (2017) used SVM and image processing techniques for the rapid diagnosis of skin cancer; mammogram images were used as model input and pre-processed for better image enhancement and noise removal; the thresholding method is used for segmentation and GLCM methods are
utilized to extract the image’s features, and support vector machine is used to
identify the input image. Garg et al. (2018) have used image processing techniques
to detect skin cancer from a digital image; the image is pre-processed to avoid the
excessive noise that is present in the image, followed by segmentation and feature
extraction from the pre-processed image, and implemented the ABCD rule, which
assesses the dermoid cyst using a variety of criteria like color of the skin tissue,
asymmetry, diameter, and border irregularity of the lesion.
Alquran et al. (2017) have used image processing tools for the earlier detection
of melanoma skin cancer; their process of detection involves the collection of
dermoscopic images, followed by segmentation and feature extraction; they have
used thresholding to perform segmentation and have extracted the statistical
286 Deep learning in medical image processing and analysis
features using techniques such as GLCM and ABCD; and they have used PCA for
the selection of features, followed by the total dermoscopy score calculation. Jana
et al. (2017) have developed a technology for skin cancer detection that may
require four steps: the removal of hair, the removal of noise, resizing of the image,
and sharpening of the image in the image pre-processing step; they have used
techniques, such as threshold in the histogram, k-means, etc., for the segmentation
purpose; extraction of features from the segmented images; and classification using
the techniques such as SVM, FFNN, and DC. Thaajwer and Ishanka (2020) have
used image processing techniques and SVM for the development of an efficient
skin cancer diagnosis system; they have pre-processed the image to have a
smoothed and enhanced image; they have used thresholding and morphological
methods for segmentation; they have extracted the features using the GLCM
methods; and the extracted features are used for classification with the SVM.
Table 18.1 gives the clear view of comparative analysis for this proposed work.
18.3 Methodology
Figure 18.1 describes the process of collecting data, pre-processing it, and then
evaluating the model on the trained dataset. The dataset was retrieved from the
Kaggle resource with all rights bound to the ISIC-Archive. Both the training and
testing classes contained an equal number of images. The aim of the melanoma
project of the International Skin Imaging Collaboration is to decrease the increas-
ing number of deaths caused by melanoma and to enhance the effectiveness of
diagnostic testing.
[Figure 18.1: overall workflow — the skin cancer (melanoma) dataset is pre-processed, fed to the deep learning models (Inception v3 and MobileNetV2), and the predictions are output as benign or malignant.]
The dataset contains two distinct class labels, benign and malignant, denoting the less harmful and severe stages of the melanoma skin cancer disease, respectively. The primary objective of our model is to visually categorize these
benign and malignant types using robust algorithms. TensorFlow, an open-source library developed by Google that supports distributed training, immediate model iteration, simple debugging with Keras, and much more, was imported. TensorFlow's computation engine is primarily utilized for machine learning and deep learning projects, where it contributes significantly.
where it contributes significantly. It consists of standard normalization and image
enhancement processes. Normalization is a method used in image processing and is
frequently called histogram stretching or contrast stretching. As its name implies,
image enhancement is the process of improving the graphical fidelity of photographs
by means of the extraction of detail which is more reliable and accurate information
from them. Widely employed to call attention to or emphasize particular image ele-
ments. The sharpness and color quality of the images were improved by image
enhancement, resulting in high-quality images for both benign and malignant tumors.
In addition, the data management tasks include data augmentation; the methods currently used for augmentation in image classification fall into two major categories: black-box methods that use deep neural networks, and histogram-based methods. Additionally, subplots are required to narrow the focus on the lesion that is predicted to be cancerous or normal. Inception v3 is a convolutional neural network-based deep learning model with deeper neural connections than Inception v1 and v2, while its computational performance remains comparable. It is the next step in model construction, and it is a significant contributor to the success of the project. It employs auxiliary classifiers as regularizers.
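A hedged sketch of this normalization and augmentation stage, using Keras' ImageDataGenerator, is given below; the directory names, target size, and augmentation ranges are illustrative assumptions rather than the chapter's exact settings.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,          # intensity normalization
    rotation_range=20,          # simple geometric augmentation
    horizontal_flip=True,
    zoom_range=0.1,
).flow_from_directory("melanoma/train", target_size=(224, 224),
                      class_mode="binary", batch_size=32)

test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "melanoma/test", target_size=(224, 224), class_mode="binary", batch_size=32)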
18.3.1 MobileNetv2
The objective of MobileNetV2 is to train a discriminative network using transfer
learning to predict benign versus malignant skin cancers (Figure 18.2). This study’s
dataset consists of training and testing data culled from the literature. Experiments
were conducted on a GPU-based cluster with the CUDA and cuDNN libraries
installed. The results indicate that, except for Inception v3, the MobileNet model
outperforms the other approaches in regard to precision, recall, and accuracy. The batch size determines the amount of parallel processing necessary to execute the algorithm: the larger the batch size, the greater the number of parallel computations performed. This permits multiple instances of the algorithm to run concurrently on a single machine, making training efficient and quick, and the batch size grows as the number of parallel computations increases. It is essential to ensure that the batch size is sufficient for your needs; if you are running a small number of instances, a large batch size could potentially hinder performance.
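A minimal transfer-learning sketch in the spirit of this setup is shown below: a MobileNetV2 backbone pre-trained on ImageNet is frozen and a small binary (benign versus malignant) head is trained on top. The head size and optimizer settings are assumptions.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # transfer learning: freeze the backbone

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),   # benign vs. malignant
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])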
18.3.2 Inception v3
The Inception v3 model has 42 layers and a reduced margin of error compared with its predecessors (Figure 18.3). The trained network is capable of identifying 1,000 distinct types of objects in images and has learned holistic feature representations for a variety of image categories. The symmetric and asymmetric building blocks of the model encompass convolutions, max pooling, dropout, and fully connected layers. The model makes extensive use of regularization, which is applied in the activation components as well, and softmax is used to calculate the loss. MobileNetV2, a convolutional neural network for image classification with 53 layers, is a further supporting algorithm. MobileNet's architecture is distinctive in that it requires minimal processing power to function. This makes it possible for computers, embedded systems, and mobile devices to operate without GPUs. MobileNetV2 contains two distinct varieties of blocks: a residual block with a stride of 1, and a block with a stride of 2 for down-sizing. There are three layers in both kinds of blocks. It is built on an inverted residual arrangement in which residual connections connect the bottleneck layers, and lightweight depthwise convolutions are used to filter features in the intermediate expansion layer. MobileNetV2's architecture is comprised of a fully convolutional 32-filter initial layer and 19 supplemental bottleneck layers.
[Figure: architecture diagram — detection of melanoma skin cancer with MobileNetV2.]
18.4 Results
Experimental analysis was conducted in Google Colab with 12.6 GB RAM and a 2.3 GHz GPU (1x Tesla K80), using the Python 3 Google Compute Engine (GPU) backend.
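A hedged sketch of how the accuracy and loss curves of Figures 18.4 and 18.5 can be produced from a Keras training run is given below; model, train_gen, and test_gen refer to the earlier sketches, and the epoch count of 25 simply matches the plotted axis range.

import matplotlib.pyplot as plt

def plot_history(history):
    # Plot training/validation accuracy and loss from a Keras History object.
    for metric, title in [("accuracy", "Model accuracy"), ("loss", "Model loss")]:
        plt.figure()
        plt.plot(history.history[metric], label=f"Train {metric}")
        plt.plot(history.history[f"val_{metric}"], label=f"Validation {metric}")
        plt.title(title)
        plt.xlabel("Epoch")
        plt.ylabel(metric.capitalize())
        plt.legend()
    plt.show()

# history = model.fit(train_gen, validation_data=test_gen, epochs=25)
# plot_history(history)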
Table 18.2 Classification of ISIC images within the various classes and subsets
Figure 18.4 Image depicting the graph variance of the model accuracy
Figure 18.5 Image depicting the graph variance of the model loss
Figure 18.6 Image depicting the prediction result of the malignant type and
benign type
18.5 Conclusion
Multiple deep learning methodologies are utilized in this research to identify
melanoma-affected and unaffected skin cancer images. The most accurate techni-
que for deep learning is identified. To create a multi-layer deep convolutional neural network (ML-DCNN) for melanoma classification and detection, 1,559 raw-pixel skin cancer images are prepared, with MobileNetV2 and Inception v3 used to extract features for the deep learning model. The deep learning model is deployed using MobileNetV2 and Inception v3: the former employs 53 layers for identifying melanoma skin cancer and the latter utilizes 48 layers for categorizing melanoma and non-melanoma skin cancers. To assess the effectiveness of the deep learning models, we utilize the statistical measures of accuracy, validation accuracy, loss, and validation loss. The Inception v3 model achieves an accuracy of 99.17%, validation accuracy of 0.9176, loss of 0.0249, and validation loss of 0.7740, compared to MobileNetV2, which has an accuracy of 92.46%, validation accuracy of 0.8788, loss of 0.0368, and validation loss of 0.8820. The proposed deep learning model with Inception v3 yielded dis-
tinct statistical values for distinct melanoma skin cancer stage categories. The
obtained results are comparable to previous benchmarks, and the classification of
melanoma skin cancer was made more efficient. The proposed method performs
admirably; however, in the future, this model will be integrated with web appli-
cations to facilitate accessibility.
References
[1] Daghrir, J., Tlig, L., Bouchouicha, M., and Sayadi, M. (2020, September).
Melanoma skin cancer detection using deep learning and classical machine
learning techniques: a hybrid approach. In 2020 5th International
Conference on Advanced Technologies for Signal and Image Processing
(ATSIP) (pp. 1–5). IEEE.
[2] Dildar, M., Akram, S., Irfan, M., et al. (2021). Skin cancer detection: a
review using deep learning techniques. International Journal of
Environmental Research and Public Health, 18(10), 5479.
[3] Hosny, K. M., Kassem, M. A., and Foaud, M. M. (2018, December). Skin
cancer classification using deep learning and transfer learning. In 2018 9th
Cairo International Biomedical Engineering Conference (CIBEC) (pp. 90–
93). IEEE.
[4] Nahata, H., and Singh, S. P. (2020). Deep learning solutions for skin cancer
detection and diagnosis. In Machine Learning with Health Care Perspective
(pp. 159–182). Springer, Cham.
[5] Vidya, M., and Karki, M. V. (2020, July). Skin cancer detection using
machine learning techniques. In 2020 IEEE International Conference on
Electronics, Computing and Communication Technologies (CONECCT)
(pp. 1–5). IEEE.
[19] Demir, A., Yilmaz, F., and Kose, O. (2019, October). Early detection of skin
cancer using deep learning architectures: resnet-101 and inception-v3. In
2019 Medical Technologies Congress (TIPTEKNO) (pp. 1–4). IEEE.
[20] Emara, T., Afify, H. M., Ismail, F. H., and Hassanien, A. E. (2019,
December). A modified inception-v4 for imbalanced skin cancer classifica-
tion dataset. In 2019 14th International Conference on Computer
Engineering and Systems (ICCES) (pp. 28–33). IEEE.
[21] Yélamos, O., Braun, R. P., Liopyris, K., et al. (2019). Usefulness of der-
moscopy to improve the clinical and histopathologic diagnosis of skin can-
cers. Journal of the American Academy of Dermatology, 80(2), 365–377.
[22] Barata, C., Celebi, M. E., and Marques, J. S. (2018). A survey of feature
extraction in dermoscopy image analysis of skin cancer. IEEE Journal of
Biomedical and Health Informatics, 23(3), 1096–1109.
[23] Leiter, U., Eigentler, T., and Garbe, C. (2014). Epidemiology of skin cancer.
In Reichrath J. (ed.), Sunlight, Vitamin D and Skin Cancer (pp. 120–140).
Springer.
[24] Argenziano, G., Puig, S., Iris, Z., et al. (2006). Dermoscopy improves
accuracy of primary care physicians to triage lesions suggestive of skin
cancer. Journal of Clinical Oncology, 24(12), 1877–1882.
[25] Diepgen, T. L. and Mahler, V. (2002). The epidemiology of skin cancer.
British Journal of Dermatology, 146, 1–6.
[26] Gloster Jr, H. M. and Brodland, D. G. (1996). The epidemiology of skin
cancer. Dermatologic Surgery, 22(3), 217–226.
[27] Armstrong, B. K. and Kricker, A. (1995). Skin cancer. Dermatologic Clinics,
13(3), 583–594.
Chapter 19
Deep learning applications in ophthalmology and
computer-aided diagnostics
Renjith V. Ravi1, P.K. Dutta2, Sudipta Roy3 and S.B. Goyal4
Recently, artificial intelligence (AI) that is based on deep learning has gained a lot of
attention. Deep learning is a new technique that has a wide range of potential uses in
ophthalmology. To identify diabetic retinopathy (DR), macular edema, glaucoma,
retinopathy of prematurity, and age-related macular degeneration (AMD or ARMD),
DL has been utilized in optical coherence tomography, images of fundus, and visual
fields in ophthalmology. DL in ocular imaging can be used along with telemedicine
as an effective way to find, diagnose, and check up on serious eye problems in people
who need primary care and live in residential institutions. However, there are also
possible drawbacks to the use of DL in ophthalmology, such as technical and clinical
difficulties, the inexplicability of algorithm outputs, medicolegal concerns, and
doctor and patient resistance to the “black box” AI algorithms. In the future, DL
could completely alter how ophthalmology is performed. This chapter gives a
description of the cutting-edge DL systems outlined for ocular applications, possible
difficulties in clinical implementation, and future directions.
19.1 Introduction
Artificial intelligence (AI) is used in computer-aided diagnostics (CAD), which is
one way to make the process of diagnosis more accurate and easier to use. “Deep
learning” (DL) is the best way to use AI for many tasks, including problems with
medical imaging. It has been utilized for diagnostic imaging tasks for various dis-
eases in ophthalmology.
The fourth industrial revolution is in the development of AI. Modern AI
methods known as DL have attracted a lot of attention worldwide in recent years
[1]. The representation-learning techniques used by DL to process the input data
1 Department of Electronics and Communication Engineering, M.E.A. Engineering College, India
2 Department of Engineering, Amity School of Engineering and Technology, Amity University Kolkata, India
3 Artificial Intelligence & Data Science, Jio Institute, India
4 City University College of Science and Technology, Malaysia
have many degrees of abstraction, eliminating the need for human feature engi-
neering. This lets DL automatically find complicated systems in high-dimensional
data by projecting those systems onto lower-dimensional manifolds. DL has
achieved noticeably higher accuracy than traditional methods in several areas,
including natural-language processing, machine vision, and speech synthesis [2].
In healthcare and medical technology, DL has primarily been used for medical
imaging analysis, where DL systems have demonstrated strong diagnostic perfor-
mance in identifying a variety of medical conditions, including malignant mela-
noma on skin photographs, and tuberculosis from chest X-rays [1]. Similarly,
ophthalmology has benefited from DL’s incorporation into the field.
An advancement in the detection, diagnosis, and treatment of eye illness
is about to occur in ophthalmology. DL technology that is computer-based
is driving this transformation and has the capacity to redefine ophthalmic
practice [3].
Visual examination of the eye and its surrounding tissues, along with pattern
recognition technology, allows ophthalmologists to diagnose diseases. Diagnostic
technology in ophthalmology gives the practitioner additional information via
digital images of the same structures. Because of its reliance on imagery, oph-
thalmology is well-positioned to gain from DL algorithms. The field of ophthal-
mology is starting to use DL algorithms, which have the potential to alter the core
kind of work done by ophthalmologists [3]. In the next few years, computer-aided
intelligence will probably play a significant role in eye disease screening and
diagnosis. These technological developments may leave human resources free to
concentrate on face-to-face interactions between clinicians and patients, such as
discussions of diagnosis, prognosis, and available treatments. We anticipate that
for the foreseeable future, a human physician will still be required to obtain
consent and perform any necessary medical or surgical procedures. Decision-
making in ophthalmology is likely to use DL algorithms sooner than many would
anticipate.
19.1.1 Motivation
In today's industrialized environment, much of daily work is carried out on a
variety of electronic devices, including tablets, mobile phones, laptops, and
many more. Due to the effects of COVID-19, most people worked mostly from home
during the pandemic, utilizing a variety of internet platforms. Many individuals
develop vision problems as a result of these conditions. Additionally, those who have
visual impairments are more susceptible to other illnesses, including diabetes, heart
conditions, stroke, and high blood pressure. They also have a higher risk of
falling and getting depressed [4]. According to current publications, reviews, and
hospital records, many people have been identified with different eye disorders
such as AMD, DR, cataracts, choroidal neovascularization, glaucoma, keratoco-
nus, Drusen, and many more. As a consequence, there is a worldwide problem
that must be dealt with. According to the WHO study, medical professionals’
perspectives, and researchers’ theories, these eye illnesses are the main reasons
Ophthalmology and computer-aided diagnostics 299
why people go blind. As the world's population ages, the number of people affected
will increase sharply.
Overall, relatively few review papers that concurrently cover all methods for
detecting diabetic eye disease (DED) are published in academic databases. As a
result, this review of the literature is crucial for gathering research on the subject of
DED detection.
A detailed review of eye problems such as glaucoma, diabetic retinopathy (DR),
and AMD was published by Ting et al. [1]. In their study, they summarized a
number of studies that were chosen and published between 2016 and 2018. They
provided summaries of the publications that made use of transfer learning (TL)
techniques using fundus and optical coherence tomography images. They
excluded the diagnosis of ocular cataract illness from their study’s scope and did
not include recent (2019–2020) papers that used TL techniques in their metho-
dology. Similarly to this, Hogarty et al.’s [5] work applying AI in ophthalmology
was lacking in comprehensive AI approaches. Mookiah et al. [6] evaluated
research on computer-assisted DR identification, the majority of which is lesion-
based DR. In [7], Ishtiaq et al. analyzed thorough DR detection techniques from
2013 to 2018, but papers from 2019 and 2020 were not included in their eva-
luation. Hagiwara et al. [8] reviewed publications on the utilization of fundus
images for computer-assisted diagnosis of glaucoma (GL). They discussed computer-aided
systems and optic disc segmentation systems. Numerous papers that use DL and
TL approaches for GL detection are not included in their review article.
Reviewing publications that take into account current methods of DED diagnosis
is crucial. In reality, the majority of researchers did not include in their review
papers the time period of publications addressed by their investigations. Both the
clinical scope (diabetic macular edema (DME), DR, glaucoma, and cataract) and the
methodological scope (DL and ML) of the existing reviews were inadequate.
Therefore, to cover the current DL-based techniques for DED detection and address
the shortcomings of the aforementioned research, this chapter gives a complete study
of DL methodologies for automated DED identification published between 2014
and 2020. The Government of India launched the National Programme for
Control of Blindness and Visual Impairment (NPCB&VI) and conducted a
survey [9] on blindness in India. The major causes of blindness in India and
the rate of blindness according to this survey are shown in Figures 19.1 and 19.2,
respectively.
Despite already being used to diagnose and forecast the prognosis of several ophthalmic
illnesses, DL still has a lot of unrealized potential. DL-allied approaches promise to
radically transform vision care, even though the capabilities of DL are only now
beginning to be unveiled, and this bodes well for the healthcare sector. Therefore, the use of DL to enhance ophthalmologic
treatment and save healthcare costs is of particular relevance [4]. In order to keep
up with ongoing advancements in the area of ophthalmology, this review delves
further into researching numerous associated methods and datasets. Therefore, this
study aims to open up opportunities for new scientists to understand ocular diseases
and existing research in ophthalmology, with the goal of creating a system which
is completely autonomous.
[Figure 19.1 Major causes of blindness in India: cataract 62%, refractive error 20%, glaucoma 6%, corneal blindness 1%, surgical complications 1%]
[Figure 19.2 Prevalence of blindness in India by year of survey (2001–02, 2006–07, 2015–18) and the target for 2020]
Medical image evaluation is one of many phrases often used in connection with
computer-based procedures that deal with analysis and decision-making situations.
The term “computer-aided diagnosis” describes methods in which certain clinical
traits linked to the illness are retrieved using image-processing filters and tools
[10]. In general, any pattern categorization technique needing a training program,
either supervised or unsupervised, to identify potential underlying patterns is
referred to as "machine learning (ML)." Most often, the term DL refers to
machine learning techniques using convolutional neural networks (CNNs) [11]. The
basic structure of a CNN is shown in Figure 19.3. According to the information learned
from the training data, which is a crucial component of feature classification approa-
ches, such networks use a collection of filters for image processing to retrieve different
kinds of picture characteristics that the system considers suggestive of pathological
indications [12]. In order to find the best image-processing filters or tools that can
quantify different illness biomarkers, DL might be seen as a brute-force method.
Finally, in a very broad sense, the term “artificial intelligence” (AI) denotes any sys-
tem, often based on machine learning, that is able to recognize key patterns and
characteristics unsupervised, without the need for outside assistance—typically from
people. Whether a real AI system currently exists or not is up for debate [3].
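To make the notion of an image-processing filter concrete, the short Python sketch below (purely illustrative; it uses PyTorch and a hand-specified edge-detection kernel, whereas a CNN learns many such kernels automatically from the training data) applies a single 3x3 filter to a dummy grayscale image:

    import torch
    import torch.nn.functional as F

    # A dummy grayscale image: (batch, channels, height, width).
    image = torch.randn(1, 1, 64, 64)

    # One hand-specified 3x3 edge-detection filter; a CNN would learn many
    # such filters instead of having them designed by hand.
    edge_filter = torch.tensor([[[[-1., -1., -1.],
                                  [-1.,  8., -1.],
                                  [-1., -1., -1.]]]])

    feature_map = F.conv2d(image, edge_filter, padding=1)   # same spatial size
    print(feature_map.shape)                                 # torch.Size([1, 1, 64, 64])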
Initially, a lack of computer capacity was the fundamental problem with CNN
applications in the real world. Running CNNs with larger depths (deep learners) is
now considerably more powerful and time-efficient because of the emergence of
graphics processing units (GPUs), which have far exceeded the computing cap-
ability of central processing units (CPUs). These deep neural networks, sometimes
termed "DL" solutions, include neural network architectures like SegNet,
GoogLeNet, and VGGNet [13]. These approaches hold great potential both for
industrial applications, such as recommendation algorithms akin to Netflix's
recommender system, and for summarizing the entire content of a picture.
There have been several initiatives in recent years to evaluate these systems for use
in medical settings, especially biomedical image processing.
The two primary types of DL solutions for biological image analysis are:
1. Providing the DL network simply with photos and the diagnoses, labels, and stages
that go along with them, sometimes known as “image-classification approaches.”
2. "Semantic-segmentation approaches," in which the network is given the image data
together with accompanying ground-truth masks (black-and-white images) in which
the pathological regions associated with the illness are hand-drawn. A minimal sketch
contrasting the two set-ups is given after this list.
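The toy PyTorch sketch below contrasts the two set-ups. It is only a conceptual illustration with arbitrary layer sizes, not any of the published architectures discussed in this chapter: the same convolutional feature extractor feeds either a per-image classification head or a per-pixel segmentation head.

    import torch
    import torch.nn as nn

    # A shared toy feature extractor for a 3-channel fundus image.
    encoder = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    )

    # (1) Image-classification set-up: one label (e.g. a disease grade) per image.
    classification_head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 5),
    )

    # (2) Semantic-segmentation set-up: one prediction per pixel, trained
    # against a hand-drawn ground-truth mask of the pathological region.
    segmentation_head = nn.Sequential(
        nn.Conv2d(32, 1, kernel_size=1),
        nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    )

    x = torch.randn(1, 3, 128, 128)            # a dummy retinal image
    feats = encoder(x)                          # (1, 32, 32, 32) feature maps
    print(classification_head(feats).shape)     # torch.Size([1, 5]): class scores
    print(segmentation_head(feats).shape)       # torch.Size([1, 1, 128, 128]): mask logits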
[Figure: Anatomy of the human eye: cornea, pupil, lens, iris, conjunctiva, vitreous humor, retina, fovea, optic nerve]
19.4.3 Glaucoma
The optic nerve of the eye may be harmed by a variety of glaucoma-related dis-
orders, which may result in blindness and loss of vision. Glaucoma occurs as the
normal fluid pressure inside the eyes progressively rises. However, recent studies
show that even with normal eye pressure, glaucoma may manifest. Early therapy
may often protect the eyes from severe vision loss.
The two types of glaucoma conditions are “closed angle” and “open angle”
glaucoma. Open-angle glaucoma is a serious illness that develops gradually
over time without the patient experiencing loss of vision until the ailment is quite
far along. It is known as the "sneak thief of sight" for this reason. Angle-closure
glaucoma may come on quickly and painfully. Visual loss may worsen fast, but the pain and
suffering prompt individuals to seek medical care before irreversible damage takes
place. The normal eye versus the eye with glaucoma is shown in Figure 19.7.
[Figure: The eye with macular degeneration, showing the retina, blood vessels, optic nerve, and damaged macula]
19.4.4 Cataract
A cataract is a clouded region that forms on the lens of the eye, which causes
a reduction in one’s ability to see. The progression of cataracts is often sluggish and
might impair one or both eyes. Halos around lights, fading colors, blurred or double
vision, trouble with bright lights, and trouble seeing at night are just a few symptoms.
As a consequence, patients can have problems reading, driving, or recognizing
faces [4]. The reduced vision caused by cataracts also increases the risk of falls and
depression. 51% of cases of vision loss and 33% of cases of visual impairment are
attributed to cataracts.
The most common cause of cataracts is age, although they may also be brought
on by radiation or trauma, be present at birth, or appear after eye surgery for
another reason. Risk factors include diabetes, chronic corticosteroid use, smoking,
prolonged sunlight exposure, and alcohol use [4]. The underlying process involves
the accumulation of protein clumps or yellow-brown pigment in the lens, which reduces
the amount of light reaching the retina at the back of the eye. An eye exam is used to
make the diagnosis. The problem of cataracts is depicted in Figure 19.8.
[Figure 19.7 Normal vision versus vision with glaucoma]
[Figure 19.8 A cloudy lens (cataract) reduces the light reaching the retina and causes blurry vision]
This model achieved accurate diagnosis in several ethnic groups with specificity,
AUC, and sensitivity of 91.6%, 0.936, and 90.5%, respectively. Despite the fact
that the majority of research has created reliable DL-based models for the diagnosis
of DR and screening using CFP or optical coherence tomography (OCT) photos,
other studies have concentrated on automatically detecting DR lesions in fundus
fluorescein angiogram (FFA) images. To create an end-to-end DL framework for
staging the severity of DR, non-perfusion regions, vascular leakages, and
microaneurysms were categorized automatically under many labels based on DL
models [16,17]. Additionally, DL-based techniques have been utilized to forecast
the prevalence of DR and associated systemic cardiovascular risk factors [18] as
well as predict the OCT-derived severity of diabetic macular edema (DME) from
two-dimensional fundus images (sensitivity of 85%, AUC of 0.89, and spe-
cificity of 80%) [19]. Additionally, since the American Food and Drug
Administration (FDA) authorized IDx-DR [20] as the first autonomous AI diagnostic system and the
EyRIS SELENA [21] was given clearance for medical usage in the European
Union, commercial devices for DR screening have been created [22].
More recently, DL systems with good diagnostic performance were reported
by Gulshan and team members [23] from Google AI Healthcare. To build the DL
framework, a team of 54 US-licensed medical experts and ophthalmology fellows
graded 128,175 retinal images for DR and DME between three and seven times each
from May to December 2015. About 10,000 photos from two freely accessible
datasets (EyePACS-1 and Messidor-2) were included in the test set, which was
assessed by at least seven US board-certified ophthalmologists with good intragrader
consistency. For EyePACS-1 and Messidor-2, the AUC was 0.991 and 0.990,
respectively.
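The AUC, sensitivity, and specificity figures quoted throughout this section are computed by comparing the model's predicted probabilities against the adjudicated reference grades. A minimal sketch of such an evaluation, using scikit-learn and small made-up arrays in place of real grading data, is:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Illustrative dummy data: 1 = referable disease according to the graders,
    # and the DL system's predicted probability for each image.
    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.10, 0.40, 0.90, 0.80, 0.20, 0.60, 0.30, 0.70])

    auc = roc_auc_score(y_true, y_score)

    # Sensitivity and specificity follow from a chosen operating threshold.
    y_pred = (y_score >= 0.5).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"AUC={auc:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")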
19.5.2 Glaucoma
If glaucoma sufferers do not get timely detection and quick treatment, they risk
losing their visual fields (VF) permanently [22]. This is a clear clinical need that
may benefit from using AI. AI research on glaucoma has come a long way, even
though there are problems such as insufficient multimodal assessment and limited
long-term natural history data. Several studies [15,24–28] have used AI to effectively identify
structural abnormalities in glaucoma using retinal fundus images and OCT [29–31].
Utilizing an SVM classifier, Zangwill et al. [32] identified glaucoma with high
accuracy. In order to diagnose glaucoma, Burgansky et al. [33] employed five
machine learning classifiers on an OCT image dataset.
In the diagnosis and treatment of glaucoma, VF evaluation is crucial. In order
to create a DL system, VF offers a number of proven parameters. With 17 proto-
types, Elze et al. [34] created an unsupervised method to categorize glaucomatous
visual loss. VF loss in early glaucoma may be found using the unsupervised
approach. DL algorithms have been employed to forecast the evolution of glau-
coma in the VF. A DL model was trained by Wen et al. [35] that produced point-
wise VF predictions up to 5.5 years in the future, with an average difference of
0.41 dB and a correlation of 0.92 between the mean deviation (MD) of the predicted and the actual
future Humphrey visual field (HVF). For an accurate assessment and appropriate care, clinical problems
need thorough investigation and fine-grained grading. Huang et al. [36] suggested a
DL method to accurately score the VF of glaucoma using information from two
instruments (the Octopus and the Humphrey Field Analyzer). This tool might be
used by glaucomatous patients for self-evaluation and to advance telemedicine.
Li et al. [37] trained a machine-learning system on fundus photographs to recognize
a glaucoma-like optic disc, defined as a vertical cup-to-disc ratio of 0.7 or greater.
The results showed that the algorithm has a significant degree of specificity (92%),
sensitivity (95.6%), and AUC for glaucomatous optic neuropathy detection (0.986).
In a similar study, Phene et al. [38] proposed a DL-based model with an AUC of
0.945 to screen for referable glaucoma using data from over 80,000 CFPs.
Furthermore, its effectiveness was shown when used with two more datasets, where
the AUC marginally decreased to 0.855 and 0.881, respectively. According to
Asaoka et al. [39], a DL-based model trained on 4316 OCT images for the early
detection of glaucoma has an AUC of 93.7%, a specificity of 93.9%, and a sensitivity
of 82.5%. Xu et al. [40] identified gonioscopic angle closure and primary angle
closure disease (PACD) in a fully automated analysis, with AUCs of 0.928 and
0.964, respectively, using over 4,000 anterior segment OCT (AS-OCT) images.
Globally, adults aged 40–80 have a 3.4% risk of developing glaucoma, and by
2040, it is anticipated that there will be roughly 112 million cases of the condition
[41]. Developments in disease identification, functional and structural damage
assessments throughout time, therapy optimization to avoid visual impairment and a
precise long-term prognosis would be welcomed by both patients and clinicians [1].
19.5.3 Age-related macular degeneration
The primary factor behind older people losing their eyesight permanently is AMD.
CFP is the most commonly used screening technique, and it can detect abnormal-
ities such as drusen, retinal hemorrhage, geographic atrophy, and others. CFP is
crucial for screening people for AMD because of its quick, non-invasive, and
affordable benefits. CFP-based DL algorithms have been able to diagnose and grade
AMD with a precision comparable to that of ophthalmologists.
The macular portion of the retina may be seen using optical coherence tomo-
graphy (OCT). In 2018, Kermany et al. [42] used a transfer learning technique on
an OCT database for choroidal neovascularization (CNV) and three other cate-
gories, using a tiny portion of the training data from conventional DL techniques.
Their model met senior ophthalmologists' standards, with an accuracy of 96.6%, a
sensitivity of 97.8%, and a specificity of 97.4%. Studies on the quantitative
evaluation of OCT images using AI techniques have grown in number recently. In
order to automatically detect and measure intraretinal fluid (IRF) and subretinal
fluid (SRF) on OCT images, Schlegl et al. [43] created a DL network. They found
that their findings were extremely similar to expert annotations. Erfurth et al. [44]
also investigated the association between the amount of retinal fluid and visual
function after intravitreal injection in AMD patients by identifying and quantifying
retinal fluids, including IRF, SRF, and pigment epithelial detachment,
using a DL algorithm. The quantitative OCT volume mode study by Moraes et al.
[45] included biomarkers like subretinal hyperreflective material and hyperre-
flective foci on OCT images in addition to retinal fluid, and the results showed
strong clinical applicability and were directly connected to the treatment choices of
AMD patients in follow-up reviews. Yan et al. [46] used an attention-based DL
technique to decipher CNV activity on OCT images to help a physician diagnose
AMD. Zhang et al. [47] used a DL model for evaluating photoreceptor degradation,
hyperprojection, and retinal pigment epithelium loss to quantify geographic atro-
phy (GA) in addition to wet AMD on OCT images. As additional indicators linked
to the development of the illness, Liefers et al. [48] measured a number of
important characteristics on OCT pictures of individuals with early and late AMD.
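Several of the studies above, notably the work of Kermany et al. [42], rely on transfer learning: a network pretrained on natural images is reused and only a small task-specific head is retrained on the medical dataset. The PyTorch/torchvision sketch below illustrates this general recipe; the ResNet-18 backbone, the four hypothetical OCT classes, and the training settings are illustrative assumptions, not the configuration of any particular published study.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pretrained backbone (torchvision >= 0.13) and freeze it.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False

    # Replace the final layer with a new 4-way head (e.g. CNV / DME / drusen / normal).
    model.fc = nn.Linear(model.fc.in_features, 4)

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # One illustrative training step on a dummy batch of OCT-like images.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 4, (8,))
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()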
One of the hotspots in healthcare AI technologies is the integrated use of
several modalities, which has been found to be closer to clinical research deploy-
ment. In order to diagnose AMD and polypoidal choroidal vasculopathy (PCV), Xu
et al. [49] joined CFP and OCT images and attained 87.40% accuracy, 88.80%
sensitivity, and 95.60% specificity. OCT and optical coherence tomography
angiography (OCTA) pictures were used by Jin et al. [50] to evaluate the features
of a multimodal DL model to evaluate CNV in neovascular AMD. On multimodal
input data, the DL algorithm obtained an AUC of 0.9796 and an accuracy of
95.50%, which is equivalent to that of retinal experts.
In 2018, Burlina et al. [51] created a DL algorithm that automatically per-
formed classification and feature extraction on more than 130,000 CFP sets of data.
Compared with earlier binary classification approaches, their DL algorithm
had more potential to be used in clinical settings. A 13-category AMD fundus
imaging dataset was created by Grassmann et al. [52], which had 12 instances of
AMD severity and one that was unable to be assessed due to low image quality.
Finally, after training six sophisticated DL models, they evaluated an ensemble of
these networks on an independent test set. In order to detect AMD severity-
related events precisely defined at the single-eye level and be able to deliver a
patient-level final score paired with binocular severity, Peng et al. [53] employed a
DeepSeeNet DL technique interconnected by three seed networks. Some AI
research has centered on forecasting the probability of progression of the diseases
along with AMD diagnosis based on the CFP. The focus on enhancing the DL
algorithm was further expanded in 2019 by Burlina et al. [54], who not only
investigated the clinical impact of the DL technique on the 4 and 9 classification
systems of AMD severity but also used a DL-based regression analysis to provide
patients with a 5-year risk rating for their estimated progression to advanced AMD.
In a research work carried out in 2020, Yan et al. [55] used DL algorithms to
estimate the probability of progression to advanced AMD by combining CFP with
patients' AMD-associated genotypes.
DL has also been applied to detecting and treating a wide range of other ocular
disorders, such as automated detection and severity classification of cataracts using
slit-lamp or fundus images. In identifying
various forms of cataracts [56], AI algorithms have shown good to exceptional
overall diagnostic accuracy, with high AUC (0.860–1.00), accuracy (69.00%–
99.50%), sensitivity (60.10%–99.50%), and specificity (63.2%–99.6%). In [57], Long
et al. used the DL method to create an artificial intelligence platform for congenital
cataracts that performs three different tasks: population-wide congenital cataract
identification, risk stratification for patients with congenital cataracts, and assisting
physicians with treatment decisions. By segmenting the anatomy and
labeling pathological lesions, Li et al. [58] enhanced the efficacy of a DL algorithm
for identifying anterior segment disorders in slit-lamp pictures, including keratitis,
cataracts, and pterygia. In slit-lamp pictures collected by several pieces of equipment,
including a smartphone using the super macro feature, another DL system performed
admirably in automatically diagnosing keratitis, a normal cornea, and other corneal
abnormalities (all AUCs > 0.96). AI’s keratitis diagnosis sensitivity and specificity
were on track with those of skilled cornea experts. Ye et al. [59] proposed the DL-
based technique to identify and categorize myopic maculopathy in patients with
severe myopia and to recommend treatment. Their model had sensitivities compar-
able to or superior to those of young ophthalmologists. Yoo et al. [60] used 10,561
eye scans and incorporated preoperative data to create a machine-learning model that
could predict if a patient would be a good candidate for refractive surgery. They
achieved an AUC of 0.970 and an accuracy of 93.40% in cross-validation. In order to
create a set of convolutional neural networks (CNNs) for recognizing malignancy
in ocular tumors, a large-scale statistical study of demographic and clin-
icopathological variables was carried out in conjunction with universal health data-
bases and multilingual clinical data. Besides having a sensitivity of 94.70% and an
accuracy of 94.90%, the DL diagnostic method [61] for melanoma visualization was
able to discriminate between malignant and benign tumors.
19.6.4 Limitations
The use of DL in medical practice might come with various hazards. Some
computer programs use algorithms that have a high risk of false-negative retinal
disease diagnosis. Diagnostic mistakes might come from improperly interpret-
ing the false-positive findings, which could have catastrophic clinical effects on
the patient’s eyesight. In rare cases, the eye specialist may be unable to assess
the performance indicator values utilized by the DL computer program to
examine patient information. The technique through which a software program
reaches its conclusion and its logic is not always clear. It is probable that a lack
of patient trust might be a challenge for remote monitoring (home screening)
using DL-powered automated machines. Studies indicate that several patients
are more likely to prefer in-person ophthalmologist appointments over
computer-aided diagnosis [64,65]. Additionally, there is a chance that physi-
cians would lose their diagnostic skills due to technological addiction. It is vital
to create and implement medicolegal and ethical norms in certain specific cir-
cumstances, such as when a doctor disagrees with the findings of a DL eva-
luation or when a patient is unable to receive counseling for the necessary
therapy. All of these possible issues demonstrate the need for DL technology to
progress over time.
19.8 Conclusion
DL is a new tool that helps patients and physicians alike. The integration of DL
technology into ophthalmic care will increase as it develops, relieving the clinician
of tedious chores and enabling them to concentrate on enhancing patient care.
Ophthalmologists will be able to concentrate their efforts on building patient con-
nections and improving medical and surgical treatment because of DL. Even
though medical DL studies have made significant strides and advancements in the
realm of ophthalmology, they still confront several obstacles and problems. The
advent of big data, the advancement of healthcare electronics, and the public’s need
for high-quality healthcare are all pushing DL systems to the limit of what they can
do in terms of enhancing clinical medical processes, patient care, and prognosis
evaluation. In order to employ cutting-edge AI ideas and technology to address
ophthalmic clinical issues, ocular medical professionals and AI researchers should
collaborate closely with computer scientists. They should also place greater
emphasis on the translation of research findings. AI innovations in ophthalmology
seem to have a bright future. However, a significant amount of further research and
development is required before they may be used routinely in therapeutic settings.
References
[2] H.-C. Shin, H. R. Roth, M. Gao, et al., “Deep convolutional neural networks
for computer-aided detection: CNN architectures, dataset characteristics and
transfer learning,” IEEE Transactions on Medical Imaging, vol. 35,
pp. 1285–1298, 2016.
[3] P. S. Grewal, F. Oloumi, U. Rubin, and M. T. S. Tennant, “Deep learning in
ophthalmology: a review,” Canadian Journal of Ophthalmology, vol. 53,
pp. 309–313, 2018.
[4] P. Kumar, R. Kumar, and M. Gupta, “Deep learning based analysis of
ophthalmology: a systematic review,” In EAI Endorsed Transactions on
Pervasive Health and Technology, p. 170950, 2018.
[5] D. T. Hogarty, D. A. Mackey, and A. W. Hewitt, “Current state and future
prospects of artificial intelligence in ophthalmology: a review,” Clinical &
Experimental Ophthalmology, vol. 47, pp. 128–139, 2018.
[6] M. R. K. Mookiah, U. R. Acharya, C. K. Chua, C. M. Lim, E. Y. K. Ng, and
A. Laude, “Computer-aided diagnosis of diabetic retinopathy: a review,”
Computers in Biology and Medicine, vol. 43, pp. 2136–2155, 2013.
[7] U. Ishtiaq, S. A. Kareem, E. R. M. F. Abdullah, G. Mujtaba, R. Jahangir, and
H. Y. Ghafoor, “Diabetic retinopathy detection through artificial intelligent
techniques: a review and open issues,” Multimedia Tools and Applications,
vol. 79, pp. 15209–15252, 2019.
[8] Y. Hagiwara, J. E. W. Koh, J. H. Tan, et al., “Computer-aided diagnosis of
glaucoma using fundus images: a review,” Computer Methods and
Programs in Biomedicine, vol. 165, pp. 1–12, 2018.
[9] DGHS, National Programme for Control of Blindness and Visual
Impairment (NPCB&VI), Ministry of Health & Family Welfare,
Government of India, 2017.
[10] N. Dey, Classification Techniques for Medical Image Analysis and
Computer Aided Diagnosis, Elsevier Science, 2019.
[11] L. Lu, Y. Zheng, G. Carneiro, and L. Yang, Deep Learning and
Convolutional Neural Networks for Medical Image Computing: Precision
Medicine, High Performance and Large-Scale Datasets, Springer
International Publishing, 2017.
[12] Q. Li and R. M. Nishikawa, Computer-Aided Detection and Diagnosis in
Medical Imaging, CRC Press, 2015.
[13] D. Ghai, S. L. Tripathi, S. Saxena, M. Chanda, and M. Alazab, Machine
Learning Algorithms for Signal and Image Processing, Wiley, 2022.
[14] M. D. Abramoff, Y. Lou, A. Erginay, et al., “Improved automated detection
of diabetic retinopathy on a publicly available dataset through integration of
deep learning,” Investigative Ophthalmology & Visual Science, vol. 57,
pp. 5200, 2016.
[15] D. S. W. Ting, C. Y.-L. Cheung, G. Lim, et al., “Development and validation
of a deep learning system for diabetic retinopathy and related eye diseases
using retinal images from multiethnic populations with diabetes,” JAMA,
vol. 318, pp. 2211, 2017.
Clinical techniques used for the timely identification, observation, diagnostics, and
therapy assessment of a wide range of medical problems are just a few examples of
how biomedical imaging is crucial in these clinical applications. Grasping medical
image analysis in computer vision requires a fundamental understanding of the
ideas behind artificial neural networks and deep learning (DL), as well as how they
are implemented. Due to its dependability and precision, DL is well-liked among
academics and researchers, particularly in the engineering and medical disciplines.
Early detection is a benefit of DL approaches in the realm of medical imaging for
illness diagnosis. The simplicity and reduced complexity of DL approaches are
their key characteristics, which eventually save time and money while tackling
several difficult jobs at once. DL and artificial intelligence (AI) technologies have
advanced significantly in recent years. In every application area, but particularly in
the medical one, these methods are crucial. Examples include image analysis,
image processing, image segmentation, image fusion, image registration, image
retrieval, image-guided treatment, computer-aided diagnosis (CAD), and many
more. This chapter seeks to thoroughly present DL methodologies and the potential
of DL for biomedical imaging, as well as explore open problems and difficulties.
20.1 Introduction
Nowadays, medical practice makes extensive use of biomedical imaging
technology. Experts manually analyze biological images and then piece
all of the clinical evidence together to reach the correct diagnosis, depending on their
1 Department of Electronics and Communication Engineering, M.E.A. Engineering College, India
2 Department of Engineering, Amity School of Engineering and Technology, Amity University Kolkata, India
3 School of Engineering and Technology, Amity University Kolkata, India
4 City University College of Science and Technology, Malaysia
own expertise. Currently, manual biological image analysis confronts four sig-
nificant obstacles: (i) Since manual analysis is constrained by human experience,
the diagnosis may vary from person to person. (ii) It costs a lot of money and takes
years of work to train a skilled expert. (iii) Specialists are under tremendous strain
due to the rapid expansion of biological images in terms of both quantity and
modality. (iv) Specialists get quickly exhausted by repetitive, tiresome analytical
work on unattractive biological images, which might result in a delayed or incorrect
diagnosis, putting patients in danger. In some ways, these difficulties make the lack
of healthcare resources worse, particularly in developing nations [1]. Medical
image analysis with computer assistance is therefore an alternate option.
The use of artificial intelligence (AI) in computer-aided diagnostics (CAD) offers
a viable means of increasing the effectiveness and accessibility of the diagnosis process
[2]. The most effective AI technique for many tasks, particularly issues with medical
imaging, is deep learning (DL) [3]. It is cutting-edge in terms of a variety of computer
vision applications. It has been employed in various medical imaging projects, such as
the diagnosis of Alzheimer’s, the detection of lung cancer, the detection of retinal
diseases, etc. Despite obtaining outstanding outcomes in the medical field, a medical
diagnostic system must be transparent, intelligible, and explainable in order to gain the
confidence of clinicians, regulators, and patients. It must be able to reveal to everybody
why a specific decision was taken in a particular manner in an idealistic situation.
Compared with classical machine learning approaches, DL tools are among the
most frequently utilized algorithms for obtaining better, more adaptable, and more
precise results from datasets. DL is also used to identify (diagnose) disorders and
provide customized treatment regimens in order to improve the patient's health.
The most popular biomedical measurement and imaging techniques for diagnosing
patients with the least amount of human involvement include EEG, ECG, MEG,
MRI, etc. [4]. The possibility of noise in these medical images makes correct analysis
challenging. DL is able to provide findings that are accurate and precise while also
being more trustworthy. Each technology has benefits and drawbacks.
Like every technology, DL has significant limitations, including the need for
huge datasets to deliver promising outcomes and the need for a GPU to analyze
medical images, which calls for more complex system setups [4]. DL is popular now
despite these drawbacks because of its capacity to analyze enormous volumes of data.
In this chapter, the most recent developments in DL for biological images are covered. Additionally,
we will talk about the use of DL in segmentation, classification, and registration, as
well as its potential applications in medical imaging.
has limitations caused by the physics of how energy interacts with the physical
body, the equipment used, and often physiological limitations. Medical imaging has
existed since Roentgen discovered X-rays in 1895. Later, Hounsfield's
practical computed tomography (CT) machines introduced computer systems into
clinical practice and medical imaging. Since then, computer systems have evolved
into essential elements of contemporary medical imaging devices and hospitals,
carrying out a range of functions from image production and data collection to
image processing and presentation [6]. The need for image production, modifica-
tion, presentation, and analysis increased significantly as new imaging modalities
were created [7].
In order to identify disorders, medical researchers are increasingly focusing on
DL. As a result of using drugs, drinking alcohol, smoking, and eating poorly,
individuals nowadays are predominantly affected by lifestyle illnesses, including
type 2 diabetes, obesity, heart disease, and neurodegenerative disorders [4]. DL is
essential for predicting these illnesses. DL is increasingly preferred for diagnosing
and treating illness with CAD in combination with imaging methods such as single
photon emission computed tomography (SPECT), positron emission tomography
(PET), and magnetic resonance imaging (MRI). DL can enrich 2D and 3D measurements
with additional information and speed up the diagnostic processing time.
Additionally, it can help tackle overfitting and data labeling problems.
assess cancer, the efficacy of medicines, cardiac issues, and neurological illnesses,
including Alzheimer’s and multiple sclerosis.
20.2.4 Ultrasound
High-frequency sound waves are sent into the body, and the returning echoes are
converted into images. By combining sound and imaging, medical sonography
(ultrasonography, or diagnostic ultrasound) can also capture acoustic signals, such as
the flow of blood, that let medical experts diagnose the health condition of the patient
[8,9]. A pregnancy ultrasound is often used to check for blood vessel and heart
abnormalities, pelvic and abdominal organs, as well as signs of discomfort, edema,
and infection.
[Figure: Typical DL workflow: select the suitable dataset, choose the appropriate deep learning algorithm, and design the analytical model]
The complexity of the issue increases with the number of layers in the neural
network. There are many hidden layers in a deep neural network (DNN). At the
moment, a neural network may have hundreds or even thousands of layers.
When trained on a huge amount of data, a network of this scale is capable of
remembering every mapping and producing insightful predictions [10]. DL has
therefore had a big influence on fields including voice recognition, computer
vision, medical imaging, and more. Among the DL techniques used in research are
DNN, CNN, RNN, deep convolutional extreme learning machine (DC-ELM), deep
Boltzmann machine (DBM), deep belief network (DBN), and deep autoencoder (DAE). In computer
vision and medical imaging, the CNN is gaining greater traction.
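As a concrete and purely illustrative example of the input layer, hidden layers, and output layer described above, the PyTorch snippet below defines a small fully connected DNN; the layer widths are arbitrary choices, not a recommended design:

    import torch
    import torch.nn as nn

    # Input layer takes 6 measurements per sample; the output layer produces
    # two scores, e.g. disease versus no disease. Widths are illustrative only.
    dnn = nn.Sequential(
        nn.Linear(6, 100), nn.ReLU(),
        nn.Linear(100, 200), nn.ReLU(),   # hidden layers
        nn.Linear(200, 50), nn.ReLU(),
        nn.Linear(50, 2),                 # output layer
    )

    x = torch.randn(4, 6)                 # a dummy batch of 4 samples
    print(dnn(x).shape)                   # torch.Size([4, 2])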
[Figure: A deep neural network with an input layer, several hidden layers, and an output layer]
[Figure: A convolutional network transforming an input image into successive feature maps before the output layer]
Figure 20.7 Deep Boltzmann machine with two hidden layers [12]
processing are desirable, though not essential, choices in the medical sciences [4].
CAD is also important in disease progression modeling [14]. A brain scan is
essential for several neurodegenerative disorders (NDD), including strokes,
Parkinson’s disease (PD), Alzheimer’s disease (AD), and other types of dementia.
Detailed maps of the brain’s areas are now accessible for analysis and illness pre-
diction. We may also include the most common CAD applications in biomedical
imaging, cancer detection, and lesion intensity assessment. CNN has gained greater
popularity in recent years as a result of its incredible performance and depend-
ability. The effectiveness and efficacy of CNNs are shown in an overview of CNN
techniques and algorithms where DL strategies are employed in CAD, shape pre-
dictions, and segmentations as well as brain disease segmentation.
It may be especially difficult to differentiate between various tumor kinds,
sizes, shapes, and intensities in CAD while still employing the same neuroimaging
method. There have been several cases when a concentration of infected tissues has
gathered alongside normal tissues. It is difficult to manage various forms of noise,
such as intensity-based noise, Rician noise effects, and non-isotropic resolution in
MRI, using simple machine learning (ML) techniques. Such data issues have typically
been handled by methods that combine hand-crafted features with tried-and-true ML
techniques.
Automating and integrating characteristics with classification algorithms is possi-
ble with DL techniques [15,16]. Because CNN has the ability to learn more compli-
cated characteristics, it can handle patches of images that focus on diseased tissues. In
the field of medical imaging, CNN can classify TB manifestations using X-ray images
[17] and respiratory diseases using CT images [18]. With hemorrhage identification in
color fundus images, CNN can identify the smallest and most discriminatory regions in
the pre-training phase [19]. The segmentation of isointense-stage brain tissues [10] and the
separation of several brain areas from multi-modality magnetic resonance images
(MRI) [20] have both been suggested by CNN. Several hybrid strategies that combine
CNN with other approaches have been presented. For instance, a DL technique is
suggested in [21] to encode the characteristics of distorted models and the procedure for
segmenting the left ventricle of the heart from short-axis MRI. The left ventricle is
identified using CNN, and its morphology is inferred using DAE.
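As a reminder of what the autoencoder component in such hybrid pipelines does, the following is a minimal, purely illustrative PyTorch sketch; the layer sizes and the use of flattened 32x32 patches are assumptions, not the design of [21]:

    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        """Compress an input into a low-dimensional code, then reconstruct it."""
        def __init__(self, in_dim=32 * 32, code_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, code_dim))
            self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim), nn.Sigmoid())

        def forward(self, x):
            code = self.encoder(x)        # compact "encoded data"
            return self.decoder(code)     # reconstruction of the input

    x = torch.rand(8, 32 * 32)            # dummy flattened image patches
    recon = AutoEncoder()(x)
    loss = nn.functional.mse_loss(recon, x)   # reconstruction error to minimize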
This approach seeks to help computers identify and characterize the data that are
relevant to a particular problem, an idea on which many machine-learning algorithms
are based: increasingly complex models built on top of one another transform input
images into responses. For image analysis, CNNs are a superior model [22]. CNNs
analyze the input using many layers of filters. In the medical domain, DL architectures
are commonly adapted to a variety of input representations, such as three-dimensional
data. Because of the computational cost of 3D convolutions and the additional
constraints they impose, earlier CNNs largely avoided fully 3D processing.
20.5.3 Detection
In the last several decades, academics have paid a lot of attention to object detec-
tion tasks. Researchers have started to think about applying object detection pro-
cesses to healthcare to increase the effectiveness of doctors by using computers to
aid them in the detection and diagnosis of images. DL methods are still making
significant advancements, and object detection processes in healthcare are popu-
larly used in clinical research as part of the AI medical field [28]. From the per-
spective of regression and classification, the problem of object identification in the
medical profession is difficult. Because of their importance in CAD and detection
procedures, several researchers are adapting object detection methods to the med-
ical profession.
A typical objective is finding and recognizing minor illnesses within the entire
image domain. Many researchers have conducted a thorough investigation in this
regard. Computer-aided detection methods for recognizing abnormalities in various
medical images have a long history and are intended to increase detection performance
or shorten the reading time for individual professionals. Most reported deep-learning
object detection approaches still use CNNs for pixel (or voxel) classification [29].
This is then followed by post-processing to produce candidate regions.
system was made. To find nodules in X-ray images, it employed a CNN with four
layers [33]. The majority of research on DL-based object identification systems
conducts pixel classification with a CNN first, before obtaining object candidates via
post-processing. Multi-stream CNNs may also incorporate 3D or contextual information
from medical images [34].
Detecting and categorizing objects and lesions is comparable to classification.
The main distinction is that in order to identify lesions, we must first conduct a
segmentation task, and only then can we classify or forecast a disease [35].
Currently, DL offers encouraging outcomes that allow for the correct timing of
early diagnosis and therapy for the patient [36,37].
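A rough sketch of the "pixel classification followed by post-processing" pattern described above is given below; the probability map here is random noise standing in for a real CNN output, and the 0.9 threshold is an arbitrary illustrative choice:

    import numpy as np
    from scipy import ndimage

    # prob_map stands in for per-pixel lesion probabilities from a CNN.
    rng = np.random.default_rng(0)
    prob_map = rng.random((128, 128))

    binary = prob_map > 0.9                          # pixel-level decision
    labels, num_candidates = ndimage.label(binary)   # group pixels into candidate objects
    centroids = ndimage.center_of_mass(binary, labels, range(1, num_candidates + 1))
    print(num_candidates, "candidate detections; first centroids:", centroids[:3])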
20.5.4 Segmentation
Segmentation is essential for disease/disorder prediction by dividing an image into
several parts and associating them with testing outcomes [38]. Lately, the most
extensively used framework for 3D image segmentation has been the CNN.
Segmentation is an essential
part of medical image analysis, in which the image is broken up into smaller parts
based on shared characteristics such as color, contrast, grey level, and brightness.
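Segmentation quality is usually reported with an overlap measure such as the Dice coefficient between the predicted mask and the hand-drawn ground truth; a small self-contained sketch is shown below.

    import numpy as np

    def dice_coefficient(pred_mask, true_mask):
        """Overlap between two binary masks: 1.0 means a perfect match."""
        pred = pred_mask.astype(bool)
        true = true_mask.astype(bool)
        intersection = np.logical_and(pred, true).sum()
        denom = pred.sum() + true.sum()
        return 2.0 * intersection / denom if denom else 1.0

    # Toy 4x4 example: the prediction recovers 4 of the 6 labelled pixels.
    pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:3] = 1
    true = np.zeros((4, 4), dtype=int); true[1:3, 1:4] = 1
    print(dice_coefficient(pred, true))   # 2*4 / (4 + 6) = 0.8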
20.5.6 Registration
A frequent activity in the image analysis process is registration, also known as
spatial alignment, which involves calculating a common coordinate to align a
certain object in the images [46]. Typically, a specific kind of parameterized
transformation is assumed, and a predetermined similarity metric is optimized iteratively.
Lesion recognition and segmentation are two of the more well-known applications
of DL, but researchers have shown that DL also produces strong results in
registration [47]. Two deep-learning registration strategies are now being extensively
used in research [48]. The first one evaluates a similarity measure between two
images and drives an iterative optimization approach, while the second one uses a
DNN to directly predict the transformation parameters.
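The first strategy can be illustrated with a deliberately simple, classical example in which a plain mean-squared-error similarity metric and a translation-only transform stand in for the learned components; the images and the misalignment below are synthetic:

    import numpy as np
    from scipy import ndimage, optimize

    # A synthetic fixed image and a moving image shifted by an unknown amount.
    fixed = np.zeros((64, 64)); fixed[20:40, 20:40] = 1.0
    moving = ndimage.shift(fixed, (3.0, -2.0))

    def dissimilarity(params):
        dy, dx = params
        warped = ndimage.shift(moving, (dy, dx))      # apply candidate transform
        return np.mean((warped - fixed) ** 2)         # similarity metric (MSE)

    # Iteratively optimize the transform parameters against the metric.
    result = optimize.minimize(dissimilarity, x0=[0.0, 0.0], method="Nelder-Mead")
    print(result.x)   # approximately (-3, 2): the translation that undoes the shift

DL-based variants of this scheme replace the hand-crafted metric with a learned one or, as in the second strategy, train a network to predict the transformation parameters directly.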
expanding area of study, the use of DL-based techniques in the medical industry is
currently being slowed down by a number of obstacles [4,58,59]. The next sub-
section goes through each of them.
20.9 Conclusion
References
[1] P. Zhang, Y. Zhong, Y. Deng, X. Tang, and X. Li, “A Survey on Deep
Learning of Small Sample in Biomedical Image Analysis,” 2019, arXiv
preprint arXiv:1908.00473.
[2] A. Singh, S. Sengupta, and V. Lakshminarayanan, “Explainable deep learning
models in medical image analysis,” Journal of Imaging, vol. 6, p. 52, 2020.
[3] F. Altaf, S. M. S. Islam, N. Akhtar, and N. K. Janjua, “Going deep in medical
image analysis: concepts, methods, challenges, and future directions,” IEEE
Access, vol. 7, p. 99540–99572, 2019.
[4] M. Jyotiyana and N. Kesswani, “Deep learning and the future of biomedical
image analysis,” in Studies in Big Data, Springer International Publishing,
2019, p. 329–345.
[20] W. Zhang, R. Li, H. Deng, et al., “Deep convolutional neural networks for
multi-modality isointense infant brain image segmentation,” NeuroImage,
vol. 108, p. 214–224, 2015.
[21] M. R. Avendi, A. Kheradvar, and H. Jafarkhani, “A combined deep-learning
and deformable-model approach to fully automatic segmentation of the left
ventricle in cardiac MRI,” Medical Image Analysis, vol. 30, p. 108–119,
2016.
[22] S. Panda and R. Kumar Dhaka, “Application of artificial intelligence in
medical imaging,” in Machine Learning and Deep Learning Techniques for
Medical Science, 2022, p. 195–202.
[23] J. Antony, K. McGuinness, N. E. O’Connor, and K. Moran, “Quantifying
radiographic knee osteoarthritis severity using deep convolutional neural
networks,” in 2016 23rd International Conference on Pattern Recognition
(ICPR), 2016.
[24] E. Kim, M. Corte-Real, and Z. Baloch, “A deep semantic mobile application
for thyroid cytopathology,” in SPIE Proceedings, 2016.
[25] S. Singh, J. Maxwell, J. A. Baker, J. L. Nicholas, and J. Y. Lo, “Computer-
aided classification of breast masses: performance and interobserver varia-
bility of expert radiologists versus residents,” Radiology, vol. 258, p. 73–80,
2011.
[26] R. Liu, H. Li, F. Liang, et al., “Diagnostic accuracy of different computer-
aided diagnostic systems for malignant and benign thyroid nodules classifi-
cation in ultrasound images,” Medicine, vol. 98, p. e16227, 2019.
[27] A. Anaya-Isaza, L. Mera-Jiménez, and M. Zequera-Diaz, “An overview of
deep learning in medical imaging,” Informatics in Medicine Unlocked,
vol. 26, p. 100723, 2021.
[28] Y. Shou, T. Meng, W. Ai, C. Xie, H. Liu, and Y. Wang, “Object detection in
medical images based on hierarchical transformer and mask mechanism,”
Computational Intelligence and Neuroscience, vol. 2022, p. 1–12, 2022.
[29] J. Moorthy and U. D. Gandhi, “A survey on medical image segmentation
based on deep learning techniques,” Big Data and Cognitive Computing,
vol. 6, p. 117, 2022.
[30] A. S. Lundervold and A. Lundervold, “An overview of deep learning in
medical imaging focusing on MRI,” Zeitschrift für Medizinische Physik,
vol. 29, p. 102–127, 2019.
[31] K. A. Tran, O. Kondrashova, A. Bradley, E. D. Williams, J. V. Pearson, and
N. Waddell, “Deep learning in cancer diagnosis, prognosis and treatment
selection,” Genome Medicine, vol. 13, Article no. 152, 2021.
[32] K. Lee, J. H. Lockhart, M. Xie, et al., “Deep learning of histopathology
images at the single cell level,” Frontiers in Artificial Intelligence, vol. 4,
p. 754641–754641, 2021.
[33] S.-C. B. Lo, S.-L. A. Lou, J.-S. Lin, M. T. Freedman, M. V. Chien, and S. K.
Mun, “Artificial convolution neural network techniques and applications for
lung nodule detection,” IEEE Transactions on Medical Imaging, vol. 14,
p. 711–718, 1995.