APPLICATIONS OF ARTIFICIAL INTELLIGENCE IN MEDICAL IMAGING
Edited by
Abdulhamit Subasi
Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia;
Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2023 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-443-18450-5
A huge thanks to my parents for every time expecting me to do my best, and telling me I could accomplish anything, no matter what it was.

To my wife, Rahime, for her patience and support.

To those who read this book, and appreciate the work that goes into them. If you have any feedback, please let me know.

Abdulhamit Subasi
Series preface
…in electrical, computer engineering and science, finance, economy, and security. Nowadays AI is one of the hot topics, used in several data analysis applications across the world. Hence, there are many application areas and audiences for this book series. This book series will include many new AI trends in the application of several fields. Among them the most important ones are healthcare, cyber security, and finance.

Abdulhamit Subasi
Preface
Artificial intelligence (AI) plays an important role in the field of medical image analysis, including computer-aided diagnosis, image-guided therapy, image registration, image segmentation, image annotation, image fusion, and retrieval of image databases. With advances in medical imaging, new imaging methods and techniques are needed in the field of medical imaging, such as cone-beam/multislice CT, MRI, positron emission tomography (PET)/CT, 3D ultrasound imaging, diffuse optical tomography, and electrical impedance tomography, as well as new AI algorithms and applications. To provide adequate results, single-sample evidence given by the patient's imaging data is often not sufficient. It is usually difficult to derive analytical solutions or simple equations to describe objects such as lesions and anatomy in medical images, due to wide variations and complexity. Tasks in medical image analysis therefore require learning from examples for correct image recognition (IR) and prior knowledge. This book offers advanced and up-to-date medical image analysis methods through the use of algorithms and techniques for AI, machine learning (ML), and IR. A picture or image is worth a thousand words, indicating that IR may play a critical role in medical imaging and diagnostics. Data and information can be learned through AI, IR, and ML in the form of an image, that is, a collection of pixels, as it is impossible to recruit enough experts for big data.

AI tools have been employed in different areas for several years. New AI methods such as deep learning have helped to uncover information that has entirely altered the approach in different areas. AI has reached a certain maturity as an academic subject, and there are many useful books related to this subject. Since AI is an interdisciplinary subject, it must be implemented in different ways depending on the application field. Nowadays, there is great interest in AI applications in several disciplines. This edited book presents how AI and ML methods can be used in medical image analysis, with applications drawn from different fields including biomedical engineering, electrical engineering, computer science, information technology, medical science, and healthcare.

This book provides descriptions of various biomedical image analyses for the detection of several diseases using AI and can therefore be used to incorporate knowledge obtained from different medical imaging devices such as CT, X-ray, PET, and ultrasound. In this way, a more integrated and, thus, more holistic research on biomedical image analysis may contribute significantly to the successful enhancement of a single patient's clinical knowledge.

This book includes several medical image analysis techniques using AI approaches, including deep learning. Deep learning algorithms such as convolutional neural networks and transfer learning techniques are widely used in medical imaging. Medical image analysis using AI is widely employed in the areas of medical image classification, segmentation, and detection. The applications of AI in medical imaging are widely used as decision support systems for physicians. AI can be used in the diagnosis of different types of cancers including cervical cancer, ovarian cancer, breast cancer, prostate cancer, lung cancer, and liver cancer.

The author of this book has extensive hands-on experience using Python and MATLAB to solve real-world problems in the context of the ML ecosystem. Applications of Artificial Intelligence in Medical Imaging aims to provide readers of various skill levels with the knowledge and experience necessary to develop useful AI solutions. Additionally, this book serves as a solution manual for creating sophisticated real-world systems, providing a structured framework with guidelines, instructions, real-world examples, and code. It also gives readers the crucial knowledge they require to comprehend and resolve a variety of ML difficulties.

The book covers different subjects involving cancer diagnosis, including lung cancer, prostate cancer, breast cancer, and skin cancer; COVID-19 detection; histopathological image classification; classification of diabetic retinopathy lesions; the use of CT, MRI, X-ray, and ultrasound; and pathological medical imaging. This book consists of 14 chapters. Chapter 1 presents topics relevant to the numerous AI methodologies, such as supervised and unsupervised learning; the key AI algorithms are discussed briefly in this chapter, and relevant Python programming codes and routines are provided in each section. Chapter 2 provides lung cancer detection from histopathological lung tissue images using AI. Chapter 3 provides MRI-based automated brain tumor detection by means of AI. Chapter 4 presents breast cancer detection from mammograms using AI. Chapter 5 includes AI-based breast tumor detection using ultrasound images. Chapter 6 includes AI-based skin cancer diagnosis. Chapter 7 presents brain stroke detection from CT images using deep learning algorithms. Chapter 8 provides a deep learning approach for COVID-19 detection from CT scans. Chapter 9 includes detection and classification of diabetic retinopathy lesions using deep learning. Chapter 10 presents automated detection of colon cancer using histopathological images. Chapter 11 includes brain hemorrhage detection in CT images utilizing deep learning. Chapter 12 presents AI-based retinal disease classification using OCT images. Chapter 13 presents diagnosis of breast cancer from histopathological images with deep learning architectures. Chapter 14 includes AI-based Alzheimer disease detection using MRI images.

Abdulhamit Subasi
Acknowledgments
First of all, I would like to thank my publisher Elsevier and its team of dedicated professionals who have made this book writing journey very simple and effortless, and the many who have worked in the background to make this book a success. I would like to thank Rafael Teixeira, Linda Versteeg-Buschman, and Pat Gonzalez, who provided excellent support and did a lot of work for this book. Additionally, I would like to thank Fahmida Sultana for being patient in getting everything necessary completed for this book.

Abdulhamit Subasi
CHAPTER 1
Introduction to artificial intelligence techniques for medical image analysis
Abdulhamit Subasi 1,2
1 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 2 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
FIGURE 1.1 The building blocks of the CAD framework. CAD, Computer-aided diagnostic.
For both input kinds, there exist two possible output types: hierarchical clustering, in which a nested partition tree is produced, and flat clustering, also known as partition clustering, in which the objects are divided into disjoint sets. Some methods state that D is a true distance matrix, whereas others do not. If we have a similarity matrix S, we may transform it to a dissimilarity matrix by using any monotonically decreasing function. The most frequent technique to describe item dissimilarity is through the dissimilarity of their properties. Some typical attribute dissimilarity functions include the Hamming distance, city block distance, squared (Euclidean) distance, and correlation coefficient [2,22].

Clustering is one of the simple techniques employed by humans to accommodate the massive quantity of information they receive every day. Handling each piece of information as a separate object would be tough. As a result, humans appear to group things into clusters. Each cluster then characterizes the precise qualities of the entities that form it. As with supervised learning, it is assumed that all patterns are described in terms of features that constitute one-dimensional feature vectors. In a number of circumstances, a stage known as clustering tendency assessment should be present. It covers a few tests that determine whether there is a clustering pattern in the provided data or not. For example, if the dataset is completely random, attempting to untangle clusters is futile. Different feature options and proximity measurements are available. Clustering criteria and clustering methods may provide wildly differing clustering results [2,23].

1.3.1 Image segmentation with clustering

Images are widely recognized as one of the most important ways of delivering information. An example would be the usage of images for robotic navigation. Other uses, such as removing cancerous tissues from body scans, are an important aspect of medical diagnostics. One of the initial steps in image recognition is to segment images and discover distinct things inside them. This may be accomplished through the use of features such as frequency-domain transformations and histogram plots [2,24].

Image segmentation is a critical preprocessing step in computer vision and image recognition. Image segmentation, which is the breakdown of an image into a number of nonoverlapping relevant sections with the same qualities, is a critical method in digital image processing, and segmentation accuracy has a direct impact on the efficacy of subsequent activities [25]. Because image segmentation is critical in many image processing applications, various image segmentation algorithms were built during the last few decades. However, these methods are always being sought since image segmentation is a difficult issue, which necessitates a better solution for the successive image processing stages. Although the clustering approach was not designed specifically for image processing, it is utilized for image segmentation by the computer vision community. The k-means clustering method, for example, requires prior information about the number of clusters (k) to be categorized into. Every pixel in the picture is iteratively assigned to the cluster whose centroid is closest to the pixel. The centroid of each cluster is identified based on the pixels assigned to that cluster. Both the selection of pixel membership in the clusters and the computation of the centroids are based on distance calculations. Because it is straightforward to compute, the Euclidean distance is the most commonly utilized. The utilization of Euclidean distance produces error in the final image segmentation [2,26]. A simple k-means clustering Python code for image segmentation is given below.
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the image (filepath points to the input image) and convert BGR to RGB
img = cv2.imread(filepath)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Individual channels (not needed for the clustering itself)
r, g, b = cv2.split(img)
r = r.flatten()
g = g.flatten()
b = b.flatten()

# Reshape the image into an (N, 3) float32 array of pixel vectors for cv2.kmeans
vectorized = img.reshape((-1, 3)).astype(np.float32)

# k-means parameters: number of clusters, restarts, and stopping criteria
K = 3
attempts = 10
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(vectorized, K, None, criteria, attempts, cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel with its cluster centroid color and restore the image shape
center = np.uint8(center)
res = center[label.flatten()]
result_image = res.reshape(img.shape)
plt.imshow(result_image)
plt.show()
In this scenario, the classifier provides the average of the real-valued labels related to the unknown instance's k-NNs. A test set is employed to assess the error rate of the classifier starting with k = 1. This method is employed as many times as needed by increasing k to achieve one additional neighbor. The k value which generates the lowest error rate is selected. Distance-based assessments are utilized by k-NN classifiers, which assign equal weight to each attribute. As a result, when given noisy or irrelevant qualities, they may suffer from low accuracy. However, the approach has been tweaked to include attribute weighting as well as the pruning of noisy data instances. The distance measure you choose can have a big impact. Other distance metrics, such as the Manhattan (city block) distance, may also be used. When classifying test instances, nearest-neighbor classifiers can be incredibly slow. The use of partial distance computations and modifying the stored instances are two further methods for reducing classification time. In the partial distance technique, the distance is computed using a subset of the n characteristics. If the distance surpasses a certain threshold, the process stops working on the current stored instance and goes on to the next one. The editing approach gets rid of any training instances that are not useful. Because it minimizes the overall amount of instances saved, this strategy is also known as pruning [27]. A simple Python code for k-NN is given below.
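The sketch below uses scikit-learn's KNeighborsClassifier as an illustration; X_train, y_train, X_test, and y_test are assumed to hold the training and test feature vectors and their labels.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# k-NN classifier with k = 5 neighbors and Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(X_train, y_train)

# Predict the labels of the test instances and report the accuracy
y_pred = knn.predict(X_test)
print('k-NN accuracy:', accuracy_score(y_test, y_pred))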
1.4.1.1 Support vector machine

Support vector machine (SVM) is a classification technique for both linear and nonlinear data. SVM transforms the original training data into a higher dimension via a nonlinear mapping. It seeks the optimal linear separating hyperplane inside this new dimension. The SVM employs support vectors and margins to get the hyperplane. Despite the fact that even the quickest SVMs have a long training period, they are remarkably precise due to their ability to predict complicated nonlinear decision boundaries. They have a lower risk of overfitting than other approaches. The discovered support vectors also serve as a concise representation of the learnt model. SVMs may be used for both classification and numerical prediction. Medical imaging, object recognition, handwritten digit recognition, and speaker identification are just a few of the domains where they have been used [27].

Separating lines can be drawn in an infinite number of ways. We want to find the "best" one, which will (hopefully) have the least amount of classification error on a previously unseen dataset. How are we going to find the best line? We want to find the optimal hyperplane when we generalize to n dimensions. Regardless of the number of input features, we'll refer to the decision boundary we're looking for as a "hyperplane." To put it another way, how can we determine the optimum hyperplane? An SVM solves this problem by searching for the maximum marginal hyperplane. Nevertheless, we anticipate the hyperplane with the bigger margin to be more accurate at classifying future data.
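An SVM classifier can be trained in the same way; the following illustrative sketch uses scikit-learn's SVC with the same assumed X_train, y_train, X_test, and y_test arrays.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# SVM with a radial basis function kernel; C controls the softness of the margin
svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
print('SVM accuracy:', accuracy_score(y_test, y_pred))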
An artificial neural network is a collection of linked units. Each neuronal connection has the ability to send a signal from one to the other. The signal(s) can be processed by the receiving neuron, which can then transfer them to downstream neurons linked to it. Neurons have states that are normally characterized by real numbers, ranging from 0 to 1. Besides, neurons contain weights, which change as they learn, allowing them to change the intensity of the signal they transmit downstream to other neurons. They may also have a threshold, with the downstream signal only being transmitted if the combined output is below (or above) that threshold.

The selection of an activation function in the neural network design process is crucial. The necessity to forecast a binary class label drives the selection of the sign activation function in the case of the perceptron. Other scenarios in which other goal variables might be predicted are also possible. Nonlinear functions such as the hyperbolic tangent, sigmoid, or sign can be utilized in different layers. The identity or linear activation function is the simplest fundamental activation function Φ(·), represented by a linear function:

Φ(v) = v

When the target is a real value, the linear activation function is frequently utilized at the output node. It is also employed when a smoothed surrogate loss function is required for discrete outputs. The hyperbolic tangent, sigmoid, and sign functions were the classic activation functions utilized early in the construction of neural networks. The sign activation cannot be utilized to create the loss function at training time due to its nondifferentiability, although it may be utilized to map to binary outputs at prediction time. The sigmoid activation produces a result in the range (0, 1), which is useful for calculations whose outputs should be interpreted as probabilities. The tanh function is shaped similarly to the sigmoid function, with the exception that it is horizontally and vertically rescaled to [−1, 1]:

tanh(v) = 2 · sigmoid(2v) − 1

When the outputs of the calculations must be both positive and negative, the tanh function is preferred to the sigmoid. It is also easier to train owing to its mean-centering and bigger gradient (due to stretching) as compared to the sigmoid. The tanh and sigmoid functions have been widely used for introducing nonlinearity into neural networks. However, recently, a variety of piecewise linear activation functions have gained popularity [34]:

Φ(v) = max{v, 0}  (Rectified Linear Unit [ReLU])
Φ(v) = max{min[v, 1], −1}  (hard tanh)

1.4.9 Deep learning

It has been discovered that deep neural networks are best fitted for image processing applications. Conventional neural networks contain one input layer, two or three hidden layers, and one output layer. Deep neural networks contain one input layer, many hidden layers, and one output layer. The greater the number of hidden layers, the deeper the network. The layers are linked, with the previous layer's output becoming the next layer's input. The network's performance is determined by the weights of its inputs and outputs. Training the network includes specifying the right weights for the numerous layers. Deep networks need more computational speed, processing capacity, a huge database, and the right parallel processing software [20].

Deep learning is an AI area that focuses on building huge neural network models capable of generating correct data-driven decisions. Deep learning is best suited to scenarios where the data is complicated and huge datasets are accessible. Deep learning is used in the health-care industry to interpret medical images (X-rays, MRI scans, and CT scans) to diagnose health issues [34]. Based on the given data, ML algorithms build their own logic. The algorithm learns on its own, thus no explicit coding is required to tackle every problem. A large number of medical images must be given to the algorithm in order for it to learn to classify. It is supervised learning if the images have previously been categorized and fed; otherwise, it is unsupervised learning. The most basic application is categorizing a pattern into two groups, such as determining whether a medical image belongs to tumor tissue or not [20]. A simple Python code for a deep neural network is given below.
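The sketch below is an illustrative Keras implementation of such a network; num_features, Xtrain, and ytrain (flattened image features and one-hot labels for three classes) are assumptions.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Fully connected deep network for a three-class problem
model = Sequential([
    Dense(256, activation='relu', input_shape=(num_features,)),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.fit(Xtrain, ytrain, epochs=20, batch_size=32, validation_split=0.2)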
Generally, more complicated models with regularization are preferable to simpler models without regularization [34].

1.4.11 Convolutional neural networks

Convolutional neural network (CNN) is a deep learning network which has grown in popularity for image categorization. Fig. 1.2 depicts the CNN architecture. It is made up of an input layer as well as hundreds of feature detection layers. Feature detection layers conduct one of three actions: convolution, pooling, or Rectified Linear Unit (ReLU) activation [20]. CNNs are biologically inspired networks utilized in computer vision to classify images and recognize objects. A convolution process is specified for the convolution layers, where a filter is employed to transfer the activations from one layer to the next. A convolution operation employs a three-dimensional weighted filter with the same depth as the current layer but a smaller spatial area. The dot product of all the weights in the filter and any choice of spatial region in a layer describes the value of the hidden state in the subsequent layer. The interaction between the filter and the spatial area in a layer is performed at every available location to create the next layer, in which the activations keep their spatial connections from the prior layer. Since every activation in a specific layer is a function of just a small spatial region in the preceding layer, the connections in a CNN are relatively sparse. Except for the final set of two or three layers, all layers retain their spatial structure. As a result, it is feasible to physically visualize which elements of a picture impact which portions of activations in a layer. Lower level layer features capture lines or other primitive forms, but higher level layer features catch more complex shapes. As a result, subsequent layers can generate numbers by assembling the shapes in these intuitive characteristics. Furthermore, a subsampling layer simply averages the data in local areas of size 2 × 2 to reduce the spatial footprints of the layers by a factor of 2. CNNs were the most effective of all forms of neural networks in the past. They are commonly employed in image identification, object detection and localization, and even language processing [35].

Convolution processes an image by passing it through convolution filters, which trigger responses to particular features in the image.
In a CNN, the procedures in its levels are spatially structured and the links between layers are sparse (and carefully planned). Convolution, pooling, and ReLU are the three types of layers that are typically seen in a CNN. The activation of the ReLU is similar to that of a standard neural network. Furthermore, a final set of layers is frequently fully connected and translates to a set of output nodes in an application-specific manner. The CNN's input data is structured into a two-dimensional grid structure, with pixels representing the values of individual grid points. As a result, each pixel in the picture correlates to a certain spatial position. However, a multidimensional array of values at each grid place is required to represent the specific hue of the pixel. We have an intensity for each of the three primary hues in the RGB color scheme, which are red, green, and blue, respectively [35].

1.4.11.2 Padding

One thing to keep in mind is that the convolution procedure shrinks the (q + 1)th layer in contrast to the qth layer. In general, this style of image reduction is undesirable since it has a tendency to lose some information near the image's edges. Padding can be used to remedy this problem. To preserve the spatial footprint, padding is used to add (Fq − 1)/2 "pixels" all around the edges of the feature map. In the case of padding hidden layers, these pixels are really feature values. Regardless of whether the input or hidden layers are padded, the value of each of these padded feature values is set to 0. As a result, the input volume's spatial height and breadth will both grow by (Fq − 1), which is exactly what they would decrease by after the convolution. Because their values are set to 0, the padded parts have no effect on the final dot product. Padding allows the convolution operation to be performed with a piece of the filter "sticking out" from the layer's boundaries, allowing the dot product to be performed just over the area of the layer wherever the values are defined [35].

1.4.11.3 Strides

Convolution can also be used to minimize the image's spatial footprint in various ways. The method described above executes convolution at every point in the feature map's spatial location. However, the convolution does not have to be performed at every spatial place in the layer. The concept of strides may be used to decrease the granularity of the convolution. A stride of one is the most usual, but a stride of two is also occasionally utilized. In normal conditions, strides of more than two are uncommon. Larger strides can aid with memory constraints or decrease overfitting if the spatial resolution is excessively high. A bigger receptive field is beneficial to detect a complex feature in a greater spatial region of the image. The hierarchical feature engineering method of a CNN captures increasingly complicated forms in later layers. Historically, another process known as max-pooling has been used to increase the receptive fields [35].
1.4.11.4 The Rectified Linear Unit layer

The pooling and ReLU operations are combined with the convolution operation. The activation of the ReLU is similar to how it is done in a standard neural network. Because the ReLU is a basic one-to-one mapping of activation values, it has no effect on the layer's dimensions. In classic neural networks, the activation function is paired with a linear transformation with a matrix of weights to produce the next layer of activations. A ReLU layer is frequently not explicitly displayed in graphical representations of convolutional neural network designs, and it often follows a convolution operation. It is worth noting that the ReLU activation function is a relatively recent addition to neural network architectures, which earlier relied on saturating activation functions like tanh and the sigmoid.
In the dropout method, a network's input and hidden layers are sampled. Dropout is a technique combining node sampling with weight sharing. Backpropagation is then used by the training procedure to update the weights of the sampled network utilizing a single sampled example. Dropout has the major consequence of incorporating regularization into the learning process. Dropout efficiently introduces noise into both the input data and the hidden interpretations by dropping both input and hidden units. Regularization is a type of noise addition. Dropout prevents hidden units from adapting to one another's features, a process known as feature coadaptation. Because the impact of dropout is a masking noise that eliminates part of the hidden units, this method imposes some redundancy between the characteristics learnt at the various hidden units. Increased resilience is the result of this form of redundancy. Since every one of the sampled subnetworks is trained with a limited number of sampled examples, dropout is efficient. As a result, just the additional labor of sampling the hidden units is required. However, because dropout is a regularization approach, it limits the network's expressive ability. Thus, in order to fully benefit from dropout, one must employ larger models and more units. As a result, there is a hidden computational overhead. Moreover, if the initial training dataset is big enough to limit the chance of overfitting, the computational benefits of dropout may be minor but still noticeable [35].

1.4.11.9 Early stopping

Early stopping is a popular type of regularization, where the gradient descent is stopped after just a few iterations. Keeping a portion of the training data and then assessing the model's error on this hold-out set is one technique to determine the stopping point. When the error on the hold-out set starts to grow, the gradient-descent strategy is stopped. The size of the parameter space is effectively reduced to a smaller neighborhood around the starting values of the parameters when early stopping is used. Early stopping functions as a regularizer in this case since it effectively limits the parameter space [34].

1.4.11.10 Batch normalization

Batch normalization is a relatively new technique for dealing with the vanishing and exploding gradient issues, which lead activation gradients in consecutive layers to either decrease or increase in magnitude. Internal covariate shift is another significant issue in deep network training. The issue is that throughout training, the parameters are adjusted, and hence the hidden variable activations are adjusted as well. The goal of batch normalization is to create features with identical variance by adding extra "normalization layers" between hidden layers, which resist this sort of shift. These extra nodes must be taken into account by the backpropagation algorithm in order to guarantee that the loss derivative of layers before the batch normalization layer compensates for the transformation entailed by these new nodes. Batch normalization has the unique characteristic of acting as a regularizer. It is worth noting that the same data point might result in slightly distinct adjustments based on which batch it is in. This impact might be viewed as a form of noise introduced to the updating process. Adding a tiny bit of noise to the training data is a common way to accomplish regularization. Although there is no perfect agreement on this topic, it has been empirically noted that regularization approaches such as dropout do not enhance the performance once batch normalization is applied [35]. A simple Python code for CNN is given below.
# Output layer and compile step; the convolutional, pooling, and dense layers of
# the Sequential 'model' are assumed to have been added above
model.add(Dense(3))
model.add(BatchNormalization())
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
1.4.12 Recurrent neural networks

All neural networks are built to handle multidimensional data with features that are mainly independent of each other. Particular data types, such as biological data, text, and time series, do, nevertheless, have sequential relationships between the properties. The following are some examples of such dependencies:

1. The values on successive time stamps in a time-series data collection are tightly tied to one another. If the values of these time stamps are treated as separate characteristics, important information about the connections between them is lost; processing the values at various time stamps separately therefore causes information loss.
2. Even though text is frequently processed as a bag of words, the sequencing of the words might provide superior semantic insights. In such instances, it is critical to build models that account for the sequencing information. The most typical use of recurrent neural networks (RNNs) is text data.
3. Sequences in biological data are often seen, and the symbols may relate to one of the nucleobases or amino acids, which comprise DNA's building blocks.

Particular values in a sequence might be actual or symbolic in nature. Time series is another term for real-valued sequences. RNNs may be employed for any form of data, although symbolic values are more common in practical applications. The vanishing and exploding gradient problems are among the most crucial problems in this field, and they are especially prominent in deep networks such as RNNs. Hence, a variety of RNN types have been developed, including the gated recurrent unit and long short-term memory (LSTM). Image captioning, sequence-to-sequence learning, sentiment analysis, and machine translation are just a few of the areas where RNNs and their derivatives have been applied [35]. A simple Python code for RNN is given below.
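The sketch below is an illustrative Keras implementation of an LSTM-based sequence classifier; time_steps, n_features, Xtrain, and ytrain are assumed to describe a three-class sequence classification task.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# LSTM-based recurrent network for sequence classification
model = Sequential([
    LSTM(64, input_shape=(time_steps, n_features)),
    Dropout(0.2),
    Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.fit(Xtrain, ytrain, epochs=20, batch_size=32, validation_split=0.2)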
1.4.15 Generative adversarial networks

In generative adversarial networks (GANs), two neural network models are used at the same time. First, a generative model generates synthetic instances of images, which are comparable to the original data; a second, discriminative model is trained against it until it is unable to tell whether an item is produced synthetically or belongs to the original dataset. The created objects are frequently used to generate vast volumes of synthetic data for AI algorithms, and they may also be used to augment data.
1 https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/.
Furthermore, by providing context, this technique may be used to generate objects with different aspects. The parameters of the generator and discriminator are simultaneously updated throughout the training phase of a GAN. The discriminator and the generator are both neural networks. The generator can be compared to the decoder part of a variational autoencoder. However, the training procedure differs significantly from that of a variational autoencoder. GANs are frequently used to generate images with a variety of contexts. The image setting is, without a doubt, the most prevalent use of GANs, and the generator in the image setting is known as a deconvolutional network. As a result, the matching GAN is also known as a deep convolutional generative adversarial network (DCGAN) [35]. A simple Python code for GAN is given below. For details, check the related Keras web page.2
# DCGAN generator (noise_size, width, and depth are assumed to be defined constants)
def get_generator():
    noise_input = keras.Input(shape=(noise_size,))
    x = layers.Dense(4 * 4 * width, use_bias=False)(noise_input)
    x = layers.BatchNormalization(scale=False)(x)
    x = layers.ReLU()(x)
    x = layers.Reshape(target_shape=(4, 4, width))(x)
    for _ in range(depth - 1):
        x = layers.Conv2DTranspose(
            width, kernel_size=4, strides=2, padding="same", use_bias=False,
        )(x)
        x = layers.BatchNormalization(scale=False)(x)
        x = layers.ReLU()(x)
    image_output = layers.Conv2DTranspose(
        3, kernel_size=4, strides=2, padding="same", activation="sigmoid",
    )(x)
    return keras.Model(noise_input, image_output, name="generator")
2 https://keras.io/examples/generative/gan_ada/.
class GAN_ADA(keras.Model):
    def __init__(self):
        super().__init__()
        self.augmenter = AdaptiveAugmenter()
        self.generator = get_generator()
        self.ema_generator = keras.models.clone_model(self.generator)
        self.discriminator = get_discriminator()
        self.generator.summary()
        self.discriminator.summary()
        self.generator_loss_tracker = keras.metrics.Mean(name="g_loss")
        self.discriminator_loss_tracker = keras.metrics.Mean(name="d_loss")
        self.real_accuracy = keras.metrics.BinaryAccuracy(name="real_acc")
        self.generated_accuracy = keras.metrics.BinaryAccuracy(name="gen_acc")
        self.augmentation_probability_tracker = keras.metrics.Mean(name="aug_p")
        self.kid = KID()

    @property
    def metrics(self):
        return [
            self.generator_loss_tracker,
            self.discriminator_loss_tracker,
            self.real_accuracy,
            self.generated_accuracy,
            self.augmentation_probability_tracker,
            self.kid,
        ]
# separate forward passes for the real and generated images, meaning
# that batch normalization is applied separately
real_logits = self.discriminator(real_images, training=True)
generated_logits = self.discriminator(generated_images, training=True)

# the generator tries to produce images that the discriminator considers as real
generator_loss = keras.losses.binary_crossentropy(
    real_labels, generated_logits, from_logits=True
)
# the discriminator tries to determine if images are real or generated
discriminator_loss = keras.losses.binary_crossentropy(
    tf.concat([real_labels, generated_labels], axis=0),
    tf.concat([real_logits, generated_logits], axis=0),
    from_logits=True,
)

self.augmenter.update(real_logits)
self.generator_loss_tracker.update_state(generator_loss)
self.discriminator_loss_tracker.update_state(discriminator_loss)
self.real_accuracy.update_state(1.0, step(real_logits))
self.generated_accuracy.update_state(0.0, step(generated_logits))
self.augmentation_probability_tracker.update_state(self.augmenter.probability)
# KID is not measured during the training phase for computational efficiency
return {m.name: m.result() for m in self.metrics[:-1]}

self.kid.update_state(real_images, generated_images)
# only KID is measured during the evaluation phase for computational efficiency
return {self.kid.name: self.kid.result()}
A common approach is to fine-tune the deeper layers that are closer to the output layer, while the weights of the early layers (closer to the input) are fixed. The purpose of training only the deeper layers while holding the early layers fixed is that the earlier layers catch only simple features such as edges, whereas the deeper layers capture more complex features. For the application at hand, the simple features do not alter too much, while the deeper features might be sensitive to the desired application [35].

1.4.16.1 AlexNet

It is worth noting that the original AlexNet architecture had two parallel processing pipelines, which are controlled by two GPUs cooperating to create the training model at a faster speed and with memory sharing. After each convolutional layer, the ReLU activation function was used, followed by normalization and max-pooling. The second convolutional layer filters the first convolutional layer's response-normalized and pooled output. The third, fourth, and fifth convolutional layers have no intervening pooling or normalizing layers. The fully connected layers contain 4096 neurons. To accomplish classification, AlexNet's final layer employs a 1000-way softmax. It is worth noting that the last layer of 4096 activations is frequently utilized to generate a flat 4096-dimensional representation of an image for purposes other than classification. These characteristics may be extracted from any out-of-sample image by simply feeding it through the trained neural network, and they frequently transfer well to various tasks and datasets. In most CNNs today, the activation function is virtually entirely centered on the ReLU, which was not the case prior to AlexNet. In order to enhance generalization, dropout with L2-weight decay was utilized [35]. A simple Python code for AlexNet is given below.
AlexNet = Sequential()
# ... the convolutional, pooling, and fully connected layers of AlexNet are added here ...
# Output Layer
AlexNet.add(Dense(3))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('softmax'))
AlexNet.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
1.4.16.2 Visual geometry group

Visual geometry group (VGG) [36] also noted the rising trend of increased network depth. The studied networks were built in a variety of topologies with layer counts ranging from 11 to 19, with 16 or more layers being the most effective. VGG's significant breakthrough was that it lowered filter sizes while increasing depth. It is critical to recognize this trade-off: a smaller filter size needs a greater depth, since a tiny filter can only capture a limited part of the image if the network is not deep. VGG always employs filters with a spatial footprint of 3 × 3 and a pooling size of 2 × 2. The convolution is performed using a stride of 1 and padding of 1. A stride of 2 is applied for pooling. Another intriguing feature of VGG's architecture is that the number of filters is frequently doubled after each max-pooling, the aim being to double the depth whenever the spatial footprint is decreased by a factor of two. This design idea leads to some balance in the computational effort across the layers.
# Pretrained VGG16 backbone; 'img_shape' and the TensorFlow/Keras imports are assumed to be defined earlier
base_model = tf.keras.applications.VGG16(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
for l in base_model.layers:
l.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
base_model = tf.keras.applications.ResNet101(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
base_model = tf.keras.applications.MobileNet(
alpha = 0.75,
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
1.4.16.5 Inception-v4 and Inception-ResNet

The Inception architecture is highly adjustable, which means that the number of filters in the various layers can be changed without affecting the quality of the fully trained network. To balance the calculation among the numerous model subnetworks and improve training speed, the layer sizes are adjusted carefully. For Inception-v4, identical selections are made for the Inception blocks across all grid sizes in order to get rid of unnecessary baggage. Cheaper Inception blocks are employed for the residual versions of the Inception networks than for the original Inception. Each Inception block is followed by a filter-expansion layer that is used to increase the dimensionality of the filter bank before adding it, in order to match the depth of the input. This is required to compensate for the reduction in dimensionality caused by the Inception block. The residual version of Inception is tested with many variations; only two of them are discussed in detail here. The first, "Inception-ResNet-v1," essentially corresponds to the computational cost of Inception-v3, while "Inception-ResNet-v2" corresponds to the raw cost of the recently proposed Inception-v4 network. Another minor technical difference between residual and nonresidual Inception variations is that batch normalization is utilized only on top of the standard layers, not on top of the summations, in the case of Inception-ResNet. Although it is rational to predict that extensive usage of batch normalization would be beneficial, the aim is to make each model replica trainable on a single GPU, and the overall number of Inception blocks can be significantly increased by eliminating the batch normalization on top of those layers. Greater availability of computational resources should eliminate the need for this trade-off [39]. A simple Python code for Inception-ResNet-V2 is given below.
base_model = tf.keras.applications.InceptionResNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
base_model = tf.keras.applications.Xception(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
1.4.16.7 Densely connected convolutional networks

One benefit of ResNets is that the gradient can flow straight via the identity function from later layers to earlier ones. Yet, the identity function and the output of Hℓ are merged through summing, which might restrict information flow in the network. A novel connectivity pattern is therefore suggested, with direct connections from any layer to all following layers, in order to increase information flow between layers even further. This network design is known as a dense convolutional network (DenseNet) due to its dense connectivity. Downsampling layers, which vary the size of feature maps, are an important component of convolutional networks. In this architecture, the network is partitioned into numerous densely linked dense blocks to allow downsampling. Layers between blocks are referred to as transition layers because they perform convolution and pooling. In the experiments, a batch normalization layer is employed with a 1 × 1 convolutional layer and a 2 × 2 average pooling layer as transition layers. DenseNet can have very narrow layers, which distinguishes it from conventional network topologies [41]. A simple Python code for DenseNet121 is given below.
base_model = tf.keras.applications.DenseNet121(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])
# Train the model; Xtrain, ytrain, and the early_stopping callback are assumed to be
# defined earlier, and the number of epochs is illustrative
history = model.fit(Xtrain, ytrain,
                    validation_split=0.2,
                    epochs=50,
                    verbose=2,
                    callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:.4f}'.format(results[0]))
print('Final test set accuracy: {:.4f}'.format(results[1]))
1.4.16.8 Feature extraction with pretrained models

Pretrained CNNs from publicly available resources such as ImageNet are frequently available for usage in different applications and datasets. This is accomplished by retaining the majority of the pretrained weights in the neural network, with the exception of the final classification layer. The final classification layer's weights are determined by the dataset at hand. The last layer must be trained since the class labels in a given situation may differ from those in ImageNet. Nonetheless, the weights in the early layers are still valuable since they learn different sorts of shapes in images which can be used for nearly any type of classification application. In addition, feature activations in the penultimate layer can be employed for unsupervised applications. It is worth noting that the usage of pretrained convolutional networks is so widespread that training is almost never initiated from scratch [35].
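As an illustrative sketch of this approach (img_shape and the image array images are assumed to be defined), penultimate-layer features can be extracted with a pretrained Keras model as follows:

import tensorflow as tf

# Pretrained VGG16 without its classification head; global average pooling yields
# one fixed-length feature vector per input image
feature_extractor = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
    input_shape=img_shape,
    pooling="avg",
)
features = feature_extractor.predict(tf.keras.applications.vgg16.preprocess_input(images))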
Many deeper architectures with feed-forward topologies include numerous layers where consecutive transformations of the preceding layer's inputs lead to progressively complex data representations. In the output layer, properly transformed feature interpretations are more amenable to basic sorts of predictions. The nonlinear activations in the intermediate layers are responsible for this level of sophistication. Tanh and sigmoid activation functions were traditionally the most common options in the hidden layers, but the ReLU activation has gained popularity in recent years due to its attractive property of better avoiding the vanishing and exploding gradient difficulties. One way to look at the division of work between the hidden layers and the final prediction layer is that the hidden layers produce a transformed feature representation of the data, and the final layer then uses this representation to make relatively simple predictions.
3 https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/.
CHAPTER 2
Lung cancer detection from histopathological lung tissue images using deep learning
Aayush Rajput 1 and Abdulhamit Subasi 2,3
1 Indian Institute of Technology, Kharagpur, India; 2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
Lung cancer is a deadly disease with a 5-year survival rate of 18.6%, lower than other cancers such as breast and prostate [1]. It is the most common type of cancer, specifically in the United States, where it is responsible for 154,050 deaths alone, around a quarter of all cancer deaths in the country [2]. Lung cancer can be broadly classified into nonsmall-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC). NSCLC is the most common type of lung cancer, and it is further classified into three main categories: adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma. Adenocarcinoma begins from the mucus-producing cells. It is found in the outer part of the lungs and can be detected before it spreads. It is also a common type in other cancers such as breast, colorectal, and prostate. The second one, squamous-cell carcinoma, begins its growth from the squamous cells. It is primarily found in people who smoke. In the lungs, it affects the central part near the bronchus. The third type, large-cell carcinoma, does not involve a particular area in the lungs. It can grow in any region and is difficult to diagnose; its faster growth makes it more complex to treat. NSCLC has some other types, but they are not common compared to these three, for example, adenosquamous carcinoma and sarcomatoid carcinoma. SCLC covers only 10%–15% of all lung cancers [3].

There are cases in which cancer starts in one part of the body and, after spreading, affects another part of the body, but the cancer type will be based on the organ that is affected first. For example, breast cancer can sometimes affect the lungs, but it will not be called lung cancer and the treatment will be to cure breast cancer. Till now, no one knows how to prevent the danger of lung cancer completely, but there are some ways through which the risk factor of cancer can be reduced. The first and foremost way is to avoid smoking [4]. In the early stage of lung cancer, the lung tries to repair the tissue, but the lung will not improve if the person smokes during that phase. Thus, by quitting smoking, a person can reduce the danger of lung cancer to a large extent. Keeping a minimum distance from the people diagnosed with cancer is also a way to protect ourselves from the cancer-causing agents. Taking a balanced and healthy diet also helps in reducing the risk factor of lung cancer. But the reduction in risk achieved in these ways is far smaller than the increase caused by smoking, so quitting smoking should be the first step. However, the risk of lung cancer cannot be zero for anyone, as there are instances of people having lung cancer who do not have any high-risk factors. Early detection of lung cancer is the most common factor among all successful lung cancer treatments. Some types of lung cancer start showing symptoms in the early stage of development, through which they can be detected. The symptoms of lung cancer are most likely to be caused by other reasons, but it is essential to see the doctor so that if lung cancer is causing the symptoms, it can be detected and treated in its early phases. The most common symptoms of lung cancer include a long-lasting cough, rust-colored phlegm, blood coughing, loss of appetite, shortness of breath, the feeling of being tired, chest pain, persistent infections such as pneumonia, and wheezing. Widespread lung cancer can also affect other parts of the body, which can cause further symptoms such as bone pain, headache, balance problems or seizures, and yellowing of skin and eyes [5]. These all result from lung cancer spreading to other parts such as the liver and nervous system.

There are several tests a doctor can use for testing lung cancer. Tests include imaging tests, sputum cytology, and biopsy [6]. An X-ray image is used in an imaging test to detect the abnormal mass present in the lung tissues. A CT scan can be used to see finer details which an X-ray might not show. In sputum cytology, the sputum is observed under the microscope. The sputum of a person with lung cancer contains lung cancer cells that can be detected under the microscope.
The model achieved an area under the ROC curve (AUC) score of 0.872 for the patch-based classification using augmented images.

Ausawalaithong et al. [9] used DL and transfer learning to classify lung cancer using chest X-ray images. The proposed model provides a heat map for identifying the location of the lung nodule. The dataset used in the study was taken from more than one source. They used the JSRT (Japanese Society of Radiological Technology) dataset consisting of 247 frontal chest X-ray images, with 154 images with lung nodules and 93 images without lung nodules. All images have a resolution of 2048 × 2048 pixels. The data was also taken from the ChestX-ray14 dataset containing 112,120 frontal chest X-ray images. Every image has a resolution of 1024 × 1024 pixels. However, this dataset does not contain any lung cancer images. It was used to compensate for the lung cancer data, which has only 100 cases, by first training to recognize nodules, using nodule cases as positive and all remaining cases as negative. The 121-layer densely connected convolutional network (DenseNet121) was used, replacing the last fully connected layer with a single sigmoid node to get the output probability. Transfer learning was applied twice in the study, first for classification as "with nodule" or "without nodule" and second for classification as "with malignant nodule" or "without malignant nodule." The model gave accuracy, specificity, and sensitivity scores of 74.43% ± 6.01%, 74.96% ± 9.85%, and 74.68% ± 15.33%.

Hatuwal et al. [10] used the images from the LC25000 lung histopathological image dataset. Images were resized to 180 × 180-pixel resolution, and the pixels were normalized. They trained a convolutional neural network (CNN) model with three hidden layers, an input layer, and one fully connected layer. Their model achieved a training accuracy score and validation accuracy score of 96.11% and 97.2%, respectively.

Chakravarthy et al. [11] used CT lung DICOM images from the Lung Image Database Consortium (LIDC) for the study. MATLAB 2013a software was used for the preprocessing. The images were filtered by using a fast type of filter [12] to reduce the noise present in the images. First, the grayscale lung image was transformed to a binary image by setting pixel values greater than a threshold to 1 and the rest to 0. Then gray-level co-occurrence matrix (GLCM) feature extraction was done, and a chaotic crow search algorithm (CCSA) was used to select the GLCM features. A probabilistic neural network was trained on the features selected by CCSA. The study achieved scores of 95%, 85%, 90%, 86.36%, and 94.44% for sensitivity, specificity, accuracy, precision, and negative predictive value, respectively.

Sasikala et al. [13] used a CNN for the classification of tumors as malignant or benign. The images used were chest CT images taken from the LIDC and Image Database Resource Initiative (IDRI) [14], consisting of 1000 scans of malignant and benign lung tumors. The lung region was first extracted from the images, and then slices were segmented to get the tumor region. In the preprocessing, the median filter was used. Backpropagation algorithms train the model with a Rectified Linear Unit (ReLU) activation function and a softmax layer as the output layer. The data normalization step was not performed on the images while giving input to the model for training. The whole study was done using MATLAB software. The model gave specificity, sensitivity, and accuracy scores of 1, 0.875, and 0.96, respectively.

Serj et al. [15] also used a CNN for lung cancer diagnosis. They proposed a new DL model for better accuracy and low variance in binary classification tasks. The dataset used was taken from the Kaggle Data Science Bowl 2017 (KDSB17) [16]. The proposed model has two max-pooling layers, a full-body convolution layer, and one fully connected layer with two softmax units. The input size of the image used
The output range of the tanh function is from −1 to 1, which is why its output is normalized [42]. In the logistic function, the output is from 0 to 1, while in the ReLU function, the output is continuous. In practice, ReLU is faster to compute than the other two activation functions.
The ANN is often used for classification tasks where the n neurons in the output layer represent n classes and the value of each neuron represents the class probability. The activation used for the output layer is then softmax. The ANN model can detect the complex features present in the data; hence, it is more helpful than other models. It has been observed that an ANN with only one hidden layer with enough neurons can model very complex functions. One of the essential features of using a DNN is that a pretrained model can be used as the initial layers of another model with the same weights to extract the input features, which makes the model's training faster and its performance better. In DL, the last part of a convolutional neural network consists of the ANN layers [43].

2.3.2 Deep learning

DL is a subset of machine learning. Nowadays, computers are fast enough to train these vast networks. A large amount of data is used to train DL models and increase the performance of the model. Generally, with the increase in the quantity of data, the performance of the model increases. DL is used in many fields such as face recognition, speech recognition, etc. In the medical field, the use of DL is increasing very rapidly [44]. Practitioners use DL models for the detection of tumors from X-ray or CT scan images. DL models can automatically extract features from the data, which can train another classical machine learning model for better results. One of the most significant advantages of DL is that a DL model can also be trained on analog data such as images with pixels and audio files. The techniques used in DL are achieving great success. Tasks that take a lot of time to complete through human effort can be done in little time using DL techniques. DL is making a significant contribution toward the betterment of the quality of human life. Many companies use DL to show personalized recommendations to the user, making the user experience better and making huge profits. DL makes training and prediction possible with significantly less manual effort; after training, the whole process becomes completely automatic.

2.3.3 Convolutional neural networks

CNNs have emerged from the study of the brain's visual cortex. They have been used for image recognition since the 1980s. In 1989 LeCun et al. [20] proposed a method to recognize handwritten digits using a CNN. With the increase in computational power and data, CNNs can perform very well on image recognition tasks. Even for complex tasks, CNNs can be trained to get fast and accurate results. CNNs are powering image search services, self-driving cars, and more.
The most basic block of a CNN is the convolution layer. Unlike in an ANN, in a CNN each neuron of a layer is not connected to every neuron of the next layer; each neuron is connected only to the neurons of its receptive field. Through this architecture, a CNN can concentrate on low-level features in the first hidden layers, while the last layers concentrate on high-level features. This is the reason why CNNs perform the task of image recognition so well. A neuron located in the ith row and jth column of a layer
FIGURE 2.1 The general framework for a CNN model (output classes: benign tissue, adenocarcinoma, squamous cell carcinoma). CNN, Convolutional neural network.
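To make the stacked convolution layers of such a framework concrete, the following is a minimal Keras sketch of a small CNN; the filter counts, input size, and number of output classes are illustrative assumptions and not the exact model used in this chapter.

from tensorflow.keras import layers, models

# Illustrative CNN: early Conv2D layers respond to low-level patterns (edges),
# deeper layers combine them into higher-level features before the ANN part.
cnn = models.Sequential([
    layers.Input(shape=(180, 180, 3)),           # image size is illustrative
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),          # the ANN layers at the end of the CNN
    layers.Dense(3, activation='softmax')         # e.g., three lung tissue classes
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])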
# Extract deep features for the validation set with the pretrained base model
X_val = base_model.predict(validate)
# Ground-truth class labels taken from the validation generator
y_val = validate.classes
instance can belong to more than one class. Examples of multiclass classification include plant species classification and tumor detection. Algorithms such as KNN [27], decision tree [28], Naïve Bayes [29], and gradient boosting [30] can be used for multiclass classification. The general framework is given in Fig. 2.2.
In multilabel classification, the algorithms used for multiclass classification cannot be used directly. A modified version of the algorithm, such as multilabel decision trees, multilabel random forests, and multilabel gradient boosting, is used for this task. Another approach to solving the multilabel classification problem is to use a different model for each class to predict whether the instance belongs to that class. In this way, the multilabel classification problem becomes like a binary classification, and every model can be used for the task. A simple Python code for the classification is given below.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

def get_models():
    # ANN of shape 128-64-32-16-8 with a softmax output over the classes
    ANN = Sequential()
    ANN.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
    ANN.add(BatchNormalization())
    ANN.add(Dropout(0.2))
    ANN.add(Dense(64, activation='relu'))
    ANN.add(Dense(32, activation='relu'))
    ANN.add(Dense(16, activation='relu'))
    ANN.add(Dense(8, activation='relu'))
    ANN.add(Dense(len(train_it.class_indices), activation='softmax'))
    ANN.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Classical machine learning models trained on the extracted deep features
    KNN = KNeighborsClassifier()
    RF = RandomForestClassifier(n_estimators=50)
    ADB = AdaBoostClassifier()
    return ANN, KNN, RF, ADB
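A minimal usage sketch for the helper above, assuming X_train, y_train, X_test, and y_test hold the extracted deep features and integer labels (hypothetical variable names, not shown in the listing):

# Fit each classical model on the deep features and score it on the test split
ANN, KNN, RF, ADB = get_models()
for name, clf in [('KNN', KNN), ('Random Forest', RF), ('AdaBoost', ADB)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
# The Keras ANN is trained separately with its own fit loop
ANN.fit(X_train, y_train, validation_split=0.1, epochs=10, batch_size=32)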
FIGURE 2.2 The general framework for histopathological lung tissue image classification using deep feature extraction, dimension reduction, and classification with ANN, k-NN, SVM, RF, AdaBoost, and XGBoost.
from sklearn.metrics import (accuracy_score, f1_score, recall_score,
                             precision_score, confusion_matrix)
import matplotlib.pyplot as plt
import seaborn as sns

# Predict on the test features and report the evaluation metrics
predicted = model.predict(X_test)
print("Test accuracy Score--------->")
print("{0:.3f}".format(accuracy_score(y_test, predicted)*100), "%")
print("F1 Score--------------->")
print("{0:.3f}".format(f1_score(y_test, predicted, average='weighted')*100), "%")
print("Recall-------------->")
print("{0:.3f}".format(recall_score(y_test, predicted, average='weighted')*100), "%")
print("Precision-------------->")
print("{0:.3f}".format(precision_score(y_test, predicted, average='weighted')*100), "%")

# Plot the test confusion matrix as a heatmap
cf_matrix_test = confusion_matrix(y_test, predicted)
plt.subplot(122)
sns.heatmap(cf_matrix_test, annot=True, cmap='Blues')
plt.title("Test Confusion matrix")
plt.show()
TABLE 2.2 Performance of VGG16 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.3 Performance of VGG19 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.4 Performance of ResNet50 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.5 Performance of ResNet101 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.7 Performance of MobileNet deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.8 Performance of InceptionV3 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.9 Performance of InceptionResNetV2 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.10 Performance of DenseNet169 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 2.11 Performance of DenseNet121 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision
VGG19 and VGG16 are similar models; VGG19 has 19 layers, whereas VGG16 has 16, so the results are also similar to those obtained using VGG16. Here, the best results are obtained using ANN, with a test accuracy of 97.38%, and the worst with KNN, with a value of 83.42%.
ResNet50 is trained on over a million images from the ImageNet database and has 50 layers, so it has very good weights which are able to extract many features of an image. Combining it with classical ML models gives very good results. The SVM model gives the highest accuracy score of 98.8%, and the worst score is with KNN, with a test accuracy of 87.95%.
MobileNetV2 is 53 layers deep and is also trained on the ImageNet database. The scores obtained using MobileNetV2 are lower than the scores of the VGG and ResNet models. Here, the best score is given by XGBoost, with a value of 90.08%, and the worst score is given by AdaBoost, with a value of 75.15%.
For the features extracted by MobileNet, the best test accuracy value is 90.97%, and the worst score is obtained using AdaBoost, with a value of 80.13%.
InceptionResNetV2 is a network which is 164 layers deep, but the scores obtained with its weights are lower: the best accuracy has a value of 73.33%, and the worst accuracy is given by ANN, with a value of 45.82%.
From the above tables, it can be seen that VGG16, VGG19, ResNet50, and ResNet101 can extract essential features from the images, and ANN gives the best result among all other models. The ANN used here has the shape of 128 × 64 × 32 × 16 × 8 and is trained for 10 epochs. Developing such pretrained models requires very huge data, computation power, and time; these models are developed by teams of researchers and are able to extract every small feature of an image. So these models are used to extract important features, and then the classical ML models ANN, KNN, SVM, Random Forest, AdaBoost, and XGBoost are trained on the extracted features to give the output. The output given by the pretrained models is flattened to give as input to the ML models. The best test accuracy score achieved using this method is 98.8%, which is given by ResNet50 and SVM.

2.5 Discussion

The abovementioned results show that all transfer learning models achieved good performance. All the pretrained models are better than the self-designed CNN. Pretrained models also reduce the time to train a model on such a vast dataset. Some models, such as VGG16, VGG19, ResNet50, and ResNet101, perform better than other models. The best results have an F1 score of 98.8%, given by ResNet50 and the SVM classifier. Models such as Random Forest and XGBoost can fit very well on training data, but they do not give good results on test and validation data. SVM can give more accurate results after training on the features extracted through a pretrained model than other classical machine learning models. KNN, AdaBoost, and Random Forest are not able to give accurate results in most cases. Pretrained models extract important features with their weights, which enables simple classical ML models to give very good results on a complex dataset. Making a self-designed CNN model able to make accurate predictions requires a lot of computation power and time, so it is always easier and better to just load the pretrained models.

2.6 Conclusion

Lung cancer is a very deadly disease, and everyone must take precautions to avoid every possibility of having lung cancer. Smoking should be avoided, and a healthy and balanced diet should be taken. AI is helping the medical field to improve and to give faster and better results. As can be seen in the results, the accuracy of the machine learning model is almost equal to 1. Today, we have the data and resources to use machine learning to automate lung cancer detection. Through AI, it will be possible for everyone to detect cancer in almost no time. Early diagnosis can be lifesaving, as lung cancer is more curable in the initial stages.

References
[1] U.S. National Institute of Health, National Cancer Institute. SEER Cancer Statistics Review, 1975–2015.
[2] Centres for Disease Control and Prevention, National Centre for Health Statistics. CDC WONDER On-Line Database, Compiled from Compressed Mortality File 1999–2016 Series 20 No. 2V, 2017.
[3] What is lung cancer? | Types of lung cancer. https://www.cancer.org/cancer/lung-cancer/about/what-is.html (accessed 14.05.21).
[4] 10 Tips for Preventing Lung Cancer, Verywell Health. https://www.verywellhealth.com/tips-for-lung-cancer-prevention-2249286 (accessed 14.05.21).
[5] Healthline, Effects of lung cancer on the body. https://www.healthline.com/health/lung-cancer/effects-on-body, May 09, 2017 (accessed 14.05.21).
[6] How to detect lung cancer | Lung cancer tests. https://www.cancer.org/cancer/lung-cancer/detection-diagnosis-staging/how-diagnosed.html (accessed 14.05.21).
[7] Cancer Support Community, Coping with side effects of lung cancer treatment. https://www.cancersupportcommunity.org/article/coping-side-effects-lung-cancer-treatment (accessed 14.05.21).
[8] T. Atsushi, T. Tetsuya, K. Yuka, F. Hiroshi, Automated classification of lung cancer types from cytological images using deep convolutional neural networks, BioMed. Res. Int. 2017 (2017) 1–6. Available from: https://doi.org/10.1155/2017/4067832.
[9] W. Ausawalaithong, A. Thirach, S. Marukatat, T. Wilaiprasitporn, Automatic lung cancer prediction from chest X-ray images using the deep learning approach, in: 2018 11th Biomedical Engineering International Conference (BMEiCON), Nov. 2018, pp. 1–5. Available from: https://doi.org/10.1109/BMEiCON.2018.8609997.
[10] B. Hatuwal, H. Thapa, Lung cancer detection using convolutional neural network on histopathological images, Int. J. Comput. Trends Technol. 68 (2020)
[39] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, arXiv:1801.04381 [cs], http://arxiv.org/abs/1801.04381, Mar. 2019 (accessed 14.04.21).
[40] S. Ghosh, A. Dasgupta, A. Swetapadma, A study on support vector machine based linear and non-linear pattern classification, in: 2019 International Conference on Intelligent Sustainable Systems (ICISS), Feb. 2019, pp. 24–28. Available from: https://doi.org/10.1109/ISS1.2019.8908018.
[41] M.-C. Popescu, V. Balas, L. Perescu-Popescu, N. Mastorakis, Multilayer perceptron and neural networks, WSEAS Trans. Circuits Syst. 8 (2009), Jul.
[42] S. Sharma, S. Sharma, A. Athaiya, Activation functions in neural networks, Int. J. Eng. Appl. Sci. Technol. 04 (2020) 310–316. Available from: https://doi.org/10.33564/IJEAST.2020.v04i12.054, May.
[43] E. Grossi, M. Buscema, Introduction to artificial neural networks, Eur. J. Gastroenterol. Hepatol. 19 (Jan. 2008). Available from: https://doi.org/10.1097/MEG.0b013e3282f198a0.
[44] S. Nagpal, M. Kumar, M.R. Maruthi Ayyagari, Kumar, A survey of deep learning and its applications: a new paradigm to machine learning, Arch. Comput. Methods Eng. (2019), Jul.
[45] R. Khandelwal, Convolutional neural network: feature map and filter visualization, Medium (2020). Available from: https://towardsdatascience.com/convolutional-neural-network-feature-map-and-filter-visualization-f75012a5a49c, May 18.
[46] L. Alzubaidi, et al., Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, J. Big Data 8 (1) (2021) 53. Available from: https://doi.org/10.1186/s40537-021-00444-8.
[47] K. Weiss, T.M. Khoshgoftaar, D. Wang, A survey of transfer learning, J. Big Data 3 (1) (2016) 9. Available from: https://doi.org/10.1186/s40537-016-0043-6.
[48] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 374 (2065) (2016) 20150202.
3
Magnetic resonance imaging-based automated brain tumor detection using deep learning techniques
Abhranta Panigrahi1 and Abdulhamit Subasi2,3
1Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Rourkela, India; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
diagnosed with brain or CNS tumors in 2021. Brain and nervous system cancer is the 10th leading cause of death for men and women. An estimated 18,600 adults will die from primary CNS cancer [1]. Due to the severity of the problem, the early diagnosis of a brain tumor is extremely important for its treatment. Since the invention of X-rays in 1895 by Wilhelm Roentgen, various medical imaging techniques have been used to diagnose brain tumors. The German neurosurgeon Fedor Krause used X-rays extensively to detect brain tumors. Over time, many sophisticated and advanced diagnosis techniques were invented. These included imaging techniques such as the computed tomography (CT) scan [2] and positron emission tomography scan [3], and various invasive methods such as biopsy. One of the most significant breakthroughs in brain tumor detection came with the development of magnetic resonance imaging (MRI). It is a noninvasive medical imaging technique that uses strong radio magnetic waves to generate an accurate image of the required organ. MRI generates a more detailed picture of the brain and does not involve radiation. Intravenous gadolinium-enhanced MRI is typically used to help create a clearer picture of the brain tumor [4]. With the advent of MRI technology, different variations of MRI imaging started to play a vital role in the image-based diagnosis of brain tumors. Some of the MRI imaging techniques that are prevalent in modern medicine are fluid-attenuated inversion recovery, T1-weighted precontrast, T1-weighted postcontrast (Gd), and T2-weighted.
With the advent of technology, various computational techniques proved to be useful in medical diagnosis. The advent of deep learning [5] has revolutionized the fields of computer vision and natural language processing. Various models such as AlexNet [6], ResNet [7], and GoogleNet [8] have given near human-like accuracy in the ImageNet image classification task [6]. The success of convolutional neural networks (CNNs) led the way to their extensive use in diagnosis via medical imaging [9–11]. Since training a CNN from scratch to achieve state-of-the-art performance requires high computational resources, the usage of transfer learning has increased exponentially. This has resulted in the adoption of transfer learning techniques to detect and segment brain tumors [12] and classify them [13] for faster and more efficient diagnosis. In this chapter, we try to provide a detailed assessment of various deep learning methods for the detection of brain tumors and evaluate the pros and cons of each method, which will enable faster diagnosis of brain tumors. An overview of MRI-based brain tumor detection is given in Fig. 3.1.

3.2 Literature survey

Extensive work has been done in the field of deep learning and big data analysis and their applications in the diagnosis of brain tumors. Most of the approaches involve the segmentation of the brain tumor via CNNs. Özyurt et al. [14] conducted a study which proposed a hybrid method using neutrosophy and convolutional neural networks. It aimed to classify brain tumors as benign or malignant. CNNs were used to extract features from the segmented brain tumor images, and then the tumors were classified as malignant or benign using support vector machine (SVM) and K-nearest neighbor (KNN) classifiers. Jalali and Kaur [15] gave a detailed comparative summary of various methods and works for automatic brain tumor detection using medical imaging. Their work involved the description of various methods such as deep belief networks, recurrent neural networks, KNNs, SVMs, and many others. Deb and Roy [16]
proposed a novel segmentation and classification algorithm for brain tumor detection. They proposed a system that uses an adaptive fuzzy deep neural network with frog leap optimization to detect abnormalities in an image, and then the abnormal image was segmented using an adaptive flying squirrel algorithm. Ambily and Suresh [17] proposed an integrated model of CNN and transfer learning for automated binary classification of MRI images. They used a 16-layer pretrained network to distinguish normal and abnormal images.
Amin et al. [18] proposed a deep learning model to predict input MR slices as unhealthy/healthy based on the presence of tumors. Multiple image processing techniques were used to make the tumors more prominent. The MR slices were segmented by applying optimal thresholds to cluster similar pixels together. These segmented slices were then sent to a two-layer stacked sparse autoencoder model. They trained the model on the BRATS dataset, performed a binary classification on the MRI scans, and predicted which scans had tumors and which did not. Vallabhaneni and Rajesh [19] worked on automatic brain tumor detection in noise-corrupted images. Noise in medical imaging can seriously hamper the ability of automated algorithms to correctly detect abnormalities. To tackle this problem, they performed denoising using the Edge Adaptive Total Variation Denoising Technique, which preserves the edges of an image while denoising. The denoised images were then segmented using mean shift clustering, and feature extraction was performed using the gray-level co-occurrence matrix. A multiclass SVM classifier was then used to detect the tumor present in the images.
Rai et al. [20] proposed a hybrid deep CNN model for automated prediction and segmentation of brain tumors from MRI images. They used a dataset with 3929 images, including 1373 images with tumors and 2556 images without tumors. The CNN model was evaluated using the Jaccard index, DICE score, F1 score, accuracy, precision, and recall. It was benchmarked against UnetResnet-50, vanilla U-Net, and other state-of-the-art techniques. They were able to show excellent results with 99.7% accuracy.
As can be seen from the discussion above, most of the papers focused on the segmentation of tumors use algorithms such as clustering or autoencoders. Some of the authors have used CT scans, while the most commonly used imaging technique is MRI. In this chapter, we aim to provide an exhaustive study of several techniques ranging from transfer learning to deep feature extraction for the detection of brain tumors. The research goal here was to use a range of techniques to classify MRI scans into two categories: scans with tumors and scans without tumors.

3.3 Deep learning for disease detection

3.3.1 Artificial neural networks

Artificial neural networks (ANNs) are function approximators that were designed to resemble a primitive idea of the human brain. They are very useful while predicting extremely nonlinear relations. They consist of layers of artificial neurons. Fig. 3.2 shows a simple artificial neuron.
The layers of neurons that accept the input data are called the input layers (Fig. 3.3A). The layers of neurons that give the final prediction are called the output layers (Fig. 3.3C), and all the layers in between are called the hidden layers (Fig. 3.3B). They consist of neurons and synapses. Each neuron has some data (x) and some bias (B). Each synapse has a weight (W), and each layer of neurons has an activation function (f). The output (o) of each layer is governed by:
o = f(x · W + B)
FIGURE 3.2 Artificial neuron.
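As a small numerical illustration of the layer equation above (the values and layer size below are illustrative, not taken from the chapter's model), the output of one dense layer with a ReLU activation can be computed as:

import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])      # input data for one sample
W = np.random.randn(3, 4)           # synapse weights: 3 inputs -> 4 neurons
B = np.zeros(4)                     # bias of each neuron in the layer
o = relu(x @ W + B)                 # o = f(x . W + B)
print(o.shape)                      # (4,): one activation per neuron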
FIGURE 3.8 How a computer "sees": visualizing the feature maps of an image retrieved from a ResNet50 model. Fig. 3.8A shows the input image. Fig. 3.8B shows some of the features extracted in the very first layer of the network. Fig. 3.8C shows the features extracted in the 12th layer of the network. Fig. 3.8D shows the features extracted in the 25th layer of the network. Fig. 3.8E shows the features extracted in the last convolutional layer of the network. This visualization clearly shows that edge information is the most commonly extracted feature in CNNs. As the number of layers increases, the features get more abstract.
FIGURE 3.9 Overview of deep feature extraction for classification of MRI scans. MRI, Magnetic resonance imaging.
from tensorflow.keras.models import Model

# Build a feature-extraction model that outputs the chosen layer's activations
model_feat = Model(inputs=base_model.input, outputs=predictions)
# Extracting the train, test and validation features from the respective data
train_features = model_feat.predict(X_train_prep)
val_features = model_feat.predict(X_val_prep)
test_features = model_feat.predict(X_test_prep)
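Once the features are extracted, any of the classical classifiers discussed earlier can be fitted on them. A minimal sketch with an SVM follows; it assumes scikit-learn, the variable names from the listing above, and that y_train and y_test hold the corresponding labels (these names are assumptions for illustration).

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Flatten the feature maps so each scan becomes a single feature vector
X_tr = train_features.reshape(len(train_features), -1)
X_te = test_features.reshape(len(test_features), -1)

# Fit an SVM on the deep features extracted by the pretrained network
svm = SVC(kernel='rbf')
svm.fit(X_tr, y_train)
print('Test accuracy:', accuracy_score(y_test, svm.predict(X_te)))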
FIGURE 3.10 Visualization of the features extracted in the first layer of a ResNet50 model; here, the features extracted for an MRI scan of a brain are shown. As can be clearly observed, the first layers of the ConvNet mainly extract basic features such as edge and contour information. Hence, the parameters of the first few layers of a deep neural network can be frozen, and the features extracted from these can be used to train a smaller, more task-specific neural network to achieve near state-of-the-art performance in various tasks. MRI, Magnetic resonance imaging.
ConvNet are frozen, and then a smaller neural network is trained from scratch on the features extracted by the frozen layers to give the final result (Fig. 3.11).
In this chapter, 11 different pretrained CNNs were used for transfer learning. These are the same networks that were mentioned in Section 3.4.1. The parameters of the layers of the pretrained networks were frozen, and a dense neural net was created that accepted the flattened output of the pretrained CNN. The dense network consisted of a dense layer of dimension 256 × 1. The output of the dense network was then passed through a batch normalization layer, which was then passed through a ReLU activation function. After a dropout layer, the final layer produced the binary tumor/no-tumor prediction.
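A minimal Keras sketch of the fine-tuning head just described is given below, assuming base_model is one of the frozen pretrained networks; the dropout rate and the single sigmoid output unit are assumptions for illustration rather than the chapter's exact settings.

from tensorflow.keras import layers, models

base_model.trainable = False                  # freeze the pretrained ConvNet
head = models.Sequential([
    base_model,
    layers.Flatten(),                         # flattened output of the pretrained CNN
    layers.Dense(256),                        # dense layer of dimension 256 x 1
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.3),                      # dropout rate is illustrative
    layers.Dense(1, activation='sigmoid')     # assumed binary tumor/no-tumor output
])
head.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])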
FIGURE 3.13 (A) shows the images without brain tumor and (B) shows the images with brain tumor.
FIGURE 3.14 (A) shows a batch of images without tumor and (B) shows a batch of images with tumors. This
figure shows the varying dimension of the images in the dataset and highlights the requirement of resizing the images.
Before resizing the images, the images were cropped to avoid distortions. As Fig. 3.15 clearly shows, the brain occupied varying areas in different images, that is, the black areas were different across all images. For this reason, the images were cropped to make sure that the models performed well. The first step for cropping the images was to find the contours of the brain in the MRI scan. To do this, first, a Gaussian blur was applied to the image to reduce the noise. Then appropriate thresholding and dilation were used to remove small noisy regions. Then contours were found from these images. After finding the contours, the extreme points in the contours were calculated to get the edge of the brain in the image. Then a bounding box was constructed with the middle of each side being the calculated extreme point of the contour. The image was cropped according to the edges of the bounding box. These steps are clearly shown in Fig. 3.15. This step was important to make sure that the brain occupies the maximum area in the scan. After this step, the images were resized to be of dimension 224 × 224.
return np.array(set_new)
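Only the final return statement of the cropping routine survives above. A sketch of what such a helper could look like, assuming OpenCV (cv2) is available, a hypothetical function name, and the OpenCV 4.x return signature of findContours, is:

import cv2
import numpy as np

def crop_brain_region(images):
    set_new = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)               # reduce noise
        thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)[1]
        thresh = cv2.erode(thresh, None, iterations=2)         # clean small noisy regions
        thresh = cv2.dilate(thresh, None, iterations=2)
        cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        c = max(cnts, key=cv2.contourArea)                     # largest contour = brain
        # extreme points of the contour give the edges of the brain
        left = tuple(c[c[:, :, 0].argmin()][0])
        right = tuple(c[c[:, :, 0].argmax()][0])
        top = tuple(c[c[:, :, 1].argmin()][0])
        bottom = tuple(c[c[:, :, 1].argmax()][0])
        cropped = img[top[1]:bottom[1], left[0]:right[0]]      # bounding-box crop
        set_new.append(cv2.resize(cropped, (224, 224)))        # resize after cropping
    return np.array(set_new)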
FIGURE 3.16 Sample of an image before augmentation (A) and the resulting images after a few image augmentations
are applied (B). This figure shows how image augmentation helps to generate more data which leads to robust models.
Since there are only 960 samples available for training, various image augmentations were used to increase the number of training samples. The augmentations performed on the images were: random rotation in a 15-degree range, shifting the image along the width and the length, rescaling the image, shearing, varying the brightness of the image, and flipping the image vertically and horizontally (Fig. 3.16). All of these augmentations were performed to make the models more robust. A series of experiments was also performed to compare the performance of the models on the augmented and nonaugmented images, to study the effect of the number of training samples on the machine learning algorithms. All the observations are shown in Section 3.4.7.
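A minimal sketch of this augmentation pipeline, assuming Keras' ImageDataGenerator; only the 15-degree rotation range is stated in the text above, so the remaining parameter values are illustrative assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=15,              # random rotation in a 15-degree range
    width_shift_range=0.1,          # shift along the width (illustrative value)
    height_shift_range=0.1,         # shift along the length (illustrative value)
    rescale=1.0 / 255,              # rescale pixel values
    shear_range=0.1,                # shearing (illustrative value)
    brightness_range=(0.8, 1.2),    # vary the brightness (illustrative range)
    horizontal_flip=True,
    vertical_flip=True,
)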
# Stream augmented training batches from the directory of MRI images
train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
    color_mode='rgb', target_size=IMG_SIZE, batch_size=32,
    class_mode='binary', seed=RANDOM_SEED)
3.4.6 Performance evaluation metrics

Performance evaluation metrics are very important while dealing with problems related to health care. Different performance metrics mean different things, and choosing the wrong performance metric to evaluate a model can result in false claims, which can ultimately lead to a health disaster. Hence, in this chapter, the models were evaluated on the most common performance metrics to get an idea about their robustness and performance.
First, a confusion matrix was plotted for all the models to know the number of correct and wrong predictions. When positive labels are predicted correctly, it is called a true positive or TP. Similarly, when negative labels are predicted correctly, it is called a true negative or TN. When positive labels are misclassified as negative labels or vice versa, it is called a false negative (FN) or false positive (FP), respectively. These values give a very clear idea about the performance of the model.
After getting the confusion matrix, the accuracy of the prediction was calculated. This is defined as the ratio of correct predictions to the total number of predictions. Since the data was balanced, accuracy is a good indicator of the performance of the model. To get even greater insight into the performance, the precision, recall, F1 score, Cohen's Kappa score, and AUC score were calculated and reported. All these metrics, combined together, gave a clear indication of the performance of the models.
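A minimal sketch of how these metrics can be computed with scikit-learn, assuming y_test holds the true labels, y_pred the predicted labels, and y_prob the predicted probabilities used for the AUC (these variable names are assumptions for illustration):

from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, cohen_kappa_score, roc_auc_score)

cm = confusion_matrix(y_test, y_pred)       # rows: true labels, columns: predictions
tn, fp, fn, tp = cm.ravel()                 # TN, FP, FN, TP for the binary task
print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1 score :', f1_score(y_test, y_pred))
print('Kappa    :', cohen_kappa_score(y_test, y_pred))
print('AUC      :', roc_auc_score(y_test, y_prob))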
3.4.7 Experimental results

While experimenting, many different pretrained architectures were used to perform the classification. But, to check the performance of CNNs trained from scratch, various CNNs with different numbers of layers were also trained, as mentioned in Section 3.4.3. As Table 3.1 clearly shows, the overall performance of the models was higher when image augmentations were used to increase the amount of data. It was also seen that larger models fit better to larger datasets, as the CNN with seven layers performed the best when image augmentations were applied, whereas the CNN with just three layers performed the best when no image augmentations were applied. The overall accuracy, F1 score, and Kappa score are higher when the number of training samples is higher.
The next series of experiments was with transfer learning. Since the shallow network used to fine-tune the pretrained networks was the same for all the models, the performance difference was solely based on the pretrained model's ability to extract features from the MRI scans (Table 3.2).
As observed, the ResNet101 [7] model performed the best with and without the image augmentations. Since the training accuracy was 100%, it indicated possible overfitting, but as the test accuracy was also above 99.5%, it was concluded that the models were generalizing well.
For classification using deep feature extraction, as mentioned in Section 3.4.1, 11 different neural networks and 6 different classifiers were used (Table 3.3).
When the features were extracted using the VGG16 [26] model, the ANN classifier performed the best, but the SVM classifier had the highest test accuracy. However, with the nonaugmented data, XGBoost and Random Forest performed equally well with similar test accuracies and F1 scores (Table 3.4).
As observed from Table 3.4, the XGBoost classifier had the highest test accuracy, F1 score, Cohen's Kappa score, and AUC score when the features were extracted by the VGG19 [26] model. The training accuracy of XGBoost, Random Forest, and ANN was 100%, while the test accuracy was around 85%. This showed that the classifiers were not generalizing well on the unseen data when the features were being extracted by the VGG19 [26] model (Table 3.5).
As Table 3.5 shows, an ANN classifier performed the best when a ResNet50 [7] model was used for extracting the features. Overall test accuracy, F1 score, Kappa score, and AUC score were higher when the amount of training data was increased with the help of augmentations. In spite of performing the best, these models failed to generalize well and ended up overfitting on the training data (Table 3.6).
As observed from Table 3.6, the ANN/MLP classifier performed the best when the features were extracted using a ResNet101 [7] model. When the number of training samples was increased by augmentations, there was a clear increase in the performance of the classifiers, with the test accuracy being over 1% higher. When augmentations were not used, the SVM classifier performed similarly to the ANN/MLP classifier (Table 3.7).
When MobileNet-v2 [27] was used to extract the features for classification, it was observed that the classifiers gave a better result on the nonaugmented data. Although the difference was very small, the ANN/MLP classifier performed the best on nonaugmented data with the MobileNet-v2 feature extractor (Table 3.8).
As observed from Table 3.8, the ANN/MLP classifier again emerged as the winner in terms of test accuracy when the features were extracted from the augmented data using a pretrained MobileNet [28]. When the features were extracted from the smaller nonaugmented dataset, however, Random Forest had the highest test accuracy. It was also observed that all classifiers ended up overfitting on the training data (Table 3.9).
TABLE 3.1 Details of performance of the convolutional neural networks (CNNs) that were trained from scratch on this data.
(A)
Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA AUC
Table 3.1A shows the performance results of the CNNs on the data after image augmentations were applied, and Table 3.1B shows the performance on the smaller dataset without image augmentations.
When the features of the data are extracted using an Inception-V3 [29] network, the Random Forest classifier performed the best on almost all the metrics. The test accuracy was barely higher when augmentations were not performed. It was also observed that all the models tended to overfit on the training data, and hence this led to poor generalization (Table 3.10).
The Random Forest classifier performed the best on all the parameters when the features were extracted using Inception-ResNet-v2 [30] from the augmented data. With a test accuracy of 85.83%, it was higher by 2%. While ANN/MLP classifiers performed the best with ResNets, the introduction of Inception modules in the network tilted the balance toward tree-based models, as seen in Table 3.9 and Table 3.10. The introduction of residual connections increased the overall performance of the models (Table 3.11).
The XGBoost classifier emerged as the clear winner in terms of all the metrics when the features were extracted from the augmented data using a DenseNet169 [31] model. When nonaugmented data was used, the overall test accuracy of all the classifiers decreased by over 1%. With a training accuracy of 100% and a test accuracy of 82.33%, it was very clear that the model was overfitting on the training features (Table 3.12).
When DenseNet-121 [31] was used to extract the features from the augmented data, XGBoost again performed the best.
Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA AUC
Table 3.2A contains the performance of the networks on the dataset after image augmentations were applied and Table 3.2B contains the
performance of the models on the dataset without any image augmentations.
When image augmentations were not used, the performance of all the models saw a significant drop, and the ANN/MLP classifier performed the best among all with an 80% test accuracy (Table 3.13).
When a network with depthwise separable convolutions was used, the ANN/MLP classifier performed the best with and without any data augmentations. The test accuracy of the models somewhat increased when image augmentations were not used.

3.5 Discussion

The series of experiments revealed the performance of various machine learning algorithms
TABLE 3.3 The performance of various classifiers when the features were extracted using VGG16 [26].
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 81.67% 82.33% 0.82 0.65 0.82 0.82 0.8233
(B)
Random Forest 100% 84.83% 86.67% 0.87 0.73 0.87 0.87 0.8666
Table 3.3A shows the performance when data augmentation was used, and Table 3.3B shows the performance when data augmentation was not used.
TABLE 3.4 Performance metrics of classifiers on the features extracted by a pretrained VGG19 [26] network.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 85.00% 85.33% 0.85 0.71 0.85 0.85 0.8533
(B)
Random Forest 100% 82.17% 81.33% 0.81 0.63 0.81 0.82 0.8133
Table 3.4A shows the performance of the classifiers on the dataset with augmentations and Table 3.4B shows the performance on the dataset without augmentations.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 86.67% 84.50% 0.84 0.69 0.84 0.85 0.845
(B)
Random Forest 100% 85.50% 82.83% 0.83 0.66 0.83 0.83 0.8283
Table 3.5A shows the performance of the classifiers when the number of training samples were increased using image augmentations. Table 3.5B shows the performance of
the classifiers when image augmentations were not used.
TABLE 3.6 Performance of various classifiers trained on the features extracted by an ResNet101 [7] model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 84.83% 84.33% 0.84 0.69 0.84 0.84 0.8433
(B)
Random Forest 100% 86.83% 86.50% 0.86 0.73 0.86 0.87 0.865
Table 3.6A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.6B shows the
performance on the data without any image augmentations.
TABLE 3.7 Performance of various classifiers trained on the features extracted by a MobileNet-v2 [27] model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 75.83% 76.67% 0.77 0.53 0.77 0.77 0.766
(B)
Random Forest 100% 80.83% 76.50% 0.76 0.53 0.76 0.77 0.765
Table 3.7A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and
Table 3.7B shows the performance on the data without any image augmentations.
TABLE 3.8 Performance of various classifiers trained on the features extracted by a MobileNet [28] model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 76.17% 77.00% 0.77 0.54 0.77 0.77 0.77
(B)
Random Forest 100% 80.67% 77.50% 0.77 0.55 0.78 0.78 0.775
Table 3.8A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.8B shows the
performance on the data without any image augmentations.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 73.83% 77.83% 0.78 0.56 0.78 0.78 0.7783
(B)
Random Forest 100% 76.67% 78.00% 0.78 0.56 0.78 0.78 0.78
Table 3.9A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.9B shows the
performance on the data without any image augmentations.
TABLE 3.10 Performance of various classifiers trained on the features extracted by an Inception-ResNet-v2 [30]
model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 82.33% 85.83% 0.86 0.72 0.86 0.86 0.8583
(B)
Random Forest 100% 80.50% 81.17% 0.81 0.62 0.81 0.81 0.8116
Table 3.10A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.10B shows the
performance on the data without any image augmentations.
TABLE 3.11 Performance of various classifiers trained on the features extracted by a DenseNet169 [31] model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 81.00% 82.17% 0.82 0.64 0.82 0.82 0.8216
(B)
Random Forest 100% 79.50% 80.17% 0.8 0.6 0.8 0.8 0.8016
Table 3.11A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.11B shows the
performance on the data without any image augmentations.
TABLE 3.12 Performance of various classifiers trained on the features extracted by a DenseNet121 [31] model.
(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 81.33% 81.50% 0.81 0.63 0.82 0.82 0.815
(B)
Random Forest 100% 81.33% 79.67% 0.8 0.59 0.8 0.8 0.7966
Table 3.12A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.12B shows the
performance on the data without any image augmentations.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
Random Forest 100% 79.50% 77.50% 0.77 0.55 0.78 0.78 0.7749
(B)
Random Forest 100% 79.50% 83.17% 0.83 0.66 0.86 0.83 0.8316
Table 3.13A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied and Table 3.13B shows the
performance on the data without any image augmentations.
and deep learning techniques while classifying normal (without tumor) and abnormal (with tumor) brain MRI images. From the observations, it is very clear that, on average, the algorithms perform better on augmented data. This is because a higher number of data points leads to higher knowledge of the model about the data: machine learning algorithms are very data centric and require a huge amount of information to perform tasks.
As evident from Fig. 3.17, all the machine learning classifiers had, on average, 1% higher test accuracy when the features were collected from the bigger dataset which was created after applying image augmentations. Since the dataset used was balanced, the "test accuracy" metric holds the most importance. If the dataset were unbalanced, then the F1 and AUC scores would be of more importance, as the accuracy would give an illusion of performance.
When transfer learning was used to classify the MRI scans, it was observed that the test accuracy was higher when image augmentations were used to increase the number of training samples. This goes to show how important data is for deep learning models to learn. Without sufficient data, the models fail to extract meaningful patterns and hence the performance decreases. This trend is clearly shown in Fig. 3.18.
The test accuracy of Inception-v3 and Inception-ResNet-v2 is higher when data augmentation is not performed, but their overall performance is lower when compared to other models. These can be outlier cases, as in machine learning it is a general rule that a higher amount of data leads to better generalization of models (Fig. 3.19).
When inception modules are introduced in the network, the test accuracy of the model drops to 70%–80%. A similar drop in performance was also observed for all the other performance metrics. This shows that InceptionNets are not well suited for this problem. It could also be concluded that further tuning and a larger
FIGURE 3.17 Comparison of the performance of the models on the features from augmented and nonaugmented data.
FIGURE 3.18 Comparison of various models for transfer learning on augmented and nonaugmented data.
FIGURE 3.19 Comparison of the test accuracies of various models trained by transfer learning. (A) shows the test accuracies of various models and (B) shows the F1 scores of various models.
dataset could have improved the performance of Inception nets. ResNets and VGG nets performed the best, with ResNet50 having a test accuracy of 99% and an F1 score of 0.9933, and VGG16 having a test accuracy of 99% and an F1 score of 0.9933. Similarly, the bigger ResNet101 and VGG19 have test accuracies of 99.5% and 99% and F1 scores of 0.9949 and 0.9883, respectively. So, networks with residual connections ended up performing the best in transfer learning on this dataset.
Deep feature extraction helped in determining which models preserve the maximum information about the original image in their latent representation. This meant that we could figure out which network best maps the original 224 × 224 × 3 dimensional space to a smaller 64 × 1 dimensional space. Since the same classifiers with the exact same hyperparameters were used to classify the images from the extracted features, the difference in accuracies was completely dependent on the features themselves. The features are a mapping of a vector in the high-dimensional space (the image) to a vector in a lower dimensional space (the latent representation/features). It was observed that the classifiers had the highest average test accuracy on the features extracted by ResNets. The average test accuracy of classifiers trained on the features extracted by ResNet101 from the augmented dataset is 85.193%, whereas the test accuracy of the classifiers trained on the features extracted by ResNet50 from the augmented dataset is 84.61%. Fig. 3.20 also shows the results for the features extracted from the smaller nonaugmented dataset; the classifiers still perform the best on the features extracted by ResNets, with ResNet50 having an average of 84.27% and ResNet101 having an average of 84.38% (Fig. 3.21).
When CNNs of varying depths were trained on the data from scratch, it was observed that
FIGURE 3.20 Average test accuracies of all the classifiers on the features extracted by various deep learning models.
This plot shows which network represents the original data the best.
FIGURE 3.21 Test accuracies of various CNNs that were trained from scratch on the dataset. CNNs, Convolutional
neural networks.
FIGURE 3.22 Average test accuracies of the machine learning classifiers on the features extracted by various convolu-
tional neural networks.
4
Breast cancer detection from mammograms using artificial intelligence
Abdulhamit Subasi1,2, Aayush Dinesh Kandpal3, Kolla Anant Raj4 and Ulas Bagci5
1Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 2Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia; 3Department of Metallurgical and Materials Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India; 4Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; 5Northwestern University, Chicago, IL, United States
States. As of 2020, female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases (it constitutes 11.7% of all forms of cancer). It is the fifth leading cause of cancer mortality worldwide, with 685,000 deaths. Among women, breast cancer accounts for 1 in 4 cancer cases and 1 in 6 cancer deaths, ranking first for incidence in the vast majority of countries [1].
Because of the medical importance of breast cancer screening, CAD (computer-aided detection) methods for detecting anomalies such as calcifications, masses, architectural distortion, and bilateral asymmetry have been created [2]. One reason why detection of breast cancer is difficult is that mammography results are highly dependent on the age of the patient, breast density, and the type of lesion present. The density can lead to differences in the contrast of the malignant regions and could lead to wrong conclusions. This field of research is highly sensitive to details in the mammograms of each individual and hence requires accurate and faster methods of processing large amounts of data.
Deep learning techniques are revolutionizing the field of medical image analysis, and hence in this study we utilized convolutional neural networks (CNNs) for early detection of breast cancer to minimize the overheads of manual analysis [3]. Deep learning has been instrumental in dealing with large amounts of data with relative ease. Preprocessing methods, such as image augmentation, make it possible to train models on a selective ROI (region of interest) and extract region-specific information for further model building. In general, complications may arise due to the presence of calcium deposits, which are known as calcifications and microcalcifications. These granular deposits present themselves as irregular shapes and lines. The presence of such deposits may lead to magnified and highlighted white spots in the mammograms. These white spots could be mistaken for abnormal breast lesions and lead to false conclusions. Deep learning models can make it possible to work with highly specific regions in mammograms and avoid these. However, even deep learning models are not error free and can often misclassify the benign and malignant classes for one another. The use of deep learning models is governed by the amount of good-quality data that is available for training. Due to the high specificity of the application, the data required for model training has to be of high quality. Very often, it is observed that the datasets compiled by institutions and organizations are highly skewed and imbalanced. This can lead to the propagation of unwanted class-specific bias through the model during the training process. With the help of deep learning, this can be dealt with by using data augmentation, which allows researchers to increase the amount of data in different classes by using image manipulation techniques such as flipping, rotating the image through certain angles, applying geometric transformations, and color manipulation. One such approach of increasing images per class in imbalanced datasets is using GANs (generative adversarial networks). Through GANs, similar images are formed by the introduction of random/specific noise vectors to the images. Some new-age techniques such as GANs, neural style transfer, and meta-learning make it possible to artificially improve the class distributions and have helped remove bias for specific classes to some extent.
Mammography is the most widely used breast cancer screening technique. It is a type of imaging that uses a low-dose X-ray system to examine the breast. It is considered the most reliable method for screening breast abnormalities before they become clinically palpable [2,4,5]. Transfer learning has been used extensively for this purpose, as shown in work proposed in a paper in 2018 [6]. In this chapter, we aim to evaluate and assess the impact of artificial intelligence (AI)-based techniques that can help health professionals screen and diagnose breast cancer early and help save lives [7] (Fig. 4.1).
FIGURE 4.1 Overview of the proposed CAD system. CAD, Computer-aided detection.
4.2 Background and literature review
Extensive work has been done in this field in the last few decades. However, in recent years deep learning has played a pivotal role in early breast cancer detection through classification and segmentation as object detection models evolve. Segmentation of abnormal masses and classification of these masses into different classes has been the most crucial objective. Deep learning has made it possible to use large pretrained architectures and retrain them on mammography data through transfer learning. Recently, the use of an encoder-decoder architecture to classify mammograms into different classes has been proposed. Some researchers have employed deep learning in the literature to detect suspicious breast lesions to improve classification accuracy. Traditional CAD-based systems that have been implemented over the past few decades still required radiologists to manually outline the cancerous regions. This practice in many cases has a high chance of propagating human errors and ultimately leads to false conclusions. Now, with the help of deep learning models, outlines and regions can be highlighted automatically. The primary concern with manual outlining of cancerous regions is that radiologists may sometimes not be able to completely analyze the density of different regions in the mammograms, and this could lead to false labels and outlines of the cancerous regions. There are many techniques to generate 3D scans of breasts to analyze the images and detect cancerous developments, but mammography data has been widely used compared to other methods. One advantage of mammography over other scanning methods is that mammography exposes patients to relatively low amounts of radiation. Recent studies in this field have been observed to focus on three crucial steps, namely, segmentation/feature extraction, data augmentation, and classification. The segmentation/feature extraction process helps in extracting relevant information (specific features) from the mammograms and includes identifying and labeling the ROIs for further processing. In the next step, various manipulation techniques could be used to clean the data and increase the data artificially using techniques such as GANs [8] and neural style transfer [9]. In the last step, that is, classification, a CNN is used to identify the mammograms as either benign or malignant. The segmentation process has seen the rise of CNN architectures designed especially for medical image segmentation tasks, such as U-Net and U^2-Net [10,11]. The examples mentioned below show how deep learning has helped improve breast cancer detection through segmentation and classification in recent times. In 2019 a method was proposed using a patch-based CNN to detect breast lesions in full-field digital mammograms [12]. In one study, transfer learning was used to pretrain a CNN model using an extensive public database of digitized mammograms (i.e., the CBIS-DDSM dataset), and the trained model was then tested using the INbreast dataset [13]. The study evaluated breast lesion detection using VGG16, ResNet50, and InceptionV3 as deep feature extractors. It was concluded that the model based on InceptionV3 achieved promising results with a true positive rate (TPR) of 98% [12]. In 2018 a CAD system based on the Faster R-CNN detected and classified breast lesions as benign or malignant. The study evaluated and tested the model using the breast dataset, achieving an overall classification accuracy of 95% in area under the ROC curve (AUC) [14].
4.3 Artificial intelligence techniques
4.3.1 Artificial neural networks
Artificial neural networks (ANNs) have been used in many fields for classification tasks, pattern recognition, and predictive modeling and have proved to deliver excellent performance on such data. ANNs have been fast growing for their ability to adapt to different kinds of data. This is done with the combination of external networks, deep networks, and hyperparameter optimization [15]. Our research involves training a classification model, increasing its depth, and then tuning the hyperparameters to achieve optimal results. ANNs, in particular, have been extensively used in disease classification. ANNs have been used for the classification and segmentation of diseases such as Alzheimer's disease, breast cancer, lung cancer, and brain tumors, among other well-known diseases [16-19]. A comparative study presented in 2020 focused on the strengths and weaknesses of some ANN models such as ML-NN, SNN, SADE, DBN, and PCA-Net. The study reveals that ANNs have been very widely used, and the performance achieved by ANN-based models is auspicious and comparable with the results of state-of-the-art CNN architectures [20]. The results achieved by ANNs in disease classification look promising, and their use in this field is rapidly on the rise. A sample Python snippet for such a DNN classifier is given below.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
DNN_model = Sequential()
DNN_model.add(Dense(128, activation='relu', input_dim=num_features))  # hidden layer; size and input dimension num_features are assumed, only the output layer appeared in the original listing
# Output Layer
DNN_model.add(Dense(1,activation='sigmoid'))
# Model Summary
DNN_model.summary()
4.3.2 Deep learning
The swift advancement of deep learning continues to aid the medical imaging community in applying advanced techniques to improve the accuracy of cancer screening. Many CAD-based screening techniques have been extensively researched since their inception in the 1990s. Deep learning has vastly helped in improving accuracy in the early detection of breast cancer through mammograms. In recent years deep learning has revolutionized fields concerned with object detection and classification, pattern recognition, and many other domains [21]. A breakthrough in the field of image processing was achieved in the year 2012 when a deep learning model (a convolutional neural network) was able to outperform all other models in the ImageNet classification challenge [22].
4.3.3 Convolutional neural networks
CNNs are specifically designed to perform better on image data. In the past few years, deep learning has achieved excellent performance in various fields, such as visual recognition, speech recognition, and natural language processing. Among the different types of deep neural networks, CNNs have been the most extensively studied. Leveraging the rapid growth in the amount of annotated data and the significant improvements in the strength of graphics processing units, research on CNNs has emerged swiftly and achieved state-of-the-art results on various tasks. CNNs do this by analyzing images with a grid-like structure (Fig. 4.2). Multiple convolutional layers create successive feature maps. Various filters are used through which the image is passed to retrieve relevant information such as edges, pixel correlations, and pixel-specific information, among other valuable features. These feature maps contain information specific to that image that helps classify images into different categories, such as malignant or benign in our case. Among the techniques that examine the breast, the mammogram is a widely utilized and dependable screening innovation, and in our work we have used various CNN architectures to detect cancerous tissues in the Mammographic Image Analysis Society (MIAS) [28,29] and CBIS-DDSM [30,31] datasets. During a convolution, the original image size is reduced. To maintain the image size, various padding techniques are used. A CNN consists primarily of three steps: feature extraction, dimensionality reduction, and classification. The convolutional layers perform the task of feature extraction, pooling layers perform feature size reduction, and finally, the softmax layer classifies the image. CNNs tend to outperform dense neural networks (feed-forward networks) as they can extract image-specific information through the use of multiple feature maps. In contrast, densely connected networks simply operate on flattened pixel vectors and are not as well equipped to generalize data by enabling the computer to build complex insights out of simple ideas [23].
FIGURE 4.2 Simple CNN architecture with image augmentation. CNN, Convolutional neural network.
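To make the augmentation step shown in Fig. 4.2 concrete, a minimal Keras sketch is given below. The directory path, target size, and layer sizes are illustrative assumptions, not the exact settings used in this work.
# Sketch: image augmentation feeding a small CNN (illustrative settings)
import tensorflow as tf
from tensorflow.keras import layers, models
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,      # scale pixel values to [0, 1]
    rotation_range=10,      # small random rotations
    horizontal_flip=True,   # random horizontal flips
    zoom_range=0.1)         # small random zooms
# 'mammograms/train' is a hypothetical folder with one subfolder per class
train_gen = train_datagen.flow_from_directory(
    'mammograms/train', target_size=(224, 224),
    color_mode='grayscale', class_mode='binary', batch_size=32)
simple_cnn = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')])   # benign vs. malignant
simple_cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# simple_cnn.fit(train_gen, epochs=10)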
4.4 Breast cancer detection using artificial intelligence
4.4.1 Feature extraction using deep learning
A total of 11 models have been used for feature extraction. The models used were pretrained on the ImageNet dataset, and the corresponding weights were used (Fig. 4.3). Although machine learning is helpful in effectively extracting features for certain tasks, the remaining challenge is deciding which specific features should be extracted to feed into the model building. To obtain a good feature embedding, dropout layers with appropriate dropout rates were used along with batch normalization. Transfer learning was used to retrain the pretrained models on the image data and obtain feature embeddings that contain class- and image-specific information. The training images were passed through the pretrained models to generate a feature embedding/feature vector of 16, 32, 64, or 128 dimensions. Feature extraction techniques have previously been used to extract specific features from image scans using deep CNNs such as ResNet and VGG architectures [32]. After experimenting with the different output dimensions, the best feature embedding/feature vector was used for further processing. A sample Python code is given below.
# 'predictions' is assumed to be the feature-embedding output layer stacked on base_model (its definition is not shown in this excerpt)
model_feat = Model(inputs=base_model.input, outputs=predictions)
train_features = model_feat.predict(x_train)
val_features = model_feat.predict(x_val)
test_features = model_feat.predict(x_test)
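As a rough illustration of how the extracted embeddings can be passed to a conventional classifier, the sketch below assumes train_features, val_features, and test_features from the code above and integer class labels y_train, y_val, and y_test; the Random Forest settings are arbitrary.
# Sketch: train a classical classifier on the deep feature embeddings
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(train_features, y_train)                     # features from the pretrained backbone
print('Validation accuracy:', accuracy_score(y_val, rf.predict(val_features)))
print('Test accuracy:', accuracy_score(y_test, rf.predict(test_features)))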
4.4.2 Prediction and classification
In this work, two datasets are utilized, namely, MIAS and DDSM. The objective in the experiments was to classify images into two classes, namely, benign and malignant. As this was a two-class classification, a sigmoid layer was used for making the prediction and classifying an input image into the two respective classes. The classes in the two datasets were not balanced, and different techniques, such as callbacks, propagating custom weights through the network, and reducing the learning rate across epochs, among other techniques, were used to tackle this problem. Various optimizers such as Adam, SGD, RMSprop, and Adamax were experimented with. Adam showed a great convergence pattern and converged faster than the other optimizers, so it was chosen for final model training. Hence, the Adam optimizer, yielding the best results, was chosen for final prediction and classification while working with convolutional neural networks and transfer learning techniques. Finally, we put a fully connected layer at the end of the network architecture to classify the images. We put raw images and labels into the CNNs with no information on the underlying data, and this almost gives us state-of-the-art results [33]. Similarly, in the case of classifiers trained on features extracted from state-of-the-art models, dense layers of appropriate size were used. The predictions of a fitted classifier on the three splits are obtained as follows:
# sentiment_fit denotes the fitted classifier object
y_pred_train = sentiment_fit.predict(X_train)
y_pred_val = sentiment_fit.predict(X_val)
y_pred_test = sentiment_fit.predict(X_test)
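The imbalance-handling techniques mentioned above (custom class weights, early stopping, and reducing the learning rate across epochs) could be wired into Keras roughly as follows; the weight values and patience settings are illustrative assumptions.
# Sketch: class weights plus callbacks for an imbalanced two-class problem
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
# weight classes inversely to their frequency (y_train assumed to hold 0/1 labels)
counts = np.bincount(y_train)
class_weight = {0: len(y_train) / (2 * counts[0]), 1: len(y_train) / (2 * counts[1])}
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, class_weight=class_weight, callbacks=callbacks)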
FIGURE 4.4 Sample of benign mammograms from MIAS [28,29] dataset. MIAS, Mammographic Image Analysis Society.
# Required imports (added); 'url' points to the folder holding the MIAS .pgm files and Info.txt,
# and 'no_angles' controls the rotation-based augmentation (both values are assumptions)
import json
import cv2
import numpy as np
import tensorflow as tf

url = './all-mias/'      # assumed location of the MIAS data
no_angles = 360          # rotated copies are generated every 8 degrees below

def save_dictionary(path, data):
    # Save a dictionary (as a string) to a JSON file
    with open(path, 'w') as outfile:
        json.dump(str(data), fp=outfile)

def read_image():
    # Read the 322 MIAS mammograms, resize to 224 x 224, and store rotated copies
    info = {}
    for i in range(322):
        if i < 9:
            image_name = 'mdb00' + str(i + 1)
        elif i < 99:
            image_name = 'mdb0' + str(i + 1)
        else:
            image_name = 'mdb' + str(i + 1)
        image_address = url + image_name + '.pgm'
        img = cv2.imread(image_address, 1)
        img = cv2.resize(img, (224, 224))
        rows, cols, channel = img.shape
        info[image_name] = {}
        for angle in range(0, no_angles, 8):
            M = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1)
            img_rotated = cv2.warpAffine(img, M, (cols, rows))
            info[image_name][angle] = img_rotated
    return info

def read_lable():
    # Parse Info.txt and assign label 0 (benign) or 1 (malignant) to every rotated copy
    filename = url + 'Info.txt'
    text_all = open(filename).read()
    lines = text_all.split('\n')
    info = {}
    for line in lines:
        words = line.split(' ')
        if len(words) > 3:
            if words[3] == 'B':
                info[words[0]] = {}
                for angle in range(0, no_angles, 8):
                    info[words[0]][angle] = 0
            if words[3] == 'M':
                info[words[0]] = {}
                for angle in range(0, no_angles, 8):
                    info[words[0]][angle] = 1
    return info

# Containers for the CBIS-DDSM images and labels read from the TFRecord file
images = []
labels = []

feature_dictionary = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'label_normal': tf.io.FixedLenFeature([], tf.int64),
    'image': tf.io.FixedLenFeature([], tf.string)
}

def _parse_function(example_proto):
    # Added helper (assumed): decode one serialized example with the feature dictionary above
    return tf.io.parse_single_example(example_proto, feature_dictionary)

def read_data(filename):
    full_dataset = tf.data.TFRecordDataset(filename,
        num_parallel_reads=tf.data.experimental.AUTOTUNE)
    full_dataset = full_dataset.cache()
    print("Size of Training Dataset: ", len(list(full_dataset)))
    full_dataset = full_dataset.map(_parse_function,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    for image_features in full_dataset:
        image = tf.io.decode_raw(image_features['image'], tf.uint8)
        image = tf.reshape(image, [299, 299])
        image = image.numpy()
        image = cv2.resize(image, (100, 100))
        image = cv2.merge([image, image, image])   # replicate to three channels
        images.append(image)
        labels.append(image_features['label_normal'].numpy())

filenames = ['CBIS-DDSM_Dataset(kaggle).tfrecords']
read_data(filenames)
X = np.array(images)
y = np.array(labels)
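A short sketch of how the two MIAS helper functions above might be combined into training arrays is shown below; it assumes url and no_angles are set as in the preceding code and that every labeled image has a matching entry in both dictionaries (variable names are hypothetical).
# Sketch: build image and label arrays for MIAS from the helpers above
image_info = read_image()
label_info = read_lable()
X_mias, y_mias = [], []
for name, angles in label_info.items():
    if name in image_info:
        for angle, label in angles.items():
            X_mias.append(image_info[name][angle])
            y_mias.append(label)
X_mias = np.array(X_mias)   # shape: (n_samples, 224, 224, 3)
y_mias = np.array(y_mias)   # 0 = benign, 1 = malignant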
FIGURE 4.5 Sample of malignant mammograms from MIAS [28,29] dataset. MIAS, Mammographic Image Analysis Society.
4.4.4 Performance evaluation measures
The MIAS dataset was randomly split as follows for training, validation, and testing purposes (train size = 70%, validation size = 21%, and test size = 9%). Similarly, the CBIS-DDSM dataset was split as follows (train size = 70%, validation size = 20% of the train size, and test size = 30%). The splits were stratified on the labels to maintain the class ratios throughout the experimentation process. The evaluation metrics have to be chosen carefully when dealing with medical data. Even the slightest of errors could lead to major changes in applying and adapting deep learning models in real time, since performance on the individual classes matters greatly for medical data. The F1 score, Kappa score, area under the ROC curve (AUC), recall, and precision are some of the most widely used metrics to interpret the performance of multiclass models built with the help of deep learning and machine learning techniques.
To evaluate the results of our classification model, we require some values such as TP (true positive), FN (false negative), FP (false positive), and TN (true negative).
True positives (TP): the correctly predicted positive values, meaning that the actual class is yes and the predicted class is also yes.
True negatives (TN): the correctly predicted negative values, meaning that the actual class is no and the predicted class is also no.
False positives and false negatives occur when the actual class contradicts the predicted class.
False positives (FP): when the actual class is no and the predicted class is yes.
False negatives (FN): when the actual class is yes but the predicted class is no.
Accuracy is the ratio of correctly predicted observations to the total number of observations:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision is the ratio of correctly predicted positive observations to the total predicted positive observations:
Precision = TP / (TP + FP)
Recall (sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual positive class:
Recall = TP / (TP + FN)
The F1 score is the weighted average of precision and recall; therefore this score takes both false positives and false negatives into account:
F1 score = 2 x (Recall x Precision) / (Recall + Precision)
Cohen's Kappa coefficient is used to measure the reliability between evaluators (and also the reliability within evaluators) for qualitative (categorical) items. Cohen's Kappa is computed as κ = (Pr(a) - Pr(e)) / (1 - Pr(e)), where Pr(a) is the observed agreement and Pr(e) is the agreement expected by chance.
The receiver operating characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds. The area under the curve (AUC) is a measure of a classifier's ability to distinguish between classes and is used to summarize the ROC curve [38]. The higher the value of this metric, the better the performance of the classification model. It is evaluated by calculating the area under the ROC curve [39]. A sample Python code for performance metrics is given below.
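A minimal sketch, assuming y_true and y_pred hold the true and predicted labels and y_score the predicted probabilities for the positive class:
# Sketch: common binary classification metrics with scikit-learn
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)
print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred))
print('Recall   :', recall_score(y_true, y_pred))
print('F1 score :', f1_score(y_true, y_pred))
print('Kappa    :', cohen_kappa_score(y_true, y_pred))
print('ROC AUC  :', roc_auc_score(y_true, y_score))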
TABLE 4.2 Convolutional neural networks (CNNs) with different number of layers.
Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA ROC area
As the number of layers increases, the accuracy of the model increases.
It can be seen from Table 4.3 that, except for the SVM classifier, all the other classifiers were overfitting on the training data. The accuracy scores were consistently in the range of 51-59; similar scores were seen in recall and precision.
It can be seen from Table 4.4 that, although XGBoost and Random Forest were overfitting on the training data, they showed the best overall performance. Accuracy scores were consistently between 51 and 59, and a similar trend was seen in the recall and precision scores.
It can be seen from Table 4.5 that the Random Forest classifier achieved the highest test accuracy. Besides SVM, the other classifiers were overfitting on the training data. The test accuracy, recall, and precision scores were found to be in the range of 51-56.
It can be seen from Table 4.6 that the XGBoost classifier has the best overall performance. While it overfits the training data, it outperformed the other models in all performance measures. Random Forest also, while it overfits, was able to perform well on the test data. SVM did not overfit on the train data, unlike the other classifiers.
TABLE 4.3 Performance of VGG16 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1 score Kappa Recall Precision
TABLE 4.4 Performance of VGG19 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 4.5 Performance of ResNet50 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
It can be seen from Table 4.7 that the Random Forest classifier, although it overfit on the train data, showed the best performance on all our performance evaluation measures. The accuracy, recall, and precision scores were consistently found to be in the range of 51-61.
It can be seen from Table 4.8 that the Random Forest classifier, although it overfit on the train data, showed the best performance on all our performance evaluation measures. The accuracy, recall, and precision scores were consistently found to be in the range of 49-58.
TABLE 4.7 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 4.8 Performance of MobileNet deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
It can be seen from Table 4.9 that all the models performed equally well, and the scores were consistently in a specific range. The XGBoost, Random Forest, and KNN classifiers were overfitting on the train data.
It can be seen from Table 4.10 that the SVM classifier showed the best overall performance. The XGBoost and KNN classifiers were overfitting on the training data.
It can be seen from Table 4.11 that the KNN and XGBoost classifiers showed the best performance. Except for the SVM classifier, all classifiers were overfitting on the training data.
TABLE 4.9 Performance of InceptionV3 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 4.10 Performance of InceptionResNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 4.11 Performance of DenseNet169 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
It can be seen from Table 4.12 that the ANN, KNN, Random Forest, XGBoost, and AdaBoost classifiers were overfitting on the training data. The KNN classifier showed the best performance out of all.
It can be seen from Table 4.13 that the XGBoost classifier showed the best overall performance. The XGBoost, KNN, and Random Forest classifiers were overfitting on the training data.
4.4.5.2 CBIS-DDSM dataset
TABLE 4.13 Performance of Xception deep feature extraction with different machine learning models
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 4.14 Convolutional neural network (CNN) with different number of layers.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area
It can be observed from Table 4.14 that the simple CNN models were able to achieve high performance overall. However, the CNN 4-layer model was not able to give a good performance and was overfitting on specific class data, which resulted in a significant drop in the ROC and Kappa values. The CNN 7-layer model had the best overall performance among these networks.
TABLE 4.16 Performance of VGG16 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
It can be observed from Table 4.15 that the pretrained networks were able to perform very well. MobileNet, with 95.73% accuracy, achieved the highest test accuracy, and it also has the highest Kappa score, which indicates the reliability of the model. Overall, it was seen that the pretrained networks outperformed all the simple CNN models.
It can be observed from Table 4.16 that the Random Forest classifier was observed to give the best performance. The XGBoost and Random Forest classifiers were overfitting on the training data. The ANN model achieved the highest Kappa score.
It can be observed from Table 4.17 that the Random Forest classifier showed the best overall performance. The ANN model was seen to have the highest reliability index. The Random Forest and XGBoost classifiers were seen to overfit on the training data.
It can be observed from Table 4.18 that the XGBoost classifier showed the best overall performance. The ANN model was observed to have the highest reliability index. The XGBoost classifier was overfitting on the training data.
It can be observed from Table 4.19 that all the classifiers were observed to have identical performance. The ANN and XGBoost models were observed to have high reliability scores of about 0.45. The test accuracies, F1 score, recall, and precision scores were consistently in the same range.
TABLE 4.18 Performance of ResNet50 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 4.19 Performance of ResNet101 deep feature extraction with different machine learning models
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
It can be observed from Table 4.20 that all the classifiers were observed to have identical performance. The ANN and XGBoost models were observed to have high reliability scores. The test accuracies, F1 score, recall, and precision scores were consistently in the same range.
It can be observed from Table 4.21 that the ANN model was able to outperform all the other classifiers. The XGBoost model was overfitting on the training data.
It can be observed from Table 4.22 that the XGBoost classifier was observed to have the best performance overall. The remaining models showed identical results.
TABLE 4.20 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 4.21 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 4.22 Performance of MobileNet deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
It can be observed from Table 4.23 that the XGBoost classifier was observed to have the best performance overall. The remaining models showed identical results.
It can be observed from Table 4.24 that all the models performed equally well. The XGBoost and Random Forest classifiers were found to overfit on the training data.
It can be observed from Table 4.25 that the XGBoost classifier outperformed the other classifiers. The Random Forest and XGBoost classifiers were overfitting on the training data.
TABLE 4.24 Performance of DenseNet169 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
TABLE 4.25 Performance of DenseNet121 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
It can be observed from Table 4.26 that the XGBoost classifier outperformed the other classifiers.
TABLE 4.26 Performance of Xception deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision
4.5 Discussion
After experimenting with various AI techniques to train our classification model to classify mammograms into B (benign) and M (malignant), transfer learning was found to have the best performance. In our experimentation with the MIAS dataset, VGG16 had the best performance (train accuracy: 98.3, test accuracy: 95.71, validation accuracy: 96.04, F1 score: 0.957, Kappa score: 0.9122, ROC score: 0.9545). The custom n-layered CNN models (the CNN 6-layered model achieved a test accuracy of only 61.59) and the classifiers (ANN, KNN, XGBoost, SVM, Random Forest, and AdaBoost) trained on extracted features (the XGBoost classifier trained on features extracted from the ResNet-101 model achieved a test accuracy of only 61.37) were outperformed by the transfer learning models. And, in our experimentation with the CBIS-DDSM dataset, MobileNet had the best performance (train accuracy: 99.07, test accuracy: 95.73, validation accuracy: 95.86, F1 score: 0.956, Kappa score: 0.80, ROC score: 0.89). The custom n-layered CNN models (the CNN 5-layered and CNN 8-layered models achieved a test accuracy of about 92% each) and the classifiers (ANN, KNN, XGBoost, SVM, Random Forest, and AdaBoost) trained on extracted features (the XGBoost classifier trained on features extracted from the DenseNet169 model achieved a test accuracy of only 89.98) were slightly outperformed by the transfer learning models. When trained on various classifiers, the features extracted from the CBIS-DDSM dataset performed relatively well compared to the MIAS dataset. Transfer learning was able to generalize, to a terrific extent, the features it had learned from the training data and effectively use them to classify the test mammograms. The exploration of network architectures has been a part of neural network research since their initial discovery. The recent resurgence in the popularity of neural networks has also revived this research domain. The increasing number of layers in modern networks amplifies the differences between architectures and motivates exploring different connectivity patterns and revisiting old research ideas [40].
4.6 Conclusion
In this chapter, we have highlighted the various techniques of image classification in the field of breast cancer detection. Techniques such as transfer learning, deep feature extraction, model tuning, and hyperparameter optimization were used to obtain the best results. An in-depth analysis of the different models has been presented in this work, along with the various performance evaluation metrics used. Popularly known breast cancer datasets, such as the DDSM and MIAS datasets, were used to showcase how different AI techniques can be used in this field. This chapter consists of a comparative study of various models such as simple CNNs, transfer learning through pretrained models, deep feature extraction, and traditional machine learning models trained on features from these pretrained models. Our findings have shown that the techniques mentioned above can detect breast cancer from mammograms with accuracies as high as 96%. These findings indicate that many improvements could be made in this field.
[22] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84-90. Available from: https://doi.org/10.1145/3065386.
[23] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436-444. Available from: https://doi.org/10.1038/nature14539.
[24] A.S. Lundervold, A. Lundervold, An overview of deep learning in medical imaging focusing on MRI, Z. Für Med. Phys. 29 (2) (2019) 102-127. Available from: https://doi.org/10.1016/j.zemedi.2018.11.002.
[25] A.W. Senior, et al., Improved protein structure prediction using potentials from deep learning, Nature 577 (2020). Available from: https://www.nature.com/articles/s41586-019-1923-7.
[26] A. Kryshtafovych, T. Schwede, M. Topf, K. Fidelis, J. Moult, Critical assessment of methods of protein structure prediction (CASP)-Round XIII, Proteins Struct. Funct. Bioinforma. 87 (12) (2019) 1011-1020. Available from: https://doi.org/10.1002/prot.25823.
[27] M. Ghassemi, T. Naumann, P. Schulam, A.L. Beam, I.Y. Chen, R. Ranganath, A review of challenges and opportunities in machine learning for health, ArXiv:1806.00388 Cs Stat, http://arxiv.org/abs/1806.00388, Dec. 2019 (accessed 13.05.21).
[28] The mini-MIAS database of mammograms. http://peipa.essex.ac.uk/info/mias.html (accessed 12.05.21).
[29] MIAS Mammography. https://kaggle.com/kmader/mias-mammography (accessed 14.05.21).
[30] R. Sawyer-Lee, F. Gimenez, A. Hoogi, D. Rubin, Curated breast imaging subset of DDSM, Cancer Imaging Archive (2016). Available from: https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY.
[31] DDSM Mammography. https://kaggle.com/skooch/ddsm-mammography (accessed 12.05.21).
[32] A. Boyd, A. Czajka, K. Bowyer, Deep learning-based feature extraction in iris recognition: use existing models, fine-tune or train from scratch?, ArXiv:2002.08916 Cs, http://arxiv.org/abs/2002.08916, Feb. 2020 (accessed 06.05.21).
[33] A. Subasi, A. Mitra, F. Özyurt, T. Tuncer, Automated COVID-19 detection from CT images using deep learning, Computer-aided Design and Diagnosis Methods for Biomedical Applications, Taylor & Francis, 2021, pp. 153-176. Available from: https://doi.org/10.1201/9781003121152-7.
[34] Y. Jiménez-Gaona, M.J. Rodríguez-Álvarez, V. Lakshminarayanan, Deep learning based computer-aided systems for breast cancer imaging: a critical review, ArXiv:2010.00961 Cs Eess, http://arxiv.org/abs/2010.00961, Sep. 2020 (accessed 11.05.21).
[35] P. Ranganathan, C. Pramesh, R. Aggarwal, Common pitfalls in statistical analysis: measures of agreement, Perspect. Clin. Res. 8 (4) (2017) 187. Available from: https://doi.org/10.4103/picr.PICR_123_17.
[36] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20 (1) (1960) 37-46.
[37] Z. Yang, M. Zhou, Kappa statistic for clustered physician-patients polytomous data, Comput. Stat. Data Anal. 87 (2015) 1-17.
[38] K. Feng, H. Hong, K. Tang, J. Wang, Decision making with machine learning and ROC curves, ArXiv:1905.02810 Cs Econ Q-Fin Stat, http://arxiv.org/abs/1905.02810, May 2019 (accessed 12.05.21).
[39] J.B. Brown, Classifiers and their metrics quantified, Mol. Inform. 37 (12) (2018) 1700127. Available from: https://doi.org/10.1002/minf.201700127.
[40] G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, ArXiv:1608.06993 Cs, http://arxiv.org/abs/1608.06993, Jan. 2018 (accessed 29.04.21).
5
Breast tumor detection in ultrasound
images using artificial intelligence
Omkar Modi 1 and Abdulhamit Subasi 2,3
1 Indian Institute of Technology, Kharagpur, West Bengal, India; 2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
O U T L I N E
5.1 Introduction 137
5.2 Background/literature review 138
5.3 Artificial intelligence techniques 139
5.3.1 Artificial neural networks 139
5.3.2 Deep learning 140
5.3.3 Convolutional neural networks 140
5.4 Breast tumor detection using artificial intelligence 149
5.4.1 Feature extraction using deep learning 149
5.4.2 Prediction and classification 151
5.4.3 Experimental data 165
5.4.4 Performance evaluation measures 166
5.4.5 Experimental results 168
5.5 Discussion 178
5.6 Conclusion 180
References 180
breast lumps and is complementary to mammograms or breast MRIs. Ultrasound is a safe, noninvasive, and radiation-free procedure. Breast ultrasonography can help detect whether an abnormality is solid (such as a benign cyst), fluid-filled (such as a noncancerous lump of tissue), or both cystic and solid (such as a malignant tumor) [3]. In recent years, with the development of artificial intelligence (AI), especially in the field of deep learning (DL) networks and their outstanding results in image recognition tasks, we can leverage the technology on ultrasound scans for early detection of whether a breast tumor is benign or malignant. In this chapter, we review the task of categorizing the tumor with various models and compare them. We applied DL models involving convolutional neural networks (CNNs) and transfer learning for feature extraction and classified the extracted features with various DL architectures and traditional machine learning (ML) algorithms. We compared all the models on various metrics.
5.2 Background/literature review
Breast self-examination is a screening method that is performed by the individual themselves. It is feasible to notice any differences or changes in the breasts by palpating them at different angles and at varied pressures. Breast inspection, on the other hand, is the least reliable method of cancer detection. Mammography has evolved as a viable alternative and is now commonly utilized in medicine. However, relying only on mammograms carries a considerable risk of false positives, which frequently result in unneeded biopsies and procedures [4]. Due to the development of AI and computer vision, there has been massive research in using AI for automation in the field of medicine. Recently, AI technology has made great progress in the automated analysis of medical images for anomaly detection. The same is true for breast images for possible breast cancer detection [5,6]. Conventional methods or traditional machine learning algorithms such as K-nearest neighbors (KNN), support vector machine (SVM), and Random Forest showed moderate performances. The emergence of DL algorithms, which process images and extract features, has shown remarkable results. The CNN model is very often used for training on data in medical image diagnosis, analysis, and its applications. In fact, medical imaging in CAD systems has become successful because of the use of CNNs [7]. A CNN exploits the spatial relationships among the image pixels. CNNs have helped researchers map important features, localizing them in the scan images of a breast and classifying them into various kinds of abnormalities. Although it is evident that DL requires large amounts of data, pretraining provides a solution when complete or large datasets are not available [8].
In Ref. [9], the authors compared the classification results of KNN, SVM, Random Forest, and Decision Tree techniques. The Wisconsin Breast Cancer dataset was utilized, which was downloaded from the UCI repository. KNN was the top classifier in simulations, followed by SVM, Random Forest, and Decision Tree. In Ref. [10], the authors presented a unique approach for detecting breast cancer using ML techniques such as the Naive Bayes classifier, SVM classifier, bi-clustering AdaBoost techniques, RCNN classifier, and bidirectional recurrent neural networks (HA-BiRNN). A comparison of the ML techniques and the proposed methodology [deep neural network (DNN) with support value] was conducted, and the simulated results revealed that the DNN algorithm was superior in terms of performance, efficiency, and image quality, all of which are critical in today's medical systems, whereas the other techniques failed to perform as expected.
histogram features, edge detection, region growing, and pixel classification. ANNs have been used for the classification and segmentation of diseases such as Alzheimer's, breast cancer, lung cancer, and brain tumors, among other diseases, and have vast potential in the medical and health sector.
5.3.2 Deep learning
DL is a subfield of ML. DL learns from the data, which may be unstructured or unlabeled. Researchers have recently advanced DL by expanding ANNs into DNNs, stacking many hidden layers with linked nodes between the input and output layers. By combining basic decisions between layers, the multilayer network can handle increasingly complicated challenges. In prediction tasks such as classification and regression, a DNN often outperforms a shallow layered network. To avoid learning converging at a local minimum or to overcome overfitting difficulties, each layer of the DNN improved its weights using the unsupervised restricted Boltzmann machine. Recently, residual neural networks have used skip connections to avoid vanishing gradient problems. Furthermore, the introduction of big data and graphics processing units has the potential to solve complicated issues and reduce computing time. There are several hidden layers between the input layer and the output layer, and the nodes known as neurons are found in each layer. The difference between ML and DL is that DL is closer to its goal than ML and can extract features automatically [18]. Accordingly, the DL algorithm gets a lot of attention these days to solve various problems in the medical imaging field. In this chapter, we have developed and compared various DL models on ultrasound images to detect breast cancer. DL has helped to reduce false-positive rates and to decrease assessment time and unnecessary biopsies.
5.3.3 Convolutional neural networks
CNNs are widely used in computer vision for both supervised and unsupervised learning. A CNN is used to classify the images. It takes the images of the breast cancer dataset as input, and the neurons are associated with their corresponding weights. The weights are adjusted to minimize the error and enhance the performance. The extracted information includes features such as edges, objects, and other relevant patterns present in the image. These features help distinguish one class of images from another. This is what sets CNNs apart from other DL techniques. A CNN is composed of convolutional, pooling, and fully connected layers (Fig. 5.1). In the convolution layer, a feature map is used to extract the features of the given image and make the original image more compact. The pooling layer is used to reduce the dimensions of the image. A rectified linear unit (ReLU) layer is used as an activation function, which checks the value of each input and sets negative values to zero.
FIGURE 5.1 Convolutional neural network for classifying breast ultrasound images into normal, benign, and malignant classes.
def cnn_2():
# Create the model
model = Sequential()
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
def cnn_3():
# Create the model
model = Sequential()
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
def cnn_4():
# Create the model
model = Sequential()
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
def cnn_5():
# Create the model
model = Sequential()
return model
def cnn_6():
# Create the model
model = Sequential()
# Add convolutional layers
model.add(Conv2D(16, (3, 3), padding='same', strides=(1, 1),input_shape=img_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add dropout layer
model.add(Dropout(0.2))
return model
def cnn_7():
# Create the model
model = Sequential()
return model
def cnn_8():
    # Create the model
    model = Sequential()
    model.add(Conv2D(32,(3,3),padding='same',input_shape=img_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    model.add(Conv2D(64,(3,3),padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    model.add(Conv2D(128,(3,3),padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    model.add(Conv2D(256,(3,3),padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=1,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    model.add(Conv2D(128,(3,3),padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=1,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    model.add(Conv2D(64,(3,3),padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding='same'))
    # Add dropout layer
    model.add(Dropout(0.2))
    # Add flatten layer
    model.add(Flatten())
    # Dense classification head with a three-class softmax output (assumed, mirroring the transfer learning models below)
    model.add(Dense(128,activation='relu'))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
    return model
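As a rough usage sketch (assuming X_train, y_train, X_val, and y_val are preprocessed image arrays with one-hot labels and img_shape is defined as above), one of the CNN builders could be trained as follows; the epoch and batch settings are illustrative.
# Sketch: train one of the n-layered CNNs defined above
model = cnn_8()
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=30, batch_size=32)
val_loss, val_acc = model.evaluate(X_val, y_val)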
5.4 Breast tumor detection using artificial intelligence
5.4.1 Feature extraction using deep learning
Feature extraction is a step in image processing which divides and reduces a large collection of raw data into smaller groupings; as a result, processing becomes easier. When you have a huge data collection and need to decrease the number of resources without sacrificing any vital or relevant information, extracting the features can help. Feature extraction aids in the reduction of unnecessary data in the data collection. The reduction of data makes it easier for the computer to develop the model with less effort, and it also speeds up the learning and generalization steps of the ML process [22].
In our research, we have extracted features through multilayered CNNs. CNNs provide automatic feature extraction. The specified input data is initially forwarded to a feature extraction network, and then the resultant extracted features are forwarded to a classifier network after applying a fully connected layer. Max and average pooling layers were introduced for dimension reduction, which helps significantly in reducing computing costs. To obtain better feature vectors, dropout and batch normalization were used, which also helped to prevent high variance in the data. We have trained various n-layered CNN models (n varying from 2 to 10) and compared the quality of the extracted features, after passing them to a classifier neural network, through performance metrics. Apart from this, a total of 11 pretrained CNN architectures have been used for feature extraction and trained on various ML models. The model weights were initialized with ImageNet weights; ImageNet contains more than 14 million images belonging to more than 20,000 classes. This usage of models with pretrained weights is known as transfer learning, which helped to save the time and resources required to train multiple ML models from scratch for similar tasks. Python codes for deep feature extraction are given below.
#VGG16 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#VGG19 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.VGG19(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#ResNet50 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.ResNet50(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#ResNet101 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.ResNet101(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)
5.4.2 Prediction and classification
The dataset used in our research consisted of three classes of labels associated with each scanned ultrasound image. The extracted feature vectors, processed by CNN layers and the various pretrained architectures, were passed to a dense neural network consisting of a varied number of hidden layers with ReLU activations. The last (output) layer was a softmax layer that classified the feature vectors into the three classes. To overcome high variance, dropout layers and batch normalization layers were introduced. The model was compiled using the Adam optimizer, and the categorical cross-entropy loss function was used to evaluate the loss. This optimization technique showed good convergence on the dataset. The data split was stratified to deal with class imbalance problems, and early stopping was introduced to avoid overfitting and to improve the learner's performance on data outside of the training set. The Python codes for the transfer learning models are given below.
#VGG16
base_model = tf.keras.applications.VGG16(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
for l in base_model.layers:
l.trainable = False
def VGG16():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#VGG 19
base_model = tf.keras.applications.VGG19(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
def VGG19():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#ResNet50
base_model = tf.keras.applications.ResNet50(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def resnet():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#InceptionV3
base_model = tf.keras.applications.InceptionV3(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
for l in base_model.layers:
l.trainable = False
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#MobileNet
base_model = tf.keras.applications.MobileNet(
alpha = 0.75,
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def MobileNet():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#DenseNet121
base_model = tf.keras.applications.DenseNet121(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def DenseNet121():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#DenseNet169
base_model = tf.keras.applications.DenseNet169(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def DenseNet169():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#InceptionResNetV2
base_model = tf.keras.applications.InceptionResNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def InceptionResNetV2():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#MobileNetV2
base_model2 = tf.keras.applications.MobileNetV2(
alpha = 0.75,
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model2.layers:
layer.trainable = False
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#ResNet101
base_model = tf.keras.applications.ResNet101(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False
def resnet101():
model = Sequential()
model.add(base_model)
model.add(MaxPooling2D((2,2),strides = 2))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return model
#AlexNet
def AlexNet():
AlexNet = Sequential()
#Output Layer
AlexNet.add(Dense(3))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('softmax'))
AlexNet.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
return AlexNet
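A brief sketch of how one of the transfer learning builders above might be trained with the early stopping mentioned earlier (the epoch, batch, and patience settings are illustrative assumptions):
# Sketch: fit a transfer learning model with early stopping
from tensorflow.keras.callbacks import EarlyStopping
tl_model = VGG16()   # any of the builder functions defined above
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
tl_model.fit(X_train, y_train,
             validation_data=(X_val, y_val),
             epochs=50, batch_size=32, callbacks=[early_stop])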
In the second part of our experiment, various classifiers were trained on the features extracted from the pretrained models. The region of interest captured in the extracted features is fed into various classifiers to classify the images into benign and malignant. The classifiers used were ANN, KNN, SVM, Random Forest, AdaBoost, Bagging, XGBoost, LSTM, and Bi-LSTM (Fig. 5.2).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score, recall_score, precision_score

def eval(classifier_name,y_train,y_train_pred,y_val,y_val_pred,y_true,y_pred):
    # Convert one-hot label arrays to integer class labels
    y_train = np.argmax(y_train,axis=1)
    y_val = np.argmax(y_val,axis=1)
    y_true = np.argmax(y_true,axis=1)
    train_accuracy = round(accuracy_score(y_train,y_train_pred),4)
    val_accuracy = round(accuracy_score(y_val,y_val_pred),4)
    test_accuracy = round(accuracy_score(y_true,y_pred),4)
    f1_measure = round(f1_score(y_true,y_pred,average='weighted'),4)
    kappa_score = round(cohen_kappa_score(y_true,y_pred),4)
    recall = round(recall_score(y_true,y_pred,average='weighted'),4)
    precision = round(precision_score(y_true,y_pred,average='weighted'),4)
    score = {"classifier":classifier_name, "train_accuracy":train_accuracy,
             "val_accuracy":val_accuracy, "test_accuracy":test_accuracy,
             "f1_measure":f1_measure, "kappa_score":kappa_score,
             "recall":recall, "precision":precision}
    return score

# Fit a classifier on the extracted features and evaluate it on the three splits
classifier.fit(X_train,np.argmax(y_train,axis=1))
y_train_pred = classifier.predict(X_train)
y_val_pred = classifier.predict(X_val)
y_test_pred = classifier.predict(X_test)
eval(classifier_name,y_train,y_train_pred,y_val,y_val_pred,y_test,y_test_pred)
names = ['SVM',
'Random Forest',
'AdaBoost',
'KNN',
'XGBoost',
'Bagging',
'ANN'
]
classifier = [
SVC(),
RandomForestClassifier(),
AdaBoostClassifier(),
KNeighborsClassifier(),
XGBClassifier(),
BaggingClassifier(),
MLPClassifier(max_iter = 400),
]
cls_list = zip(names,classifier)
clsm_list = zip(names,classifier)
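The list of (name, classifier) pairs above is presumably iterated roughly as follows; this sketch assumes eval() returns the score dictionary and that X_train, X_val, and X_test hold the extracted feature matrices with one-hot label arrays.
# Sketch: fit each classical classifier on the deep features and collect the scores
results = []
for classifier_name, clf in cls_list:
    clf.fit(X_train, np.argmax(y_train, axis=1))
    results.append(eval(classifier_name,
                        y_train, clf.predict(X_train),
                        y_val, clf.predict(X_val),
                        y_test, clf.predict(X_test)))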
def feature_extraction(base_model):
    # Flatten the base model's output and build a feature-extraction model
    X_feat_out = base_model.output
    X_feat_flatten = Flatten()(X_feat_out)
    X_feat_model = Model(inputs=base_model.input, outputs=X_feat_flatten)  # added: the model used below
    Xm_feat_train = X_feat_model.predict(Xm_train)
    Xm_feat_val = X_feat_model.predict(Xm_val)
    Xm_feat_test = X_feat_model.predict(Xm_test)

for l in base_model.layers:
    l.trainable = False
#LSTM
lstm_model = Sequential()
lstm_model.add(base_model)
lstm_model.add(Reshape((base_model.output.shape[1]*base_model.output.shape[2],
base_model.output.shape[3])))
lstm_model.add(LSTM(128, dropout=0.5,recurrent_dropout=0.5))
lstm_model.add(Dense(3,activation='softmax'))
lstm_model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
lstm_train_predict = np.argmax(lstm_model.predict(X_train),axis=1)
lstm_val_predict = np.argmax(lstm_model.predict(X_val),axis=1)
lstm_test_predict = np.argmax(lstm_model.predict(X_test),axis=1)
eval("LSTM",y_train,lstm_train_predict,y_val,lstm_val_predict,y_test,lstm_test_predict)
bidir_model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])
FIGURE 5.2 Classification of breast ultrasound images using deep feature extraction.
FIGURE 5.4 Ground truth (masked) image example for their respective original image.
5.4.3 Experimental data
At Baheya Hospital, grayscale ultrasound pictures were gathered and saved in DICOM format. They were preprocessed and improved after being annotated. The number of ultrasound images was decreased to 780 once the dataset was refined. Normal, benign, and malignant images are separated into three categories (cases). To remove unnecessary and irrelevant borders from the images, they were all cropped to various sizes. MATLAB was used to produce the ground truth (image boundaries) in order to make the ultrasound dataset more useful. For each image, a freehand segmentation is created independently. As a result, there is a masked image for each image, and we trained the model on both sets of images individually [23].
Each image was converted to a grayscale image with a target shape of (128 x 128 x 1). The dataset consisted of {"benign": 891, "normal": 266, "malignant": 421} labels. The data split was stratified to maintain the class ratios uniformly throughout the process. The training set consisted of 76.5%, the validation set of 8.5%, and the test set of 15% of the total dataset (a code sketch of this split is given below).
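The 76.5/8.5/15 stratified split described above could be reproduced roughly as follows (a sketch; the variable names X and y for the image array and labels are assumptions):
# Sketch: stratified train/validation/test split (76.5% / 8.5% / 15%)
from sklearn.model_selection import train_test_split
# first carve off the 15% test set, stratified on the class labels
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
# 8.5% of the full data corresponds to 10% of the remaining 85%
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.10, stratify=y_trainval, random_state=42)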
5.4.4 Performance evaluation measures
Evaluation metrics are used to measure the performance of the statistical model. In our experiments, the following measures were used.
The table above is a sample classification table for multiclass classification. The values in blue are correctly predicted and the values in red are wrongly predicted. Depending upon these values, the other metrics are measured.
True benign = total number of observations that are benign and that the machine has predicted as benign.
False benign = total number of observations that are normal or malignant and that the machine has predicted as benign.
True normal = total number of observations that are normal and that the machine has predicted as normal.
False normal = total number of observations that are not normal and that the machine has predicted as normal.
True malignant = total number of observations that are malignant and that the machine has predicted as malignant.
False malignant = total number of observations that are not malignant and that the machine has predicted as malignant.
Accuracy is the ratio of correctly predicted observations to the total number of observations. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. The F1 score is the weighted average of precision and recall; therefore this score takes both false positives and false negatives into account.
1. Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn)
2. Precision = Tp / (Tp + Fp)
3. Recall = Tp / (Tp + Fn)
4. F1 = 2 x (Precision x Recall) / (Precision + Recall)
5.4.4.1 Cohen's Kappa coefficient
The kappa statistic is frequently used to test model reliability. The importance of model reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. The kappa can range from -1 to +1, where 0 represents the amount of agreement that can be expected from random chance, and 1 represents perfect agreement between the raters. Values <= 0 indicate no agreement, 0.01-0.20 none to slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement [24].
κ = (Pr(a) - Pr(e)) / (1 - Pr(e))
where Pr(a) represents the actual observed agreement and Pr(e) represents chance agreement.
5.4.4.2 Area under the curve score
The receiver operator characteristic (ROC) curve is a classification problem evaluation metric. It is a probability curve that compares the true positive rate to the false-positive rate at various threshold levels, thereby separating the "signal" from the "noise." The AUC is a summary of the ROC curve that measures a classifier's ability to discriminate between classes. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes. When AUC = 1, the classifier is able to tell the difference between the positive and negative class values; this is the case because the classifier can recognize more true positives and true negatives than false negatives and false positives. The classifier is unable to discriminate between positive and negative class points when AUC = 0.5; in other words, the classifier predicts a random or constant class for all data points [25]. During classification training, many generative classifiers use accuracy as a criterion for selecting the best answer. However, accuracy has various flaws, including reduced uniqueness, discriminability, informativeness, and bias toward data from the majority class [26]. A simple Python implementation is shown below.
import numpy as np

# Convert the probability / one-hot outputs to class-index labels
y_pred = model.predict(X_test)
y_pred_label = np.argmax(y_pred, axis=1)
y_true_label = np.argmax(y_test, axis=1)
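Continuing from the label vectors above, the metrics described in this section could be computed with scikit-learn as in the following sketch (an illustration rather than the chapter's exact code; y_pred is assumed to hold predicted class probabilities and y_test the one-hot ground truth):

from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score, roc_auc_score

# Accuracy, weighted F1, and Cohen's kappa from the class-index labels
accuracy = accuracy_score(y_true_label, y_pred_label)
f1 = f1_score(y_true_label, y_pred_label, average='weighted')
kappa = cohen_kappa_score(y_true_label, y_pred_label)

# Multiclass ROC area computed one-vs-rest from the predicted probabilities
roc_area = roc_auc_score(y_true_label, y_pred, multi_class='ovr')

print(accuracy, f1, kappa, roc_area)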
TABLE 5.1 Performance of the classifiers for multilayered CNN networks on images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area
TABLE 5.2 Performance of the classifiers for multilayered CNN networks on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area
ResNet (Residual Network) is a deep network with skip connections, which help with the vanishing gradient problem and decrease the error of deeper models compared to conventional models. Features extracted from ResNet50 followed by XGBoost resulted in 78% test accuracy (Table 5.9). The ANN model on masked images resulted in 94% accuracy with good kappa and F1 scores (Table 5.10).
ResNet101 is a 101-layer deep neural network with residual connections; the extracted features were used to train the ML models, and ANN and XGBoost performed best among all models (Table 5.11). The bagging model performed quite impressively, with a test accuracy of 91.45% on ResNet101 feature vectors (Table 5.12).
Inverted residual blocks are introduced in this architecture, which helps to reduce the computation cost and model size. SVM
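As a minimal sketch of the deep-feature-extraction pipeline described here (preprocessing details and variable names are assumptions, not the chapter's exact code; y_train and y_test are assumed one-hot encoded, as in the evaluation snippet above), features can be taken from a pretrained ResNet50 with average pooling and fed to an XGBoost classifier:

import numpy as np
import tensorflow as tf
from xgboost import XGBClassifier

# Pretrained ResNet50 without its classification head; global average pooling yields one feature vector per image
base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', pooling='avg')

# X_train / X_test are assumed to be preprocessed image arrays of shape (n, 224, 224, 3)
train_features = base.predict(X_train)
test_features = base.predict(X_test)

# Train a conventional ML classifier on the deep features
clf = XGBClassifier(n_estimators=300)
clf.fit(train_features, np.argmax(y_train, axis=1))
print(clf.score(test_features, np.argmax(y_test, axis=1)))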
TABLE 5.5 Performance of the classifiers using VGG16 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.6 Performance of the classifiers using VGG16 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision
TABLE 5.7 Performance of the classifiers using VGG19 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision
TABLE 5.8 Performance of the classifiers using VGG19 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
ANN 1 0.9403 1 1 1 1 1
TABLE 5.10 Performance of the classifiers using ResNet50 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.11 Performance of the classifiers using ResNet101 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision
performed best, with a test accuracy of 79.5% and an F1 score of 0.7871 (Table 5.13). Feature vectors extracted from MobileNetV2 for masked images show test accuracies in the range 95.5%–98.3% (Table 5.14).
Features extracted from the MobileNet architecture obtained test accuracies in the range 67%–80%, where ANN performed best with an F1 score of 0.7992 (Table 5.15). Features extracted from the MobileNet architecture for masked images obtained test accuracies in the range 92%–98.2%; here again, ANN performs best among all models, with an F1 score of 0.9706 (Table 5.16).
Random Forest and XGBoost were overfitting on the training set; the accuracies were in the range 60%–73.5%, and SVM performed best among all models with an F1 score of 0.7812 (Table 5.17).
Training accuracies for Random Forest, XGBoost, and ANN were 100%, which clearly indicates overfitting. KNN performed best among all
TABLE 5.12 Performance of the classifiers using ResNet101 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision
TABLE 5.13 Performance of the classifiers using MobileNetV2 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision
TABLE 5.15 Performance of the classifiers using MobileNet pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.16 Performance of the classifiers using MobileNet pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
other models, with a test accuracy of 94.87%, an F1 score of 0.9491, and a kappa score of 0.9127 (Table 5.18).
InceptionResNetV2 combines the Inception network with residual linkage. Bi-LSTM showed the best results, with a test accuracy of 76% and F1 and kappa scores of 0.7574 and 0.5981, respectively (Table 5.19). Test accuracies were consistently in the range 91%–95%; LSTM and Bi-LSTM performed best with a test accuracy of 95.73% (Table 5.20).
The model was overfitting for Random Forest, XGBoost, and ANN. The highest test accuracy, 79.49%, was observed with ANN, with an F1 score of 0.7897 and a kappa score of 0.6558 (Table 5.21). Approximately the same performance was observed for SVM, KNN, and the bagging model, with the highest accuracy being 97.44% (Table 5.22).
The highest test accuracy of 80.34% was obtained with XGBoost. The Bi-LSTM model showed good results, with a test accuracy of approximately 80%
TABLE 5.17 Performance of the classifiers using InceptionV3 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.18 Performance of the classifiers using InceptionV3 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.20 Performance of the classifiers using InceptionResNetV2 pretrained model for feature extraction on
masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.21 Performance of the classifiers using DenseNet169 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.22 Performance of the classifiers using DenseNet169 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.23 Performance of the classifiers using DenseNet121 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
and an F1 score of 0.7932 and a kappa score of 0.6626 (Table 5.23). All the models performed extremely well, with accuracies in the range 96%–99% (Table 5.24).
Xception is a convolutional neural network that is 71 layers deep. Random Forest, XGBoost, and ANN were overfitting on the training set; accuracies are in the range 61%–76%, with ANN performing best of all models (Table 5.25). The Xception architecture on masked images yields test accuracies in the range 93%–97%. Bi-LSTM performed best, with an accuracy of 97.44%, an F1 score of 0.9747, and a kappa score of 0.9565 (Table 5.26).

5.5 Discussion

In this chapter, we have trained various DL architectures on unmasked and masked images, and the performance of each model is quantitatively compared.
XGBoost 1 0.9104 1 1 1 1 1
TABLE 5.25 Performance of the classifiers using Xception pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
TABLE 5.26 Performance of the classifiers using Xception pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
The multilayered CNN architecture followed by a dense neural network, used to predict each image as benign, normal, or malignant on unmasked images, showed average results, with validation and test accuracies in the range of 60%–70%. Overfitting was observed on the training set due to the small dataset, so batch normalization and dropout layers were introduced. No significant improvement was observed on stacking many CNN layers. On the other hand, the CNN architecture on masked images showed good results; with increasing CNN layers, the model learned better and the performance metrics were quite impressive.
To overcome the problem of insufficient data, pretrained models with a classifier trained on top of them were implemented. For unmasked data, training accuracies were in the range of 90%–95%, while validation and test accuracies were in the range of 70%–80%. For masked data, the training accuracies were in the range of 97%–98%, while validation and test accuracies were in the range of 90%–95%. There were significant improvements in F1, kappa score, and ROC area as well.
Further, ML classifiers such as SVM, AdaBoost, Random Forest, XGBoost, KNN, ANN, LSTM, and Bi-LSTM were trained on deep features extracted from pretrained models. These showed satisfactory results, and the Bi-LSTM and LSTM models outperformed the other models.
A number of hyperparameters, such as the total number of hidden layers, the number of neurons in each hidden layer, the number of epochs to train, the learning rate, and the choice of optimizer, were fine-tuned on validation accuracies. This plays an important role in determining the performance of the model.

5.6 Conclusion

In this chapter, we classified breast cancer as benign, malignant, or normal using ultrasound images by training various CNN models and pretrained architectures, optimizing, fine-tuning, and evaluating them with various performance metrics. The performance on masked images with such a small dataset was impressive, and test accuracies up to 98% were observed. Training and transfer learning made it possible to obtain satisfactory results on unmasked data, and a test accuracy of 80% was achieved.

References

[1] H. Sung, et al., Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries, CA Cancer J. Clin. 71 (3) (2021) 209–249. Available from: https://doi.org/10.3322/caac.21660.
[2] Cancer.Net, Breast cancer - diagnosis, https://www.cancer.net/cancer-types/breast-cancer/diagnosis, Jun. 25, 2012 (accessed 26.11.21).
[3] Radiologyinfo.org, Radiological Society of North America (RSNA) and American College of Radiology (ACR), Ultrasound - breast, https://www.radiologyinfo.org/en/info/breastus, June 15, 2020 (accessed 26.11.21).
[4] M.H.-M. Khan, et al., Multi-class classification of breast cancer abnormalities using deep convolutional neural network (CNN), PLOS ONE 16 (8) (2021) e0256500. Available from: https://doi.org/10.1371/journal.pone.0256500.
[5] M. Talo, Automated classification of histopathology images using transfer learning, Artif. Intell. Med. 101 (2019) 101743. Available from: https://doi.org/10.1016/j.artmed.2019.101743.
[6] S.M. Shah, R.A. Khan, S. Arif, U. Sajid, Artificial intelligence for breast cancer detection: trends & directions, arXiv:2110.00942 [cs, eess], http://arxiv.org/abs/2110.00942, 2021 (accessed 18.12.21).
[7] S.S. Yadav, S.M. Jadhav, Deep convolutional neural network based medical image classification for disease diagnosis, J. Big Data 6 (1) (2019) 113. Available from: https://doi.org/10.1186/s40537-019-0276-2.
[8] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554. Available from: https://doi.org/10.1162/neco.2006.18.7.1527.
[9] S. Bhise, S. Gadekar, A.S. Gaur, S. Bepari, D. Kale, D.S. Aswale, Breast cancer detection using machine learning techniques, Int. J. Eng. Res. Technol. 10 (7) (2021). Available from: https://www.ijert.org/research/breast-cancer-detection-using-machine-learning-techniques-IJERTV10IS070064.pdf (accessed 18.12.21).
6
Artificial intelligence-based skin
cancer diagnosis
Abdulhamit Subasi1,2 and Saqib Ahmed Qureshi3
1 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 2 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia; 3 Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
diagnosis [2]. Melanoma as a disease was first discovered by Rene Laennec in 1804 under the name melanose. The heterogeneous nature of some melanoma tumors was first observed by William Norris in 1820. Sir Robert Carswell introduced the term melanoma in 1838 [3]. In the year 2020, nearly 100,350 new cases of melanoma were diagnosed in the United States; it affected 60,190 men and 40,160 women. Also in 2020 in the United States, an estimated 6850 deaths from melanoma were expected, including 4610 men and 2240 women [4].
Melanoma cancer develops from the pigment-producing cells known as melanocytes and is also caused by exposure to UV (ultraviolet) rays. It looks like moles on the body, has a black/pink color, and can develop anywhere on the skin. In men, these are more likely to begin on the chest and back, while in women they generally appear on the legs. The neck and face are other common sites where melanoma appears. It is more likely to be found among fair-skinned people, whereas people with dark skin have a much lower chance of developing melanoma; however, it can develop between the fingers, under the nails, and sometimes inside the mouth and eyes [5]. Melanoma can affect people of all ages. It is often considered serious for older people, with an average age of 65 [6], while the incidence of melanoma is rapidly rising in young adults. It has been observed that melanoma is now the most common form of cancer in men and women aged 20–39 [7]. The risk of melanoma also seems to be increasing under the age of 40, especially in women, and it is among the most common cancers in young adults [8].
There are several types of melanoma cancer, but the most common is cutaneous. Other types include superficial spreading melanoma, acral lentiginous melanoma, nodular melanoma, amelanotic and desmoplastic melanomas, lentigo maligna melanoma, metastatic melanoma, and ocular melanoma [9]. The first important symptom of melanoma that defines whether you are infected is a new spot on the skin (likely moles) or a spot that is changing in size, shape, or color. However, you can also determine the symptoms with the help of the ABCDE rule [10]:
• Asymmetry: the shape of the mole is irregular.
• Border: it has irregular edges instead of smooth ones.
• Color: the mole has dark spots or uneven shading.
• Diameter: the spot is larger than the size of a pencil eraser.
• Evolving or elevation: it is continuously changing in size, shape, or texture.
Other symptoms are a sore that does not heal, redness or a new swelling, changes in sensation such as itchiness, pain, or tenderness, a change in the size of the mole, bleeding, oozing, or the development of a lump or bump. Because the signs of cancer change, it is better to consult a skin specialist as soon as possible to diagnose melanoma; detecting the cancer early often allows for more treatment options. Like other cancers, melanoma also has four stages. As a rule, the lower the number, the less the cancer has spread, so melanoma can be diagnosed successfully when it is detected at an early stage [11].
Prevention of melanoma cancer can be achieved by limiting your exposure to UV rays, by avoiding the use of tanning beds and sunlamps, by wearing protective clothing and putting moisturizers on the skin, and by boosting the immune system, as a weakened immune system increases not only the risk of getting melanoma but also of other types of skin cancer. Melanoma treatment includes chemotherapy, radiotherapy, surgery, radiation therapy, immunotherapy, or targeted therapy, either alone or in combination. Whether treatment includes a combination of procedures depends upon how early the cancer has been detected [12]. Chemotherapy is recommended for patients who have metastatic melanoma which has
FIGURE 6.1 Pipeline of the transfer learning approach: image of a mole → feature extraction with pretrained models → image classification with artificial intelligence → classified image (melanoma or normal).
The main method or test for detecting melanoma skin cancer is biopsy. Biopsy is a process of removing skin tissue for testing [17]. Which type of biopsy is to be used depends on the doctor and the condition of the skin. It is a costlier and more time-consuming process compared to deep learning techniques. Hence, it is better to use deep learning techniques first, and then these biopsy testing methods can be done as subsequent steps; it will save money and time as well. As there are several deep learning models, they can be deployed on the web or made available in the form of apps to maximize their reach.
In this chapter, several deep learning methods are used for the detection of skin cancer using images (Fig. 6.1). Apart from classical methods, transfer learning is also used, which uses pretrained models. Fig. 6.1 shows the pipeline of the transfer learning used.

6.2 Literature review

Li and Shen [18] developed a framework using deep learning which can simultaneously produce segmentation and coarse classification results. It consisted of two fully convolutional residual networks. The classification was done in two steps: a simple CNN was used for feature extraction, and a lesion index calculation unit was developed to refine the coarse classification results by calculating the distance heat map. The accuracies of this framework showed promising results; for task 1, task 2, and task 3, values of 0.753, 0.848, and 0.912 were obtained, respectively.
Castro et al. [19] developed an accessible mobile app which contains a CNN model trained on images collected from smartphones together with clinical information about the lesion. The dataset used for this problem was highly imbalanced. With the proposed approach, promising results were obtained, with an accuracy of 92% and a recall of 94%.
Gulati and Bhogal [20] used two pretrained models, AlexNet and VGG16, in two different ways: for transfer learning and for feature extraction. The results showed that transfer learning was more efficient for both CNN models compared to the feature extraction method. In transfer learning
nonlinear function. This process continues till the output layer is reached [27]. During this whole process, the neurons of only one layer remain active at a time; the output of one hidden layer acts as the input for the next hidden layer. As the number of hidden layers increases, the complexity of the algorithm also increases. If a nonlinear function were not used, then the expression in the output layer could be written as a linear combination of the inputs; this would eliminate the purpose of all hidden layers, and the whole system would act as a single-layer network. The nonlinearity also makes the model more expressive and helps it learn more complicated features. The bias term is added to generalize the model and make it more flexible.
The weights are updated by using an algorithm named the backpropagation algorithm [28]. During this process, differentials are taken with respect to the weights and then multiplied in subsequent steps. This gives rise to two major problems: the exploding gradient and the vanishing gradient problem. In the exploding gradient problem, the differentials sometimes take very large values, which leads to overflow and makes the gradient "NaN," that is, not a number. In the vanishing gradient problem, the differentials take very small values which tend toward zero after repeated multiplication, which results in no update of the weights. Sample Python code for ANN is given below.
import tensorflow as tf
from tensorflow.keras.optimizers import RMSprop  # needed for the optimizer used below
model_ANN = tf.keras.models.Sequential([
tf.keras.Input(shape=(last_layer_shape[1],last_layer_shape[2],last_layer_shape[3])),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation = 'relu'),
tf.keras.layers.Dense(1, activation = 'sigmoid')
])
LEARNING_RATE = 1e-4
OPTIMIZER = RMSprop(lr=LEARNING_RATE,decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = [
'accuracy',
'AUC'
]
model_ANN.compile(
loss=LOSS,
metrics=METRICS,
optimizer=OPTIMIZER,
)
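# The training step is not present in the extracted text; a minimal fit call
# (an assumption for illustration, with X_train/y_train arrays and an arbitrary epoch count) might be:
model_ANN.fit(X_train, y_train, epochs=50, batch_size=32)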
y_pred_ANN = model_ANN.predict(X_test)
y_pred_kNN = classifier_kNN.predict(X_test)
trees; here, trees refer to decision tree classifiers. Hence, it makes a forest of decision tree classifiers. They follow the "bagging" approach, which says that a collection of learning algorithms increases the overall performance of the model. Decision trees are very interpretable and straightforwardly deterministic: for a given feature set, they always produce the same regression or classification model structure. To add some amount of randomness or fuzziness, the Random Forest classifier is used. There can be cases where one decision tree classifier is overfitting while another is highly biased; they cancel each other's errors and, as a whole, become a more robust model. During model building, the whole dataset is not taken at once; instead, randomly selected data points are given to each individual decision tree classifier. Another important function of Random Forest is that it can be used to determine which features are more important. At first, it creates the model on the whole dataset and calculates the score. Then it shuffles one of the features randomly and calculates the score again. By observing the magnitude of the increase or decrease of the score, the importance of each feature can be estimated.

y_pred_RF = classifier_RF.predict(X_test)

In XGBoost, new decision tree models are created to rectify the mistakes made by the existing models. These models are created sequentially, on top of each other, following an iterative approach, and this process continues until a stopping criterion is reached. A gradient descent algorithm is used to minimize the loss while building the new models, which is why it is called gradient boosting. The execution speed of XGBoost is very fast, as it uses parallelization to make use of all CPU cores. During preprocessing of the data, missing values or variables that need to be one-hot encoded make the data sparse; XGBoost has an inbuilt algorithm which handles different types of patterns in sparse data. To tackle the problem of overfitting, it uses L1 and L2 regularization. Sample Python code for XGBoost is given below.

import xgboost as xgb

classifier_xgb = xgb.XGBClassifier(n_estimators = 300)
classifier_xgb.fit(X_train, y_train)

test_acc_xgb = classifier_xgb.score(X_test, y_test)

y_pred_xgb = classifier_xgb.predict(X_test)
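Only the prediction line of the chapter's Random Forest example (y_pred_RF above) survives in the extracted text; a minimal scikit-learn sketch consistent with the description in this section, using assumed variable names, could be:

from sklearn.ensemble import RandomForestClassifier

# A forest of decision trees trained on bootstrap samples of the training data
classifier_RF = RandomForestClassifier(n_estimators=300, random_state=42)
classifier_RF.fit(X_train, y_train)

test_acc_RF = classifier_RF.score(X_test, y_test)
y_pred_RF = classifier_RF.predict(X_test)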
import tensorflow as tf
from tensorflow.keras.optimizers import Adam  # needed for the optimizer used below
model_LSTM = tf.keras.models.Sequential([
tf.keras.Input(shape=(last_layer_shape[1], last_layer_shape[2]*last_layer_shape[3])),
tf.keras.layers.LSTM(100, return_sequences=True),
tf.keras.layers.LSTM(32),
tf.keras.layers.Dense(1, activation = 'sigmoid')
])
LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE,decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = [
'accuracy',
'AUC'
]
model_LSTM.compile(
loss=LOSS,
metrics=METRICS,
optimizer=OPTIMIZER,
)
y_pred_LSTM = model_LSTM.predict(test_features_2d)
6.3.9 Bidirectional long short-term memory

Bidirectional LSTM (Bi-LSTM) [36] is an extended version of the traditional LSTM. This structure considers information not only from past states but also from future states; it enables the current state to take a decision by considering both past and future information. It uses two LSTMs and makes the inputs run in both the forward and backward directions. This makes the model more robust and efficient, as both future and past hidden states are utilized for taking the decision. All other properties of Bi-LSTM are similar to the unidirectional LSTM. Sample Python code for Bi-LSTM is given below.
# The model definition is missing from the extracted text; the definition below is
# reconstructed by analogy with the LSTM example above and is not verbatim from the chapter.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

model_Bi_LSTM = tf.keras.models.Sequential([
    tf.keras.Input(shape=(last_layer_shape[1], last_layer_shape[2]*last_layer_shape[3])),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1, activation = 'sigmoid')
])

LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE,decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = [
    'accuracy',
    'AUC'
]

model_Bi_LSTM.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)

y_pred_Bi_LSTM = model_Bi_LSTM.predict(test_features_2d)
6.3.10 Convolutional neural network

CNN [37] is a special type of deep neural network that is specialized in analyzing visual imagery. The term "convolution" comes from the linear operation in mathematics which is performed between matrices. CNNs have performed exceptionally well in computer vision, natural language processing, voice recognition, etc. The features should not be spatially dependent; this is a very important assumption, which is made for all problems solved by CNNs. CNNs consist of filters in the form of matrices which extract different features from the image. For example, if a boundary-detection filter is passed over the image, it extracts all the boundaries present in the image. It actually creates an activation map of that particular feature, so that it can easily be seen which part of the image is activated or what features are extracted by this filter.
A CNN architecture has many layers: the convolutional layer, pooling layer, nonlinearity layer, and fully connected layer. In the convolutional layer, instead of looking at the full image, local regions are focused on so that the trainable parameters can be reduced. For example, if an image has a resolution of 32 × 32 × 3, the raw pixels are used as the input. To connect this input layer to one neuron, 32 × 32 × 3 weights, that is, 3072 weights, are needed. Now if one more neuron is added, it becomes 32 × 32 × 3 × 2, that is, more than 6000 weights. Even for such a small-resolution image, there are already many weights that need to be trained. Hence, it is a more fruitful and practical approach to look at local regions in an image rather than the full image. Parameters or weights can be decreased more rapidly by increasing the strides. In the convolutional layer there is a problem of loss of information at the borders of the image, which can easily be overcome by using zero padding. The pooling layer is used to downsample the image to reduce the complexity for further layers; it does not contain any trainable parameters or weights. The nonlinearity layer is used to adjust or saturate the output. The most common function used for nonlinearity is the Rectified Linear Unit, also known as "ReLU"; other functions are "sigmoid" and "tanh." After the image has been processed by going through the required number of layers, global pooling is done and the matrix is flattened into a vector to extract the final features. These features are fed into the fully connected layer, which is the same as in traditional neural networks. It is used to give the final output of the model by using a "sigmoid" or "softmax" function, depending upon the number of classes. Sample Python code for CNN is given below.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def create_model():
print("create model")
model = Sequential()
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
return model
# Instantiate the model before compiling it
model = create_model()

LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE,clipvalue=0.45)
LOSS = 'binary_crossentropy'
METRICS = [
'accuracy',
'AUC'
]
model.compile(
loss=LOSS,
metrics=METRICS,
optimizer=OPTIMIZER,
)
y_p = model.predict_generator(test_gen)
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.applications import MobileNet

# Images in this chapter are resized to 224 x 224, so the input shape is assumed accordingly
INPUT_SHAPE = (224, 224, 3)

def load_pretrained_model():
base_model = MobileNet(
input_shape=INPUT_SHAPE,
include_top=False,
weights='imagenet'
)
# freeze the first 75 layers of the base model. All other layers are trainable.
for layer in base_model.layers[0:75]:
layer.trainable = False
return base_model
def create_model():
print("create model")
model = Sequential()
model.add(load_pretrained_model())
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
return model
model = create_model()
6.4 Results and discussions

6.4.1 Dataset

The dataset used in this experiment is taken from the Kaggle website [39]. It was generated and made available by the ISIC, and the images come from the following sources: Memorial Sloan Kettering Cancer Center, Melanoma Institute Australia, Hospital Clinic de Barcelona, Medical University of Vienna, The University of Athens Medical School, and the University of Queensland. The dataset was highly imbalanced, as the number of melanoma cases was only 575, while the number of benign cases was 31,956. To balance the dataset, an equal number of benign and melanoma cases was taken, that is, 575. The height and width of the images varied a lot; to give all images a uniform size, they were converted to 224 × 224.

6.4.2 Experimental setup

In the first stage, CNNs were used for the classification of benign and melanoma images. A total of 11 different pretrained models (VGG16, VGG19, MobileNet, MobileNetV2, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, Xception, ResNet, and ResNet50) with some fine-tuning were used for classification, and their different accuracy parameters were noted down. After this step, different convolutional models were built, ranging from two to eight convolutional layers. This whole process completes the first stage of the experiment.
In the second stage, features of the images were extracted from the different pretrained models mentioned above. These features were passed through a global average pooling layer and a classifier was put on top of it. Several machine learning techniques were used for the final classification, such as XGBoost, Random Forest, SVM, k-NN, AdaBoost, ANN, LSTM, Bi-LSTM, and Bagging classifiers.

6.4.3 Performance metrics

The dataset was initially imbalanced; to balance it, an equal number of positive and negative cases was taken. It was then further divided into three sets, that is, train, validation, and test sets, and accuracies were recorded separately for them. Training was done on the training set, hyperparameter tuning on the validation set, and finally the model was tested on the test set to see how well it generalized. After these steps, the F1 score was calculated on the test set, as it includes both precision and recall, which gives a better idea of whether the model is biased toward one particular class or not. Finally, the AUC is calculated on the test set to further check the robustness of the model.
print_performance_metrics(y_test,y_pred)
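The helper print_performance_metrics used above is not defined in the extracted text; a minimal definition consistent with the metrics described in Section 6.4.3 (an illustrative sketch, not the authors' exact function) could be:

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def print_performance_metrics(y_true, y_pred, threshold=0.5):
    # y_pred is assumed to hold predicted probabilities for the positive (melanoma) class
    y_pred_label = (y_pred >= threshold).astype(int)
    print("Accuracy:", accuracy_score(y_true, y_pred_label))
    print("F1 score:", f1_score(y_true, y_pred_label))
    print("ROC area:", roc_auc_score(y_true, y_pred))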
6.4.4 Experimental results

Different pretrained architectures, which come with pretrained weights, were used for classification. Apart from these pretrained architectures, several CNNs with different numbers of layers were also used.
From Table 6.1, it can be observed that among the pretrained models the "MobileNet" architecture performed the best, with an F1 score of 0.8014. On the other hand, "ResNet50" performed the worst, with an F1 score of 0.371. Among the layered CNNs, the model with eight layers performed the best, with an F1 score of 0.7675, while the model with four layers performed the worst, with an F1 score of 0.657.
From Table 6.2 to Table 6.12, deep feature extraction techniques were used: pretrained model architectures were used to extract the features, and then several machine learning classifiers were put on top of them to give the final output.
In Table 6.2, the VGG16 architecture was used for feature extraction. It can be clearly seen that the "LSTM" classifier performed the best, with an F1 score of 0.802. The worst performance was given by the "k-NN" classifier, with an F1 score of 0.6446.
In Table 6.3, the VGG19 architecture was used for feature extraction. Among the classifiers, the "Bagging" classifier achieved the highest F1 score, that is, 0.7861, while the "k-NN" classifier achieved the lowest F1 score, that is, 0.607.
In Table 6.4, the MobileNet architecture was used for feature extraction. The classifiers which achieved the highest and the lowest F1 score were "XGBoost" with a score of 0.818 and "k-NN" with a score of 0.6608, respectively.
In Table 6.5, the MobileNetV2 architecture was used for feature extraction. Among the classifiers, "XGBoost" performed the best with an F1 score of 0.8128; on the other hand, "k-NN" performed poorly, with an F1 score of 0.6575.
In Table 6.8, the DenseNet121 architecture was used for feature extraction. The classifiers with the highest and the lowest F1 score were "ANN" with a score of 0.7968 and "k-NN" with a score of 0.6416, respectively.
In Table 6.9, the DenseNet169 architecture was used for feature extraction. The "Bi-LSTM" classifier achieved the highest F1 score, that is, 0.8075, while the "AdaBoost" classifier achieved the lowest F1 score, that is, 0.6898.
In Table 6.10, the Xception architecture was used for feature extraction. The "Random Forest" classifier achieved the highest F1 score, that is, 0.7366, while the "Bagging" classifier achieved the lowest F1 score, that is, 0.6969.
In Table 6.11, the ResNet architecture was used for feature extraction. The classifiers with the highest and the lowest F1 score were "Bagging" with a score of 0.8021 and "k-NN" with a score of 0.6636, respectively.
In Table 6.12, the ResNet50 architecture was used for feature extraction. From the table, it can be observed that "Random Forest" performed the best with an F1 score of 0.7967, while "k-NN" performed the worst with an F1 score of 0.6769.

6.4.5 Discussion

From the results obtained, it is observed that both end-to-end learning and the deep feature extraction technique are effective in the classification of melanoma and benign cases. In particular, the MobileNet architecture performed very well in both end-to-end learning and the deep feature extraction technique. In end-to-end learning, the MobileNet architecture achieved the highest F1 score of 0.8014. In the feature extraction technique, the model in which features were extracted from the MobileNet architecture and XGBoost was used as the classifier achieved the highest F1 score of 0.818. Other classifiers such as LSTM, Random Forest, and ANN also performed well. Features extracted from VGG16, ResNet, MobileNetV2, and DenseNet169 also gave promising results and secured F1 scores of more than 0.80.
6.5 Conclusion

Skin cancer is one of the most hazardous and common diseases which the world is facing right now, so proper research and action are needed to reduce its spread. AI has proven its importance in every field, especially in medical science. To reduce the spread, the first and most important step is to detect the disease. Many times, people ignore it and hesitate to go to the doctor because they consider it a very rare thing; this ignorance can result in disaster, and the situation becomes out of control. If an app or website can be made which deploys these machine learning models, then people will hesitate less in checking for this disease, more cases can be detected on time, and people can be cured on time. Hence, these kinds of research in medical science are very crucial.

References

[1] Wikipedia, Melanoma, https://en.wikipedia.org/w/index.php?title=Melanoma&oldid=1002726498, Jan. 25, 2021 (accessed 03.02.21).
[2] The Skin Cancer Foundation, How dangerous is melanoma? It's all a matter of timing, https://www.skincancer.org/blog/dangerous-melanoma-matter-timing/, Oct. 27, 2017 (accessed 03.02.21).
[3] V.C. Gorantla, J.M. Kirkwood, State of melanoma: an historic overview of a field in transition, Hematol. Oncol. Clin. North Am. 28 (3) (2014) 415–435. Available from: https://doi.org/10.1016/j.hoc.2014.02.010.
[4] Melanoma Research Alliance, 2020 melanoma mortality rates decreasing despite ongoing increase in incidence, https://www.curemelanoma.org/blog/article/2020-melanoma-mortality-rates-decreasing-despite-ongoing-increase-in-incidence-rates (accessed 03.02.21).
[5] Mayo Clinic, Skin cancer - symptoms and causes, https://www.mayoclinic.org/diseases-conditions/skin-cancer/symptoms-causes/syc-20377605 (accessed 03.02.21).
[6] SEER, Melanoma of the skin - cancer stat facts, https://seer.cancer.gov/statfacts/html/melan.html (accessed 03.02.21).
[7] AIM at Melanoma Foundation, Age and risk, https://www.aimatmelanoma.org/melanoma-101/understanding-melanoma/melanoma-risk-factors/age-and-risk/ (accessed 03.02.21).
[8] Cancer in young adults, https://www.cancer.org/cancer/cancer-in-young-adults.html (accessed 03.02.21).
[9] Cancer Treatment Centers of America, Types of melanoma: common, rare and more varieties, https://www.cancercenter.com/cancer-types/melanoma/types, Oct. 05, 2018 (accessed 03.02.21).
[10] Cancer Treatment Centers of America, What are the symptoms and signs of melanoma? https://www.cancercenter.com/cancer-types/melanoma/symptoms, Oct. 05, 2018 (accessed 03.02.21).
[11] Stages of melanoma skin cancer, https://www.cancer.org/cancer/melanoma-skin-cancer/detection-diagnosis-staging/melanoma-skin-cancer-stages.html (accessed 03.02.21).
[12] Cancer Treatment Centers of America, Melanoma treatment options & advanced therapies, https://www.cancercenter.com/cancer-types/melanoma/treatments, Oct. 05, 2018 (accessed 03.02.21).
[13] CTCA, Chemotherapy: personalized therapies to treat cancer, https://www.cancercenter.com/treatment-options/chemotherapy (accessed 03.02.21).
[14] Cancer Treatment Centers of America, Immunotherapy to treat cancer: options & side effects, Oct. 17, 2018, https://www.cancercenter.com/treatment-options/precision-medicine/immunotherapy (accessed 03.02.21).
[15] Cancer Treatment Centers of America, Radiation therapy: usages, side effects & more, Oct. 17, 2018, https://www.cancercenter.com/treatment-options/radiation-therapy (accessed 03.02.21).
[16] Cancer Treatment Centers of America, What is cancer surgery? Options & side effects, https://www.cancercenter.com/treatment-options/surgery, Oct. 17, 2018 (accessed 03.02.21).
[17] Mayo Clinic, Melanoma - symptoms and causes, https://www.mayoclinic.org/diseases-conditions/melanoma/symptoms-causes/syc-20374884 (accessed 28.02.21).
[18] Y. Li, L. Shen, Skin lesion analysis towards melanoma detection using deep learning network, Sensors 18 (2) (2018), Art. no. 2. Available from: https://doi.org/10.3390/s18020556.
[19] P.B.C. Castro, B. Krohling, A.G.C. Pacheco, R.A. Krohling, An app to detect melanoma using deep learning: an approach to handle imbalanced data based on evolutionary algorithms, in: 2020 International Joint Conference on Neural Networks (IJCNN), Jul. 2020, pp. 1–6. Available from: https://doi.org/10.1109/IJCNN48605.2020.9207552.
[20] S. Gulati, R.K. Bhogal, Detection of malignant melanoma using deep learning, in: International Conference on Advances in Computing and Data Science, Singapore, 2019, pp. 312–325. Available from: https://doi.org/10.1007/978-981-13-9939-8_28.
[21] A.A. Adegun, S. Viriri, Deep learning-based system for automatic melanoma detection, IEEE Access 8 (2020) 7160–7172. Available from: https://doi.org/10.1109/ACCESS.2019.2962812.
[22] N.C. Codella, et al., Deep learning ensembles for melanoma recognition in dermoscopy images, IBM J. Res. Dev. 61 (4/5) (2017) 5:1–5:15.
[23] A.R. Lopez, X. Giro-i-Nieto, J. Burdick, O. Marques, Skin lesion classification from dermoscopic images using deep learning techniques, in: 2017 13th IASTED International Conference on Biomedical Engineering (BioMed), Feb. 2017, pp. 49–54. Available from: https://doi.org/10.2316/P.2017.852-053.
[24] A. Astorino, A. Fuduli, P. Veltri, E. Vocaturo, Melanoma detection by means of multiple instance learning, Interdiscip. Sci. Comput. Life Sci. 12 (1) (2020) 24–31. Available from: https://doi.org/10.1007/s12539-019-00341-y.
[25] R. Ali, R.C. Hardie, B.N. Narayanan, S.D. Silva, Deep learning ensemble methods for skin lesion analysis towards melanoma detection, in: 2019 IEEE National Aerospace and Electronics Conference (NAECON), Jul. 2019, pp. 311–316. Available from: https://doi.org/10.1109/NAECON46414.2019.9058245.
[26] I. Yilmaz, N. Erik, O. Kaynar, Different types of learning algorithms of artificial neural network (ANN) models for prediction of gross calorific value (GCV) of coals, Sci. Res. Essays 5 (2010) 2242–2249.
[27] S.-C. Wang, Artificial neural network, in: S.-C. Wang (Ed.), Interdisciplinary Computing in Java Programming, Springer US, Boston, MA, 2003, pp. 81–100. Available from: https://doi.org/10.1007/978-1-4615-0377-4_5.
[28] V. Skorpil, J. Stastny, Neural networks and back propagation algorithm, Sep. 2006.
[29] G. Guo, H. Wang, D. Bell, Y. Bi, KNN model-based approach in classification, Aug. 2004.
[30] T. Evgeniou, M. Pontil, Support vector machines: theory and applications, Advanced Course on Artificial Intelligence, vol. 2049, Springer, 2001, pp. 249–257. Available from: https://doi.org/10.1007/3-540-44673-7_12.
[31] J. Ali, R. Khan, N. Ahmad, I. Maqsood, Random forests and decision trees, Int. J. Comput. Sci. Issues (IJCSI) 9 (2012).
[32] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, Aug. 2016, pp. 785–794. Available from: https://doi.org/10.1145/2939672.2939785.
[33] T. Chengsheng, L. Huacheng, X. Bing, AdaBoost typical algorithm and its application research, MATEC Web of Conferences 139 (2017) 00222. Available from: https://doi.org/10.1051/matecconf/201713900222.
[34] P. Bühlmann, B. Yu, Analyzing bagging, Ann. Stat. 30 (4) (2002) 927–961. Available from: https://doi.org/10.1214/aos/1031689014.
7
Brain stroke detection from computed
tomography images using deep learning
algorithms
Aykut Diker1, Abdullah Elen1 and Abdulhamit Subasi2,3
1 Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey; 2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
when the blood supply to the brain tissues is decreased; the other type of stroke is hemorrhagic, and it occurs when a vessel inside the brain ruptures. Stroke, in its simplest definition, is a "brain attack" caused by the cessation of blood flow. Although stroke is used synonymously with the term hemiplegia, it is colloquially known as paralysis. Its distribution is age-related and doubles every 10 years after the age of 55. It ranks third among the causes of death in the population over 65 years of age, after heart diseases and cancer. Since stroke occurs as a sudden event, this event that happens to a person unexpectedly is considered a crisis situation [3-5]. MRI images contain important information for the classification of stroke severity. On the other hand, it is very difficult to interpret the scan results, as the small changes in these images are the spots that indicate the severity of the stroke. This complexity translates into a significant amount of time spent manually analyzing images. The mental effort required to determine the severity of a stroke creates exhaustion, and fatigue can contribute to human error, which affects diagnostic quality. Beyond fatigue, all human-based classification approaches have interobserver and intraobserver variability. Clinical analyst education is a technique that can help to mitigate some of these errors; however, educating someone to be an expert takes time and money, making regular stroke risk prediction activities uneconomical. Researchers have developed methods for automatically detecting the severity of stroke disease to overcome these issues [6]. Normal and stroke brain computed tomography (CT) images are given in Fig. 7.1.
Besides, scientists and engineers have frequently used machine learning (ML) and artificial intelligence methods for the detection and classification of stroke. Al-Qazzaz et al. [7] suggested an autonomous machine interface structure to determine rehabilitation changes and presented a new method based on BCI. EEG samples from poststroke subjects with upper-extremity hemiparesis were examined, and Random Forest (RF), Support Vector Machine (SVM), and k-NN classifiers were used to classify the EEG signals. Sung et al. [8] proposed an ML program that could detect subjects with suspected stroke during emergency department triage. The application can be integrated into an electronic triage system and used to initiate code strokes. To develop stroke classification models, the researchers investigated SVM, RF, k-NN, C4.5, CART, and logistic regression (LR). In the experimental study, the accuracy values of the C4.5, CART, RF, k-NN, SVM, and LR classifiers were obtained as 81.2%, 81.1%, 82.6%, 80.6%, 81.3%, and 82.0%, respectively. Ref. [9] presents a new deep learning-based technique for segmenting stroke lesions in CT perfusion maps. The suggested approach was tested using the ISLES 2018 dataset, and the positive predictive value and sensitivity (SEN) of the proposed method were obtained as 68% and 67%, respectively. In Ref. [10], a method generated from the
FIGURE 7.1 Normal and stroke brain CT image samples. CT, Computed tomography.
RF, Naive Bayes, k-NN, LR, Decision Tree, SVM, a multilayer perceptron neural network, and deep learning was used to classify stroke into its two subtypes (ischemic and hemorrhagic). According to the experimental results, the authors reported that the RF algorithm gave the best classification score, with 95.97% accuracy.

7.3.1 AlexNet

Even though it is stated that Yann LeCun employed deep learning for the first time in an article published in 1998, it first became known worldwide in 2012. The AlexNet model, designed with a deep learning architecture,
won the ImageNet competition held that year.
The study was announced with the paper
named “ImageNet Classification with Deep
7.3 Deep learning methods Convolutional Networks.” The computerized
object recognition error rate was lowered from
In this chapter, a model based on the CNN 26.2% to 15.4% using this deep learning model.
has been proposed to classify it as stroke and The architecture is consisting of five convolu-
normal, which consists of two classes in total. tion layers, a pooling layer, and three fully con-
The proposed CNN structure is given in Fig. 7.2. nected layers (FCL) given in Fig. 7.3. The first
convolutional layer filters the input image
Conv 1 96 11 3 11 4 55 3 55 3 96 ReLU
%% Load AlexNet
%Pretrained model "AlexNet "
net = alexnet();
%% AlexNet training options
options = trainingOptions('sgdm', ...
'MiniBatchSize',32, ...
'MaxEpochs',100, ...
'Momentum',0.8, ...
'InitialLearnRate',1e-4, ...
'L2Regularization',1e-6,...
'Shuffle','every-epoch', ...
'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% Alexnet Classification
[YPred,scores] = classify(net,imdsTest);% Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_AlexNet = mean(YPred == testLabels)*100;% Alexnet Classification result
%% Load GoogleNet
%Pretrained model "GoogleNet "
net = googlenet();
%% GoogleNet
options = trainingOptions('sgdm', ...
'MiniBatchSize',32, ...
'MaxEpochs',100, ...
'Momentum',0.8, ...
'InitialLearnRate',1e-4, ...
'L2Regularization',1e-6,...
'Shuffle','every-epoch', ...
'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% GoogleNet Classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_GoogleNet = mean(YPred == testLabels)*100; % GoogleNet Classification result
%% Load VGG-16
%Pretrained model "VGG-16 "
net = vgg16();
%% VGG-16 training options
options = trainingOptions('sgdm', ...
'MiniBatchSize',32, ...
'MaxEpochs',100, ...
'Momentum',0.8, ...
'InitialLearnRate',1e-4, ...
'L2Regularization',1e-6,...
'Shuffle','every-epoch', ...
'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% VGG-16 Classification
[YPred,scores] = classify(net,imdsTest);% Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_VGG16 = mean(YPred == testLabels)*100; % VGG-16 classification result
nonlinear activation. Five pooling layers are responsible for spatial pooling; a 2 × 2 filter with stride 2 is used for max-pooling. Three FCL are constructed after a succession of convolutional and max-pooling layers. The softmax layer is the final layer [24]. The architecture of VGG-16 is shown in Fig. 7.6.

FIGURE 7.5 The residual learning: a building block.

7.3.5 VGG-19

VGG-19 is a 19-layer CNN that has been pretrained. The model has been trained on over a million data samples which contain roughly 10,000
%% Load VGG-19
%Pretrained model "VGG-19 "
net = vgg19();
%% VGG-19 training options
options = trainingOptions('sgdm', ...
'MiniBatchSize',32, ...
'MaxEpochs',100, ...
'Momentum',0.8, ...
'InitialLearnRate',1e-4, ...
'L2Regularization',1e-6,...
'Shuffle','every-epoch', ...
'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% VGG-19 Classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_VGG19 = mean(YPred == testLabels)*100; % VGG-19 classification result
FIGURE 7.9 CT images of (A) normal and (B) stroke. CT, Computed tomography.
FIGURE 7.10 Confusion matrix and ROC of AlexNet, GoogleNet, and VGG-19. ROC, Receiver operating characteristic.
FIGURE 7.11 Confusion matrix and ROC of Residual CNN and VGG-16. ROC, Receiver operating characteristic.
TABLE 7.2 Stroke classification performance evaluation results.

CNN models ACC SEN SPE F-score
AlexNet 94.53% 98.06% 88.77% 93.18%
GoogleNet 92.00% 95.26% 86.66% 90.76%
Residual CNN 94.80% 98.06% 89.47% 93.57%
VGG-16 94.66% 96.98% 90.87% 93.83%
VGG-19 97.06% 97.41% 96.49% 96.95%

All scores of the pretrained CNN classifiers are reported in Table 7.2. With respect to Table 7.2, the highest stroke classification performance, 97.06%, was reached with the VGG-19 pretrained CNN model. In addition, the training loss and training accuracy of the model are given in Fig. 7.12. Consequently, the maximum performance for stroke classification was achieved with ACC 97.06%, SEN 97.41%, SPE 96.49%, and F-score 96.95%.
Chin et al. (2017) [27] Deep learning (CNN) CT image dataset 90%
Karthik et al. [28] Fully convolutional network (FCN) MRI image dataset with 4,284 samples 70%
Liu et al. [29] Support vector machine (SVM) CT-scan image dataset with 1,157 samples 83.3%
Gaidhani et al. [30] Deep learning models (LeNet and SegNet) MRI scan with 400 samples 96%–97%
Badriyah et al. [31] Random Forest CT scan images from 102 patients 95.97%
[11] M. Bento, R. Souza, M. Salluzzi, L. Rittner, Y. Zhang, R. Frayne, Automatic identification of atherosclerosis subjects in a heterogeneous MR brain imaging data set, Magn. Reson. Imaging 62 (2019) 18–27.
[12] P.P. Rebouças Filho, R.M. Sarmento, G.B. Holanda, D. de Alencar Lima, New approach to detect and classify stroke in skull CT images via analysis of brain tissue densities, Comput. Methods Prog. Biomed. 148 (2017) 27–43.
[13] J. Vargas, A. Spiotta, A.R. Chatterjee, Initial experiences with artificial neural networks in the detection of computed tomography perfusion deficits, World Neurosurg. 124 (2019) e10–e16.
[14] A. Gautam, B. Raman, Towards effective classification of brain hemorrhagic and ischemic stroke using CNN, Biomed. Signal Process. Control 63 (2021) 102178.
[15] R. Kanchana, R. Menaka, Ischemic stroke lesion detection, characterization and classification in CT images with optimal features selection, Biomed. Eng. Lett. 10 (3) (2020) 333–344.
[16] U. Raghavendra, T.-H. Pham, A. Gudigar, V. Vidhya, B.N. Rao, S. Sabut, et al., Novel and accurate non-linear index for the automated detection of haemorrhagic brain stroke using CT images, Complex Intell. Syst. 7 (2) (2021) 929–940.
[17] L. Herzog, E. Murina, O. Dürr, S. Wegener, B. Sick, Integrating uncertainty in deep neural networks for MRI based stroke analysis, Med. Image Anal. 65 (2020) 1–21.
[18] T. Badriyah, N. Sakinah, I. Syarif, D.R. Syarif, Machine learning algorithm for stroke disease classification, in: 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), 2020, pp. 1–5. Available from: https://doi.org/10.1109/ICECCE49384.2020.9179307.
[19] J. Chen, Z. Wan, J. Zhang, W. Li, Y. Chen, Y. Li, et al., Medical image segmentation and reconstruction of prostate tumor based on 3D AlexNet, Comput. Methods Prog. Biomed. 200 (2021) 105878.
[20] Ö. İnik, E. Ülker, Deep learning and deep learning models used in image analysis, Gaziosmanpasa J. Sci. Res. 6 (3) (2017) 85–104.
[21] S. Deepak, P.M. Ameer, Retrieval of brain MRI with tumor using contrastive loss based similarity on GoogLeNet encodings, Comput. Biol. Med. 125 (2020) 103993.
[22] A. Diker, Sıtma Hastalığının Sınıflandırılmasında Evrişimsel Sinir Ağlarının Performanslarının Karşılaştırılması [Comparison of the performance of convolutional neural networks in the classification of malaria disease], BEÜ Fen Bilim. Derg. 9 (4) (2020) 1825–1835.
[23] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[24] P. Saha, M.S. Sadi, O.F.M.R.R. Aranya, S. Jahan, F.-A. Islam, COV-VGX: an automated COVID-19 detection system using X-ray images and transfer learning, Inform. Med. Unlocked 26 (2021) 100741.
[25] S. Agarwal, A. Rattani, C.R. Chowdary, A comparative study on handcrafted features v/s deep features for open-set fingerprint liveness detection, Pattern Recognit. Lett. 147 (2021) 34–40.
[26] A. Rahman, Brain stroke CT image dataset, Kaggle, https://www.kaggle.com/afridirahman/brain-stroke-ct-image-dataset, 2021 (accessed 10.11.21).
[27] C.L. Chin, B.J. Lin, G.R. Wu, T.C. Weng, C.S. Yang, R.C. Su, et al., An automated early ischemic stroke detection system using CNN deep learning algorithm, in: Proc. 2017 IEEE 8th Int. Conf. Awareness Science and Technology (iCAST 2017), 2017, pp. 368–372.
[28] R. Karthik, U. Gupta, A. Jha, R. Rajalakshmi, R. Menaka, A deep supervised approach for ischemic lesion segmentation from multimodal MRI using fully convolutional network, Appl. Soft Comput. J. 84 (2019) 105685.
[29] J. Liu, H. Xu, Q. Chen, T. Zhang, W. Sheng, Q. Huang, et al., Prediction of hematoma expansion in spontaneous intracerebral hemorrhage using support vector machine, EBioMedicine 43 (2019) 454–459.
[30] B.R. Gaidhani, R. Rajamenakshi, S. Sonavane, Brain stroke detection using convolutional neural network and deep learning models, in: 2019 2nd Int. Conf. Intelligent Communication and Computational Techniques (ICCT 2019), 2019, pp. 242–249.
[31] T. Badriyah, N. Sakinah, I. Syarif, Machine learning algorithm for classification, J. Phys. Conf. Ser. 1994 (1) (2021) 1213.
8
A deep learning approach for COVID-19
detection from computed tomography
scans
Ashutosh Varshney1 and Abdulhamit Subasi2,3
1 Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India; 2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
8.1 Introduction

Coronavirus disease 2019 (COVID-19) had affected more than 17 million people all around the world and caused 681 K deaths worldwide as of August 2, 2020 [1]. The estimated viral reproduction number indicates that an infected individual can transmit this deadly disease to around 2.5 noninfected individuals [1] with low immunity, indicating a high risk of massive spread of the disease. Therefore it is crucial to have fast testing of suspected individuals as early as possible for quarantine and treatment purposes.
The major problem in disease control is the lack of sufficient test kits available for testing. The current tests are mostly based on reverse transcription-polymerase chain reaction (RT-PCR). The RT-PCR tests are not very accurate and might sometimes give false-positive results. It is reported that many "suspected" cases with typical clinical characteristics of COVID-19 and identical specific computed tomography (CT) images were not diagnosed [2]. The test takes around 6 hours, which is very slow compared to the disease spreading rate. Thus the shortage of RT-PCR kits and their inaccurate results motivate us to study an alternative testing procedure, which can be made widely available and is faster, cheaper, and more feasible than RT-PCR, in particular, CT scans.
The CT images of various viral pneumonias and other lung diseases are more or less similar to those of COVID-19, so it becomes difficult for radiologists to diagnose the disease as well as to distinguish it from other viral pneumonias [3]. Several artificial intelligence techniques have been developed to extract shape and spatiotemporal features from images and use them for disease prediction, and there has been recent development in medical imaging techniques using deep learning, especially the convolutional neural network (CNN). Several features are used for identifying viral pathogens on the basis of imaging patterns, which are associated with their specific pathogenesis. The main identifying features of COVID-19 are the white glassy patches called ground-glass opacity [3]. Our intuition therefore was to make use of CNNs to identify these unique and COVID-specific features in the CT scans, which might not be distinguishable by the naked eye. Hence, the purpose of our research was to study the diagnostic performance of various deep learning models using CT images to screen for COVID-19.

8.2 Literature review

The recent advances in medical imaging and disease prediction have witnessed a great rise in machine learning methods. But these methods, to work accurately, require features to build upon, which are themselves not very easily extractable. Therefore there has been a high amount of progress in deep learning models, because they can extract features or can make use of some pretraining beforehand [4], and the whole process can be molded into one single step.
There have been several previous attempts at using deep learning methods for detecting COVID-19 from chest CT and X-ray scans. Shin et al. [5] used a deep CNN to classify interstitial lung disease in CT images. Pezeshk et al. [6] used a 3D CNN to detect pulmonary nodules in chest CT. Li et al. [7] developed a 3D deep learning architecture, COVNet, for COVID-19 detection. Panwar et al. [8] used a Grad-CAM based color visualization along with a transfer learning approach for COVID-19 detection. Meng et al. [9] showed that CT scans of patients with COVID-19 have definite characteristics. Das et al. [10] also tried a transfer learning approach using the Xception architecture. Similarly, Lalmuanawma et al. [11] showed that machine learning and AI-based methods can prove to be helpful in the automatic detection of COVID-19. Ardakani et al. [12] explored the possibility of CNNs being used, whereas Panwar et al. [13] proposed nCOVnet, which makes use of X-ray scans for the classification.
The same training and validation datasets were selected for all networks to facilitate the performance comparison of networks. Fig. 8.2 depicts our proposed architecture. The input image is passed through a convolutional layer and then through a pretrained model. The output is then passed through a series of pooling, batch normalization, and dense layers to get the final output.
from tensorflow.keras.applications import (DenseNet121, VGG16, VGG19, ResNet50,
    InceptionV3, InceptionResNetV2, Xception, MobileNet)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

def select_pretrained_model(name):
    # select one of the pretrained ImageNet models by name
    pretrainedbase = None
    if name == "DenseNet121":
        pretrainedbase = DenseNet121(weights='imagenet', include_top=False)
    if name == "VGG16":
        pretrainedbase = VGG16(weights='imagenet', include_top=False)
    if name == "VGG19":
        pretrainedbase = VGG19(weights='imagenet', include_top=False)
    if name == "ResNet50":
        pretrainedbase = ResNet50(weights='imagenet', include_top=False)
    if name == "InceptionV3":
        pretrainedbase = InceptionV3(weights='imagenet', include_top=False)
    if name == "InceptionResNetV2":
        pretrainedbase = InceptionResNetV2(weights='imagenet', include_top=False)
    if name == "Xception":
        pretrainedbase = Xception(weights='imagenet', include_top=False)
    if name == "MobileNet":
        pretrainedbase = MobileNet(weights='imagenet', include_top=False)
    return pretrainedbase

def transfer_learning(modelname):
    # train the model of Fig. 8.2 using transfer learning
    base = select_pretrained_model(modelname)
    model = build_model(base, 2)   # build_model is defined earlier in the chapter
    optimizer = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=0.1, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    model.summary()
    # keep only the weights that achieve the best validation loss
    checkpoint = ModelCheckpoint('model.h5', verbose=1, save_best_only=True)
    history = model.fit(X_train, Y_train, validation_split=0.25, epochs=50,
                        batch_size=64, verbose=2, callbacks=[checkpoint])
    return history
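The listing above relies on a build_model helper that is not reproduced in this excerpt. A minimal sketch of such a helper is given below, assuming the architecture described for Fig. 8.2 (an initial convolutional layer, the pretrained base, then pooling, batch normalization, and dense layers); the layer widths, the dropout rate, the softmax output, and the 200 × 200 × 3 input shape are illustrative assumptions rather than the exact settings used in the chapter.

from tensorflow.keras.layers import (Input, Conv2D, GlobalAveragePooling2D,
    BatchNormalization, Dense, Dropout)
from tensorflow.keras.models import Model

def build_model(pretrained_base, num_outputs, input_shape=(200, 200, 3)):
    # hypothetical helper: input -> Conv2D -> pretrained base -> pooling ->
    # batch normalization -> dense layers -> output (see Fig. 8.2)
    inputs = Input(shape=input_shape)
    x = Conv2D(3, (3, 3), padding='same', activation='relu')(inputs)
    x = pretrained_base(x)
    x = GlobalAveragePooling2D()(x)
    x = BatchNormalization()(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_outputs, activation='softmax')(x)
    return Model(inputs, outputs)

With num_outputs=2 this produces the two-class COVID/non-COVID head used in transfer_learning; with a wider head it can also serve as the 500-dimensional feature extractor used later in the chapter.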
FIGURE 8.2 Proposed transfer learning architecture (CT scans, pre-trained model layer, fully connected layer, output).
DenseNets comprise dense blocks which are based upon this idea of collective knowledge. Since the feature maps are getting concatenated, the channel dimension keeps on increasing at every layer, while each layer itself adds only a small number of output feature maps with very few convolution operations. The growth rate k is also kept low, which further optimizes the computation and makes it much more memory efficient [29].
8.4.3 MobileNet

MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. They are known to be memory efficient and lightweight because of the smaller number of parameters to be trained. They make use of dense blocks efficiently by having a large number ...

8.4.4 Xception

... The Xception module can be summarized as a depthwise convolution followed by a pointwise convolution. Thus the depthwise separable convolution used here is just an Inception module with a large number of towers. The data first goes through the entry flow, then through the middle flow, which is repeated eight times, and finally through the exit flow [30].
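As a rough illustration of the depthwise separable convolution described above, the sketch below contrasts the two stages (a per-channel depthwise convolution followed by a 1 × 1 pointwise convolution) in Keras; the filter count and input size are arbitrary choices made only to show the structure, not values taken from MobileNet or Xception themselves.

from tensorflow.keras import layers, models

def separable_block(x, filters):
    # depthwise convolution: one 3x3 filter per input channel
    x = layers.DepthwiseConv2D((3, 3), padding='same', activation='relu')(x)
    # pointwise (1x1) convolution: mixes information across channels
    return layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)

inputs = layers.Input(shape=(224, 224, 3))
outputs = separable_block(inputs, 64)
models.Model(inputs, outputs).summary()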
8.4.5 Visual geometry group (VGG)

The visual geometry group (VGG) architecture is very simple to understand. The input image is of size 224 × 224 and is passed through a series of five convolution blocks consisting of 3 × 3 kernels with stride 1 and ReLU activations. The convolution blocks are followed by max-pooling. Finally, the result is passed through three fully connected layers, with the last one having a dimension of 1000, corresponding to the 1000 different image classes [31]. VGG architectures have also proved to be very successful in the ImageNet challenge.
from tensorflow.keras.models import load_model

history = transfer_learning("VGG19")   # change the model name here to try other bases
model = load_model('./model.h5')       # reload the best checkpoint saved during training
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()                  # plotting helpers defined earlier in the chapter
plot_model_loss()
8.4.6 Inception/GoogLeNet

In most of the standard network architectures, the intuition is not clear why and when to perform the max-pooling operation and when to use the convolutional operation. For example, in AlexNet, we have the convolutional operation and max-pooling operation following each other, whereas in VGGNet, we have three convolutional operations in a row and then one max-pooling layer. The idea behind GoogLeNet is to use all the operations at the same time. It computes multiple kernels of different sizes over the same input map in parallel, concatenating their results into a single output. The intuition is that convolution filters of different sizes will handle objects at multiple scales better. This is called an Inception module [32].
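The parallel-kernel idea can be sketched directly in Keras as below; the branch widths are illustrative and not those of GoogLeNet itself.

from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32, fpool=32):
    # parallel convolutions of different kernel sizes over the same input map
    b1 = layers.Conv2D(f1, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f3, (3, 3), padding='same', activation='relu')(x)
    b5 = layers.Conv2D(f5, (5, 5), padding='same', activation='relu')(x)
    bp = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    bp = layers.Conv2D(fpool, (1, 1), padding='same', activation='relu')(bp)
    # concatenate the parallel outputs into a single feature map
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])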
FIGURE 8.3 COVID-19 detection using deep feature extraction and conventional machine learning (CT scan features fed to K-NN, SVM, RF, Bagging, AdaBoost, and XGBoost classifiers).
import numpy as np

def featureextraction(modelname):
    # use the pretrained base to turn each CT scan into a 500-dimensional feature vector
    base = select_pretrained_model(modelname)
    model = build_model(base, 500)
    X_T = np.array([model.predict(i.reshape(1, 200, 200, 3)).reshape(500) for i in X_Train])
    return X_T
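Once the 500-dimensional deep features have been extracted, they can be passed to the conventional classifiers of Fig. 8.3. A minimal scikit-learn sketch is shown below; X_tr, X_te, y_tr, and y_te are hypothetical names for the feature matrices returned by featureextraction() and the corresponding label vectors, and the hyperparameters are defaults rather than the chapter's tuned values.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

classifiers = {
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel='rbf'),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)                        # deep features as input vectors
    print(name, accuracy_score(y_te, clf.predict(X_te)))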
no (negative). A false negative (FN) is when the outcome is incorrectly predicted as negative when it is actually positive [36]. Information retrieval researchers define parameters called recall and precision:

Recall = TP/(TP + FN)
Precision = TP/(TP + FP)

Then the F1 measure can be formulated as:

F1 = (2 × Precision × Recall)/(Precision + Recall)

Another important measure is the confusion matrix, which can be formulated as:

Confusion Matrix = [TP  FP]
                   [FN  TN]

The kappa statistic accounts for instances that are classified correctly just by chance by the classifier, using the concepts of expectation [36]. The kappa statistic (κ) always considers the predictions to be by chance and probabilistic. The probabilities of getting a correct prediction by chance and confidence are calculated using the concept of expectation of a random variable. The kappa statistic is the most frequently used statistic for the evaluation of categorical data when there is no independent means of assessing the probability of chance agreement between two or more observers. A kappa value of 0 designates agreement equivalent to chance, whereas a kappa value of 1 designates perfect agreement [38,39]. Cohen [40] defined the kappa statistic as an agreement index, defined as the following:

K = (P0 − Pe)/(1 − Pe)

where P0 is the observed agreement and Pe measures the agreement expected by chance [41].
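As a small self-contained illustration of the formula (not drawn from the chapter's experiments), the helper below computes κ directly from the four entries of a binary confusion matrix.

def cohen_kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    p0 = (tp + tn) / n                           # observed agreement
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)    # chance agreement on "positive"
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)    # chance agreement on "negative"
    pe = p_pos + p_neg
    return (p0 - pe) / (1 - pe)

# example: 40 TP, 10 FP, 10 FN, 40 TN gives P0 = 0.8, Pe = 0.5, kappa = 0.6
print(cohen_kappa(40, 10, 10, 40))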
8.6.1.2 Receiver operating characteristic (ROC) analysis

Receiver operating characteristic (ROC) curves depict the performance of a classifier without taking into account the actual error rate or cost. The curve is obtained by plotting the true positive rate on the y-axis against the false positive rate on the x-axis. Formally:

TP Rate = 100 × TP/(TP + FN)

8.6.2 Experimental results

8.6.2.1 Transfer learning

Table 8.1 shows the experiment results for transfer learning performed on various pretrained CNNs using our proposed architecture. We were able to achieve a high test accuracy of
from sklearn.metrics import confusion_matrix, f1_score, cohen_kappa_score, roc_auc_score

def print_scores(Y_test, Y_pred):
    # report the evaluation metrics described in Section 8.6.1
    print("Confusion Matrix: ", confusion_matrix(Y_test, Y_pred))
    print("F1 score: ", f1_score(Y_test, Y_pred))
    print("Kappa: ", cohen_kappa_score(Y_test, Y_pred))
    print("ROC area: ", roc_auc_score(Y_test, Y_pred))
TABLE 8.3 Deep feature extraction along with SVM classifier.

Model              Accuracy  F1 score  Kappa  AUC
MobileNet          86.92%    0.869     0.738  0.869
Xception           79.07%    0.786     0.581  0.79
InceptionV3        75.85%    0.763     0.517  0.758
InceptionResNetV2  78.87%    0.789     0.578  0.788
VGG19              87.92%    0.882     0.758  0.879
VGG16              88.93%    0.888     0.778  0.889
DenseNet121        85.31%    0.853     0.706  0.853
ResNet50           85.91%    0.864     0.718  0.859

TABLE 8.4 Deep feature extraction along with Random Forest classifier.

Model              Accuracy  F1 score  Kappa  AUC
MobileNet          81.69%    0.8169    0.634  0.817
Xception           77.86%    0.786     0.558  0.779
InceptionV3        67.80%    0.676     0.356  0.678
InceptionResNetV2  78.47%    0.784     0.569  0.784
VGG19              81.08%    0.816     0.622  0.811
VGG16              84.31%    0.846     0.686  0.843
ResNet50           85.11%    0.85      0.702  0.851
DenseNet121        81.08%    0.811     0.622  0.811
8.6.2.2.3 Random Forest

In the case of Random Forest, ResNet50 performed the best with an accuracy of 85.11%, beating VGG16 by about 0.8%, which is not a large margin. Similarly, the F1 score, AUC, and kappa values were also greater in the case of ResNet50 compared to VGG16, by a very small margin. InceptionV3 and InceptionResNetV2 performed extremely poorly compared to the others, following the trend of the results of the K-NN and SVM classifiers. Similar to SVM, the performance of MobileNet, DenseNet121, and VGG19 was also very good, with the test accuracy being greater than 80% for all three (Table 8.4).

TABLE 8.5 Deep feature extraction along with AdaBoost.

Model              Accuracy  F1 score  Kappa  AUC
MobileNet          77.66%    0.782     0.553  0.777
Xception           74.04%    0.746     0.481  0.741
InceptionV3        64.19%    0.645     0.284  0.642
InceptionResNetV2  73.84%    0.734     0.476  0.738
VGG19              80.28%    0.797     0.605  0.803
VGG16              82.49%    0.825     0.649  0.825
DenseNet121        78.67%    0.784     0.574  0.786
ResNet50           78.87%    0.791     0.577  0.788
TABLE 8.6 Deep feature extraction along with XGBoost.

Model              Accuracy  F1 score  Kappa   AUC
MobileNet          81.08%    0.816     0.622   0.811
Xception           77.06%    0.771     0.541   0.771
InceptionV3        68.41%    0.688     0.368   0.684
InceptionResNetV2  74.64%    0.746     0.493   0.746
VGG19              82.49%    0.821     0.649   0.825
VGG16              85.71%    0.857     0.714   0.857
DenseNet121        77.46%    0.776     0.549   0.775
ResNet50           83.90%    0.839     0.678   0.839

TABLE 8.7 Deep feature extraction along with Bagging classifier.

Model              Accuracy  F1 score  Kappa   AUC
MobileNet          80.88%    0.809     0.6177  0.809
Xception           77.86%    0.782     0.557   0.778
InceptionV3        68.61%    0.691     0.372   0.686
InceptionResNetV2  78.47%    0.787     0.569   0.785
VGG19              82.89%    0.828     0.658   0.829
VGG16              85.51%    0.858     0.71    0.855
DenseNet121        78.87%    0.793     0.577   0.788
ResNet50           80.48%    0.804     0.609   0.805
comparable performance. VGG19 was the second-best performer after VGG16. It should be noted that both these methods could not beat K-NN, SVM, and Random Forest, as the obtained scores were lower in comparison.

8.6.2.2.5 Bagging

Table 8.7 shows the results of the Bagging classifier, which proved to perform on a par with the XGBoost algorithm. The results for all the models were almost similar, with no substantial changes observed in the test accuracies. The best test accuracy obtained, 85.51%, is for VGG16, compared to a value of 85.71% for XGBoost. The F1 score, ROC AUC, and kappa are almost the same in both cases. The results show that both Boosting and Bagging algorithms are equally efficient in classifying the extracted features.

For almost all the machine learning classifiers, VGG16 proved to give the highest test accuracy among all the models. The highest value obtained was 91.75% with the K-NN classifier. The F1 score, AUC, and Cohen's kappa were also the highest in this case. Among the machine learning classifiers, K-NN seemed to work the best, followed by SVM, XGBoost, Bagging, Random Forest, and AdaBoost, respectively. It should also be noted that VGG19 and ResNet50 gave great results, with maximum values of 87.92% and 85.91% with SVM, respectively. They both lagged behind VGG16 by only a margin of 2%-4% for all the classifiers used. InceptionV3 performed the worst, whereas the performance of MobileNet and DenseNet121 was comparable to VGG16 and VGG19.

8.6.3 Discussion

Our proposed methods and architecture beat various other deep learning models. Our transfer learning approach yielded an accuracy of 98.30%, compared to an accuracy of 95.12% by DeTrac [42] and 96% by Bai et al. [43]. Our model also performed better on the AUC, with a value of 0.982 compared to 0.96 by Li et al. [7]. Similarly, our machine learning-based approach yielded an accuracy of 91.75%, which is far better than the deep learning approaches, such as the accuracy of 90.1% by Zheng et al. [44]. The AUC obtained by using SVM on our feature extractor is 0.889, which is better than the 0.862 reported by Mei et al. [45], which shows that our proposed architecture is better at extracting features. These results support other unique feature extraction-based methods [46,47] and novel techniques [48,49] in this area.
[41] Z. Yang, M. Zhou, Kappa statistic for clustered physician-patients polytomous data, Comput. Stat. Data Anal. 87 (2015) 1-17.
[42] Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network (2020). https://www.researchgate.net/publication/340332332_Classification_of_COVID-19_in_chest_X-ray_images_using_DeTraC_deep_convolutional_neural_network (accessed 09.08.20).
[43] H.X. Bai, et al., AI augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT, Radiology 296 (2020). Available from: https://pubs.rsna.org/doi/full/10.1148/radiol.2020201491 (accessed 09.08.20).
[44] C. Zheng, et al., Deep learning-based detection for COVID-19 from chest CT using weak label, infectious diseases (except HIV/AIDS), preprint, Mar. 2020. doi:10.1101/2020.03.12.20027185.
[45] X. Mei, et al., Artificial intelligence-enabled rapid diagnosis of patients with COVID-19, Nat. Med. (2020) 1-5. Available from: https://doi.org/10.1038/s41591-020-0931-3.
[46] F. Ozyurt, T. Tuncer, A. Subasi, An automated COVID-19 detection based on fused dynamic exemplar pyramid feature extraction and hybrid feature selection using deep learning, Comput. Biol. Med. 132 (2021) 104356.
[47] T. Tuncer, F. Ozyurt, S. Dogan, A. Subasi, A novel Covid-19 and pneumonia classification method based on F-transform, Chemometr. Intell. Lab. Syst. 210 (2021) 104256. Available from: https://doi.org/10.1016/j.chemolab.2021.104256.
[48] A. Subasi, S.A. Qureshi, T. Brahimi, A. Serireti, COVID-19 detection from X-Ray images using artificial intelligence, Artificial Intelligence and Big Data Analytics for Smart Healthcare, Elsevier, 2021.
[49] A. Subasi, A. Mitra, F. Ozyurt, T. Tuncer, Automated Covid-19 detection from CT images using deep learning, in: V. Bajaj, G.R. Sinha (Eds.), Computer-aided Diagnosis and Design Methods for Biomedical Applications, CRC Press, Taylor & Francis, 2021.
9
Detection and classification of Diabetic
Retinopathy Lesions using deep learning
Siddhesh Shelke1 and Abdulhamit Subasi2,3
1Indian Institute of Technology, Indore, Madhya Pradesh, India; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
various sorts of lesions on a retina picture is used to identify DR. DR is one of the leading causes of blindness in the working-age population of developing countries. It is estimated that more than 93 million people will be affected. It is an "eye disease" that causes meningitis as a long-term consequence of diabetes, which results in progressive damage to the eye and even blindness [1]. Because diabetes is a progressive disease, doctors recommend that people with diabetes be checked at least twice a year to identify symptoms regularly.

Risk factors of DR:

• Duration of diabetes:
  • A patient diagnosed before age 30 years
  • 50% DR after 10 years
  • 90% DR after 30 years
• Poor metabolic control:
  • It is less essential but quite pertinent to the onset and progression of DR.
  • Increased HbA1c is associated with increased risk.
• Pregnancy:
  • It is linked to a rapid progression of DR.
  • Prediction factors: poor pregnancy control of diabetes mellitus (DM), too rapid control during the early stages of pregnancy, preeclampsia, and fluid imbalance are all risk factors.
• Hypertension:
  • It is most common in patients with DM type 2.
  • Should be strictly controlled (<140/80 mm Hg).
• Nephropathy:
  • Associated with the worsening of DR.
  • Renal transplantation may be linked to a reduction in DR and a better response to photocoagulation.
• Others:
  • Smoking
  • Obesity
  • Hyperlipidemia
  • Anemia

DR is a chronic disease that appears only in the late stages, when it is difficult and impossible to treat. Early detection by diagnosis is essential because it is often treated effectively in the early stages. The cost of this task is high, so early detection is important to reduce labor. It is needed to automatically detect defects in the eye image at low cost using digital image processing and artificial intelligence (AI) algorithms. In DR, blood vessels that help in sustaining the retina begin to leak fluid and blood onto the retina, which can result in the formation of visual features known as lesions, such as microaneurysms, hemorrhages, hard exudates, cotton wool spots, and vessel areas [1]. In a medical diagnosis, an ophthalmologist examines an image of a colored background to examine the patient's condition. This diagnosis is difficult and time-consuming and introduces additional errors. Furthermore, because of the vast number of diabetics and a lack of health resources in some locations, most DR patients are unable to be detected and treated on time, suffer permanent eyesight loss, and even lose vision.

Rapid detection of DR, especially in its early stages, can effectively control and delay degenerative conditions. At the same time, the impact of hand interpretation depends to a large extent on the understanding of the doctor. Medical malpractice occurs due to incompetence of doctors. Convolutional neural networks (CNNs) have surpassed all previous image analysis techniques in computer vision and image classification tasks over the past decade. Computer-assisted diagnosis is more effective because it allows screening for a large number of diseases. The conditions that cause microangiopathy can lead to the formation of microaneurysms. Hard exudates are white or creamy in color and appear very bright in the retina. If they appear near the middle of the macula and show fluid in the eyeball, they are considered very dangerous. Bleeding scarring is the most common type of bleeding due to DR. It is a small hemorrhage that originates from the cervical network. Planting lesions are upper
FIGURE 9.1 Diabetic retinopathy severity levels: mild DR, moderate DR, severe DR, and proliferative DR.
can be more blind than patients without diabetes. AIDS and macular edema (both essential in the hospital) can cause severe vision loss. This affects the eyeballs and can cause blindness in diabetics. DR affects many diabetics in developed countries [2].

The goal of this chapter is to use fundus image classification to enforce a direct synthesis of DR. We are working on categorizing fundus images depending on the level of DR, with the goal of achieving end-to-end classification from fundus images to medical status. Rather than relying on doctors' manual control with expertise, it helps to relieve their pressure in diagnosing and treating DR in a simple and accurate manner. For this task, a variety of image preprocessing and AI techniques is used to extract many key features and then classify the images into their corresponding classes. We use CNN architectures to detect DR in two datasets. The precision, recall, accuracy, receiver operating characteristic (ROC), and area under the curve (AUC) measures are all evaluated. We also plot the confusion matrix, which helps us to confirm the strength of the model visually.

9.2 Literature survey on diabetic retinopathy detection

Much effort has been made in DR detection. There are many ways to find DR. Scientists have worked on a variety of techniques to detect a variety of lesions such as blood vessels, microaneurysms, exudates, and hemorrhages. Changes in the shape and size of blood vessels can be a positive sign of DR, as can the presence of various types of lesions that contribute to the diagnosis of diabetes. As a result, the various studies on automated detection fall into two categories [4-6].

9.2.1 Traditional diabetic retinopathy detection approach

Chandrashekar [7] proposed a method for extracting retinal vessels from retinal fundus images using morphological approaches. Kaur and Sinha [8] suggested a blood vessel segmentation approach based on morphological filters. There is no noticeable improvement in performance when increasing the number of filter banks; instead, the convolution process, which is a time-consuming task, is increased. Jaffar et al. [9] proposed a method that uses adaptive thresholding for exudate detection and eliminates artifacts from the exudates; the retinal structures are utilized in classification. The proposed technique failed to cover all the DR signs and has to be explored further. Jiang and Mojon [10] proposed a method of adaptive thresholding based on a verification-based multithreshold probing approach. With global thresholding, the blood vessels cannot be separated because of the image gradients, so probing the image with varied threshold values does not extract the thresholded image. Sánchez et al. [11] proposed a combination model that separates the exudates from the image background, and edge detection strategies are used to separate hard and soft exudates.

Goh et al. [12] classified retinal images using various classifiers. On fundus images, segmentation was used to differentiate blood vessels, microaneurysms, and exudates. The classifiers were given the segmented region, textural data, and other information derived from the Gray-Level Co-Occurrence Matrix (GLCM) to categorize the normal and abnormal images. On normal images, the detection system has a success rate of 92%, while on aberrant images, it has a success rate of 91%. Liew et al. [13] employed a statistical technique to demonstrate the relationship between retinal vascular indicators and the importance of both qualitative and
FIGURE 9.2 Diabetic retinopathy detection and classification process using traditional approach (image, segmentation, then No-DR or Mild/Moderate/Severe/Proliferative DR).
(AlexNet and two different networks) were used to detect microaneurysms, hemorrhages, and soft and hard exudates from three completely different datasets: Kaggle, DiaretDB1, and E-ophtha (private). The fundus images were resized, cropped, normalized, and augmented, and a morphological filter was applied in the preprocessing stage. The disease was classified into two categories, referable and nonreferable DR, and produced ROC values of 0.954 and 0.949 on Kaggle and E-ophtha, respectively.

Jiang et al. [20] proposed a model where three pretrained CNNs (InceptionV3, ResNet152, and InceptionResNetV2) were used to classify the dataset as referable DR or nonreferable DR. Before CNN training, the images were resized, enhanced, and improved, and the models were integrated with the AdaBoost technique. Further, to update the network weights, the Adam optimizer was used, and the system achieved 88.21% accuracy and an AUC of 0.946.

Zago et al. [21] proposed a technique where two CNNs (a pretrained VGG16 and a CNN) were utilized to detect DR or non-DR images based on the likelihood of red lesion patches. This model was trained on the DIARETDB1 dataset, and it was tested on a few datasets: IDRiD, Messidor, Messidor-2, DDR, DIARETDB0, and Kaggle. The model achieved its best results on the Messidor dataset, with a sensitivity of 0.94 and an AUC of 0.912.

9.2.2.2 Multilevel classification

Pratt et al. [22] proposed a method where a CNN was used with 10 CNN layers, 8 max-pooling layers, and 3 fully connected layers, and a softmax classifier was used to classify the Kaggle dataset images into five categories according to the severity levels of DR. During the preprocessing part, the images were color normalized and resized, and to reduce overfitting, L2 regularization and dropout techniques were used. The system produced a specificity of 95%, an accuracy of 75%, and a sensitivity of 30%.

Gulshan et al. [23] proposed a method where 10 CNNs (pretrained InceptionV3) were trained to detect diabetic macular edema (DME) and DR. The EyePACS-1 and Messidor-2 datasets were used to test the CNN model. The dataset images were initially normalized, resized, and fed into the CNN model to classify the images into referable DME, moderate/worse DR, severe/worse DR, or fully gradable DR. The model produced a specificity of 93% in the two datasets taken and sensitivities of 97.5% and 96.1% on the EyePACS-1 and Messidor-2 datasets, respectively.

9.2.3 Datasets

a) Diabetic-Retinopathy_Sample_Dataset_Binaryi - This dataset is sample data from the Diabetic Retinopathy Competition. It takes an excessive amount of time in preprocessing, so this dataset contains resized images (90, 128, 264) to save time. It contains only 526 samples; half of the samples have DR, and half do not. Metadata of the arrays are given in a binary DataFrame CSV, and the images are stored at the identical index.
b) Diabetic Retinopathy 224 × 224 Gaussian Filteredii - The given images are used to identify diabetic retinopathy. The original dataset is available from the APTOS 2019 Blindness Detection competition. The images have been reshaped to 224 × 224 pixels, making them easier to use with many pretrained deep learning models. All images are stored in folders depending on the severity/grade of diabetic retinopathy, and a CSV file is provided. There

i https://www.kaggle.com/sohaibanwaar1203/prepossessed-arrays-of-binary-data.
ii https://www.kaggle.com/sovitrath/diabetic-retinopathy-224x224-gaussian-filtered.
been used in medicine to evaluate chest radiographs [35] and images from histopathology [36]. The use of a DNN in the screening of patients by mammography provided a better prediction of the detection of malignancy than inexperienced radiographers [37]. Moreover, DNNs have often been used for the analysis of visual perception deficiencies in ophthalmology [38] (Fig. 9.4).

A neural network was trained in this study to recognize the features of diabetic fundus images, and its accuracy and F1 score were tested. To see if a trained DNN can be used in the diagnosis of diabetic patients, the system's predictive power to correctly detect retinopathy in fundus images was investigated. With this method, 10 layers of neural networks are used, with the number of neurons varying from 8 to 256 in an 8 × 16 × 32 × 64 × 128 × 256 × 128 × 64 × 32 × 16 × 8 neuron configuration. The accuracy was not impressive since the data was imbalanced with respect to the number of images in each class.
import tensorflow as tf
from tensorflow.keras.applications import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.losses import *
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.preprocessing.image import *
from tensorflow.keras.utils import *
dnn_model = Sequential()
# fully connected network: 8-16-32-64-128-256-128-64-32-16-8 neurons, softmax output
dnn_model.add(Dense(8, input_dim=2, kernel_initializer='uniform', activation='relu'))
for units in [16, 32, 64, 128, 256, 128, 64, 32, 16, 8]:
    dnn_model.add(Dense(units, kernel_initializer='uniform', activation='relu'))
    # optional regularization, commented out in the original listing:
    # dnn_model.add(BatchNormalization())
    # dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(2, activation='softmax'))
dnn_model.summary()
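The listing above only defines the network. A hedged sketch of how it could be compiled and trained is given below; the optimizer, epoch count, batch size, and the x_train/y_train arrays (features and one-hot labels prepared earlier in the experiment) are assumptions rather than the chapter's exact settings.

dnn_model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
# x_train/y_train are assumed to hold the input features and one-hot encoded labels
history = dnn_model.fit(x_train, y_train, validation_split=0.2,
                        epochs=50, batch_size=32, verbose=2)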
FIGURE 9.4 Deep learning model for diabetic retinopathy detection and classification.
9.3.2 Convolutional neural networks

A CNN is a processing model that takes raw pixels as input and routes them through a defined set of elements in the network. During learning, the network itself generates low-level DR features, gradually converting and integrating them into higher-level DR features. These properties are automatically combined to provide a great opportunity to map the image as normal or abnormal [39].
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(264, 264, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
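One possible way to compile and train this five-class CNN is sketched below; the optimizer, the number of epochs, the early-stopping patience, and the X_train/y_train arrays of 264 × 264 × 3 images with one-hot labels are illustrative assumptions, not values stated in the chapter.

from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam',
              loss='categorical_crossentropy',   # y_train assumed one-hot, 5 classes
              metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=30, batch_size=16, verbose=2,
                    callbacks=[EarlyStopping(patience=5, restore_best_weights=True)])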
9.3.2.1 Layers in convolutional neural network

In this experiment, the CNN-DRD model developed has two to eight flexible layers, schematics, and a reading system. The CNN is a well-known DL model used by researchers to classify natural images and has proven to be a successful method for medical images. For example, using fundus imaging, the CNN model plays an important role in the proper classification of nonproliferative diabetic retinopathy (NPDR). It also increases the efficiency, availability, and cost-effectiveness of the diabetic retinopathy (DR) grading system. The DR grading system is optimized for a variety of high-quality images and different settings compared to traditional handcrafted methods [39]. Despite the popularity of CNN architectures for DR diagnosis, they have significant limitations that are described below.

1. The current CNN models for detecting DR focus only on the DR classification and do not detect the status of DR lesions in the background image. They work end to end, meaning that the input image is fed directly into the CNN and the output directly provides the DR severity, even though the details of the DR lesions are important to hospital staff.
2. The latest CNN models require high-quality annotated datasets for training, which are expensive and time-consuming to obtain. In contrast, CNN models that can be learned from a few examples are the most effective and demanding practice in medical imaging today in terms of DR.
3. CNN models that had been developed failed to learn the complex behavior of DR lesions. Initially, the fundus image was split into discrete segments that were used to feed into the CNN. As a result, small lesions are difficult to detect due to the intricate nature of the optic nerve of the eye. Studying these weaknesses is essential and necessary for a real DR classification system. Moreover, this manually designed method does not cover all DR signals in the image, which makes it a waste of time for investigating common problems. As a result, the DR solution is limited. Unlike handcrafted approaches, a DL technique known as the CNN has significantly increased DR detection performance and reduced the time constraint and the need to use more data to learn fundus characteristics. The CNN's traditional technique and functional design fall into the handcrafted category. The manually designed method demonstrates the process of extracting features using advanced algorithms and expert techniques. The CNN is a DL process inspired by the human brain system. It learns key ideas from the data and understands the design process with minimal supervision. The CNN is well suited to making accurate predictions even when there are occlusions or artifacts, or when the true shape of the object is not visible. In addition, the benefits of data augmentation are used to learn how to make changes and improve the mapping capabilities of the CNN system. Researchers are currently adapting several modifications of the CNN architecture to improve its applications in this field [40].

9.3.3 Transfer learning

In order to reach high accuracy in training a deep CNN, a substantial amount of training data is typically required, which can be costly to obtain. This problem is addressed by transfer learning, which transfers knowledge learnt on a big dataset with a similar domain to the training dataset. A common method for CNNs is to train them on a big source dataset and then exploit their feature extraction skills. Furthermore, fine-tuning the resulting pre-trained model on a smaller target dataset with a comparable domain but a different goal has been shown to enhance task accuracy even further. DL techniques require many examples
FIGURE 9.5 Transfer learning model for diabetic retinopathy detection and classification (retinography image, pre-trained model, fully connected layer, output: normal or diabetic retina).
training complex medical data using algorithms that create advanced research data. AI and DL are applicable in the field of ophthalmology because most of the data is image-based and the results are consistent with image recognition. AI applications in DR research have succeeded in providing recommendations and research presentations. The role of AI in the diagnosis of referable DR (RDR), expressed as moderate or worse nonproliferative DR (NPDR) with or without diabetic macular edema (DME), is very consistent. This offers the right benefits. Over 90% of the studies use various AI algorithms [41]. AI algorithms require skilled professionals to obtain clear and concise images for use as input data, and ophthalmologists (graders) to provide the reference grading for the images. A recent study of eye imaging on smartphones with the EyeArt AI software showed a high efficiency of 95.8% for detecting DR of any severity and over 99% efficiency for detecting RDR and STDR [42].

The imbalanced dataset consists of Binary .npz files that have 1000 images. We split them into 600 training, 200 validation, and 200 testing images. We predict only 2 classes (0 and 1) using this dataset. The balanced dataset consists of 3662 224 × 224 Gaussian images. We split them into 2966 training images, 329 validation images, and 367 testing images. We predict both 2 classes (0 and 1) and 5 classes (0, 1, 2, 3, and 4) using this dataset.

9.4.1.1 For Dataset a) Diabetic-Retinopathy Sample_Dataset_Binary

In this experiment, we apply CNNs with different layers and different neuron configurations. This approach did not give significant differences in the evaluation metrics.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_test = np.argmax(y_test, axis=1)                       # one-hot labels back to class indices
pred = np.argmax(model.predict(x_test), axis=-1)         # predicted class indices
cm = confusion_matrix(y_test, pred)
cm_plot = plot_confusion_matrix(cm, classes=['0', '1'])  # plotting helper defined elsewhere
9.4.2 Performance evaluation metrics

It is critical to estimate how precisely a classification model predicts the correct result when developing one. This estimation, however, is insufficient because it can produce deceptive results in some cases. And it is at this point where the new requirements become an important factor in determining the more significant estimations of the constructed model.

For classification models, accuracy is a critical metric. It is straightforward to understand and apply to binary and multiclass classification problems. Accuracy indicates the proportion of correct results in the total number of records tested. A classification model built from balanced datasets is accurate enough to be tested. Precision is defined as the ratio of true positives to predicted positives. Another important metric is recall, which gives information on whether all potential positives are captured. Recall is the percentage of overall positive samples that were correctly predicted as positive. The recall is one if all positive samples are predicted to be positive. If an optimal combination of precision and recall is required, these two measures can be combined to form the F1 score. The F1 score is the harmonic mean of precision and recall, ranging from 0 to 1.

Many performance metrics are used to assess the classification performance of DL methods. Accuracy, recall, precision, and area under the ROC curve are some of the commonly used metrics in DL. The percentage of abnormal images classified as abnormal is referred to as sensitivity, and the percentage of normal images classified as normal is referred to as specificity [43]. AUC is the area under the curve that is formed by plotting sensitivity versus specificity. The percentage of correctly classified images is referred to as accuracy. The equations for each measurement are listed below:

(1) Accuracy = (TN + TP)/(TN + TP + FN + FP)
(2) Precision = TP/(TP + FP)
(3) Recall = TP/(TP + FN)
(4) F1 score = (2 × Precision × Recall)/(Precision + Recall)
(5) The area under the curve is abbreviated as AUC. AUC is a more comprehensive measurement that takes into account both true negative and true positive outcomes. The higher the AUC, the better the model's performance.
The number of disease images correctly classified as disease is the number of true positives (TP). A true negative (TN) is an outcome where the model correctly predicts the negative class. The number of normal images classified as disease is referred to as the false-positive (FP) count. The number of false negatives (FN) is the number of disease images classified as normal.

In practice, a prototype must have precision and recall of 1, resulting in an F1 score of 1, that is, 100% accuracy, which is not possible in a classification task. As a result, the classifier that is created should have higher precision and recall. In addition to the agreement observed in the confusion matrix, Cohen's kappa accounts for agreement that occurs by chance.

9.4.3 Experimental results

Different architectures were trained with pretrained weights. Apart from the popular architectures, we wished to see how the performance would change as CNNs with more layers were being used. We conducted experiments on the data and came up with the following tables.

9.4.3.1 For Dataset a) Diabetic-Retinopathy Sample Dataset Binary

This section contains all the experimental results and observations for the abovementioned dataset (Table 9.1).

Since the given dataset might be imbalanced, the kappa score turned out to be negative or equal to 0. Due to the high number of layers with no residual skip connections in between the layers, the information could not get propagated throughout the model, and thus it gave an abysmal performance. The softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (Table 9.2).

TABLE 9.1 The comparison of convolutional neural network (CNN) layers using the Diabetic Retinopathy Detection binary dataset.

CNN layers Training accuracy Validation accuracy Test accuracy F1 score ROC area
For the models mentioned earlier, the softmax activation function must be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1). We can see that the kappa score is zero, and negative in some cases. This happens because the dataset is highly unbalanced, and as a result the scores are relatively poor. To counter this, we shall make the dataset balanced using data augmentation or a GAN.
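One way to rebalance the binary dataset before retraining is simple image augmentation of the minority class; the GAN listing that follows is the alternative mentioned above. The sketch below uses Keras' ImageDataGenerator and assumes x_minority is a hypothetical NumPy array holding the minority-class images; the transformation ranges are illustrative choices.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=15, horizontal_flip=True,
                               zoom_range=0.1, width_shift_range=0.05,
                               height_shift_range=0.05)
# draw one augmented copy of every minority-class image to roughly even out the classes
extra = augmenter.flow(x_minority, batch_size=len(x_minority), shuffle=False)
x_balanced = np.concatenate([x_minority, next(extra)], axis=0)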
import torch.nn as nn

# assumed hyperparameters (not given in the source listing):
# nz = latent vector size, nc = image channels, ndf = discriminator feature maps
nz, nc, ndf = 100, 3, 64

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            # (the transposed-convolution stack is omitted in the source listing)
        )

    def forward(self, x):
        return self.main(x)

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 256 x 256
            nn.Conv2d(nc, ndf, 4, 4, 1, bias=False),
            nn.BatchNorm2d(ndf),
            nn.LeakyReLU(0.2, inplace=True),
            # (an intermediate ndf -> ndf*2 block is elided in the source listing)
            # input is (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, 4, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),
            nn.Sigmoid())

    def forward(self, x):
        return self.main(x)
TABLE 9.2 The comparison of other deep learning models using Diabetic Retinopathy Detection binary dataset.
CNN layers Training accuracy Validation accuracy Test accuracy F1 score ROC area
9.4.3.2 For Dataset b) Diabetic Retinopathy 224x224 Gaussian Filtered

This section contains all the experimental results and observations for the dataset mentioned earlier. We can infer two different possibilities in this dataset: we can use it to predict both 2 classes and 5 classes of diabetic retinopathy disease (Table 9.3).

The dataset, as mentioned above, is balanced, with around 1700 images in both classes 0 and 1. The Cohen kappa score turned out to be greater than 0.8 in all the experiments, which is a good sign showing an excellent strength of agreement. The softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1) (Table 9.4).
import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_addons

model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), padding="same", input_shape=(224, 224, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.15),
    layers.Dense(2, activation='softmax')])

model.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-5),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['acc', 'AUC',
                       tensorflow_addons.metrics.F1Score(num_classes=2, average='weighted'),
                       tensorflow_addons.metrics.CohenKappa(num_classes=2)])  # 2 classes here

history = model.fit(train_batches,
                    epochs=12,
                    validation_data=val_batches)
TABLE 9.3 The comparison of convolutional neural network (CNN) layers using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 2 Class classification.

CNN layers Training accuracy Validation accuracy Test accuracy F1 score Kappa score ROC area
TABLE 9.4 The comparison of DL models using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 2 Class classification.
Models Training accuracy Validation accuracy Test accuracy F1 score Kappa score ROC area
For the models mentioned above, the softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1) (Tables 9.5 and 9.6).
train_generator,test_generator,train_images,val_images,test_images=create_gen()
# Load the pretained model
pretrained_model = tf.keras.applications.ResNet50(
input_shape=(224, 224, 3),
include_top=False,
weights='imagenet',
pooling='avg')
pretrained_model.trainable = False
inputs = pretrained_model.input
x = tf.keras.layers.Dense(128, activation='relu')(pretrained_model.output)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy','AUC'])
history = model.fit(
train_images,
validation_data=val_images,
batch_size = 32,
epochs=10,
callbacks=[
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=3,
restore_best_weights=True
)
])
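After the frozen-base model above has converged, accuracy can sometimes be pushed further by fine-tuning, as discussed in Section 9.3.3. A hedged sketch is shown below; the number of layers left frozen, the learning rate, and the epoch count are assumptions rather than settings used in the chapter's experiments.

# unfreeze the top of the ResNet50 base and continue training with a low learning rate
pretrained_model.trainable = True
for layer in pretrained_model.layers[:-20]:   # keep the earlier layers frozen
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy', 'AUC'])
history_ft = model.fit(train_images, validation_data=val_images, epochs=5)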
TABLE 9.5 The comparison of different convolutional neural network (CNN) layers using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 5 Class classification.
CNN layers Training accuracy Validation accuracy Test accuracy F1 score Kappa score ROC area
Models Training accuracy Validation accuracy Test accuracy F1 score Kappa score ROC area
the number of studies in which they devise their own CNN system and the number of studies that opt to use existing structures such as VGG, ResNet, and MobileNet, and in the transition to the smaller ones. Building a brand-new CNN model from scratch takes a lot of effort and time. It is much simpler to apply transfer learning, which speeds up the design and development of new architectures. On the other hand, it should be noted that the consistency of a system with its own CNN model is greater than that of systems using existing structures.

Researchers must focus on this issue and need to do more research to determine both conditions. An improved DR detection system that can detect different types of lesions and stages of DR results in an improved monitoring system for DR patients, which avoids the risk of vision loss. Five DR levels need to be identified correctly, and the gap closed, in a system that can detect DR damage. This can be seen as a current challenge for researchers for future research. We use the map to show the DR level of each pixel to determine the distribution. We used a simple method to calculate the label function for the entire image. The advantage is that it does not need to be reused. In particular, our method distinguishes between lesion detection (studied by the CNN) and DR grading based on the CNN outputs. As a result, this product can be used on any live database without any additional changes.

9.6 Conclusion

DR is one of the complications of diabetes and is a cause of blindness. An effective and automatic study of the incidence of DR is of clinical significance. Early detection allows faster treatment. This is important because early detection can prevent disability. Automatic diagnosis of DR from fundus images can help clinicians effectively with the diagnosis, which can improve the quality of diagnosis. This chapter presents a framework to identify the diabetic-related disease, studies how to classify images of those with the disease, and learns the function that directly extracts the image's features. Image quality is standardized and augmented as part of data selection and model training. The maximum test accuracy was 97.82% using the DenseNet121 architecture, but the two-dimensional projection uses the 224 × 224 Gaussian image dataset to improve the accuracy of the diabetic image classification. Also, unlike modern approaches, the number of images required to train is a small sample, which is very important due to the cost of labeling.

The automated DR detection system reduces the time required to perform diagnostic work, saves ophthalmologists work and costs, and results in timely patient care. Automatic DR detection plays an important role in early DR detection. The DR level is determined by the type of lesions that appear on the eyeball. This chapter presents modern automated systems for diabetic retinopathy diagnosis and classification using deep learning techniques. Most researchers use CNNs for classification, so they detect DR images by their severity. This chapter also describes the main methods which could be used to classify and diagnose DR using DL.

References

[1] C. Agurto, et al., Multiscale AM-FM methods for diabetic retinopathy lesion detection, IEEE Trans. Med. Imaging 29 (2) (2010) 502-512.
[2] R. Acharya, Y.K.E. Ng, J.S. Suri, Image Modeling of the Human Eye, Artech House, 2008.
[3] Early Treatment Diabetic Retinopathy Study Research Group, Grading diabetic retinopathy from stereoscopic color fundus photographs - an extension of the modified Airlie House classification: ETDRS report number 10, Ophthalmology 98 (5) (1991) 786-806.
[4] P. Liskowski, K. Krawiec, Segmenting retinal blood vessels with deep neural networks, IEEE Trans. Med. Imaging 35 (11) (2016) 2369-2380.
[5] J. Odstrcilik, et al., Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database, IET Image Processing 7 (4) (2013) 373-383.
[34] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Signal Process. 7 (3-4) (2014) 197-387.
[35] S.C. Lo, M.T. Freedman, J.S. Lin, S.K. Mun, Automatic lung nodule detection using profile matching and back-propagation neural network techniques, J. Digit. Imaging 6 (1993) 48-54.
[36] M.L. Astion, P. Wilding, The application of back-propagation neural networks to problems in pathology and laboratory medicine, Arch. Path. Lab. Med. 116 (1992) 995-1001.
[37] Y. Wu, M.L. Giger, K. Doi, C.J. Vyborny, R.A. Schmidt, C.E. Metz, Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology 187 (1993) 81-87.
[38] S.E. Spenceley, D.B. Henson, D.R. Bull, Visual field analysis using artificial neural networks, Ophthal. Physiol. Opt. 14 (1994) 239-248.
[39] Q. Abbas, M.E.A. Ibrahim, M.A. Jaffar, Video scene analysis: an overview and challenges on deep learning algorithms, Multimed. Tools Appl. 77 (16) (2018) 20415-20453. Available from: https://doi.org/10.1007/s11042-017-5438-7.
[40] H. Greenspan, B. van Ginneken, R.M. Summers, Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique, IEEE Trans. Med. Imaging 35 (5) (2016) 1153-1159. Available from: https://doi.org/10.1109/tmi.2016.2553401.
[41] R. Raman, S. Srinivasan, S. Virmani, S. Sivaprasad, C. Rao, R. Rajalakshmi, Fundus photograph-based deep learning algorithms in detecting diabetic retinopathy, Eye 33 (2019) 97-109.
[42] R. Rajalakshmi, R. Subashini, R.M. Anjana, V. Mohan, Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence, Eye 32 (2018) 1138-1144.
[43] W. Zhang, et al., Automated identification and grading system of diabetic retinopathy using deep neural networks, Knowl. Base Syst. 175 (2019) 12-25.

Further reading

W.L. Alyoubi, W.M. Shalash, M.F. Abulkhair, Diabetic retinopathy detection through deep learning techniques: a review, Inform. Med. Unlocked 20 (2020) 10037.
10
Automated detection of colon cancer
using deep learning
Aayush Rajput1 and Abdulhamit Subasi2,3
1Indian Institute of Technology, Kharagpur, West Bengal, India; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
of development of colon cancer is slightly more in men than in women. One man among 23 and one woman in every 25 have colon cancer in the United States [2]. Colorectal cancer is the term used for colon and rectal cancer collectively. Colorectal cancer is the third most common type of cancer, excluding skin cancers, in the United States, with 104,270 cases of colon cancer and 45,230 cases of rectal cancer in 2021 [2]. People with a family history of colon cancer have more risk of developing it, especially when any close family member below the age of 60 is diagnosed with colon cancer. Other situations can also increase the risk of developing colon cancer, such as people who do not do much physical activity, overweight people, smokers, people having a history of any other type of cancer such as ovarian cancer or uterine cancer, and people having adenomas [1].

Sometimes it is possible that a person having colon cancer does not show any symptoms in the initial stage of development of the cancer [3]. The symptoms of colon cancer include a frequent change in bowel habits, diarrhea, the feeling that the bowel does not empty, constipation, thinner stools than usual, tiredness or fatigue, anemia, and blood in the stool [3]. A person should immediately consult a doctor if any of the symptoms last for a long time, regardless of age, because colon cancer can develop at any age.

The most common type of colon cancer is adenocarcinoma colon cancer [4]. Benign colon cancer is the initial stage of colon cancer, which can be treated easily and is not life-threatening. Metastasis is when cancer develops in one part of the body and spreads to other parts [5]. Doctors use many types of tests for the diagnosis of colon cancer. For checking metastasis, tests are also done. The test used by the doctor depends on the various factors and symptoms seen in the patient. First, a physical examination of the patient is done. Then tests such as biopsy, colonoscopy, computed tomography (CT), biomarker testing, blood tests, and magnetic resonance imaging (MRI) are used. In a biopsy to detect colon cancer, a small tissue from the affected region's large intestine is removed from the patient's body and examined under a microscope by a pathologist [6]. The biopsy is capable of making a definite diagnosis of colon cancer. Colonoscopy is a method performed by a colonoscopist, who checks inside the colon in the patient's body. During the colonoscopy, if cancer is detected, then the tumor has to be removed surgically from the site for a complete diagnosis. In CT scans, X-rays are used to produce a 3D image of the human body, and sometimes a dye is also used for more precise results [7]. The dye can be given to the patient by injecting it directly into the veins or can be taken as a pill.

From the resulting image of the CT scan, abnormalities can be detected. In colon cancer, internal bleeding in the large intestine occurs. The person becomes anemic, so a blood test is done to check the blood count; a lower blood count indicates the presence of internal bleeding and hence the colon cancer. Blood tests are also done to detect the carcinoembryonic antigen level in the blood. A higher value of carcinoembryonic antigen (CEA) indicates the spread of cancer to other parts of the body. MRI works similarly to the CT scan; the only difference is that magnetic fields are used to make the 3D image of the body, and the size of the tumor can be detected. The treatment of colon cancer has physical, emotional, social, and financial effects on the patient. These effects can vary from person to person for the same treatment for the same type of cancer. To cope with the side effects of the treatment, various things need to be done. After the detection of a cancer, a patient can have sadness, anxiety, or anger. Taking help from a counselor can relieve the patient of the emotional side effects of the treatment. The treatment of colon cancer is costly, due to which sometimes patients cannot take complete treatment and put their life at risk. Any financial problem should be discussed with the supportive team. Family and friends
Hospital. Of these slides, 85 were from normal colorectal tissue and 222 slides were colorectal cancer; 275 slides were used for training and the remaining for testing. The pretrained InceptionV3 architecture was used with an input size of 299 × 299 pixels. The training was done for 20 epochs with a batch size of 92 and a learning rate of 3 × 10^-4. The overall classification accuracy and receiver operating characteristic (ROC) score were 95.1% and 99%, respectively. Image segmentation was also done. The prediction performance across all the slides on the independent dataset had mean accuracy, specificity, sensitivity, and Dice scores of 87.8%, 90.0%, 85.2%, and 87.2%.

Hornbrook et al. [12] used machine learning (ML) tools for the early detection of colorectal cancer using gender, age, and complete blood count data. The data consist of 900 colorectal cancer cases and 9108 no-cancer cases, taken from the Kaiser Permanente Northwest Region's Tumour Registry. The performance of the model was evaluated using the specificity, the area under the ROC curve (AUC), and the odds ratio. The model gave an area under the curve of 80% with 99% specificity. They used the ColonFlag model for the detection of undiagnosed colorectal cancer. The study showed that ColonFlag identifies individuals at 10× higher risk of undiagnosed colon cancer and is more accurate at identifying right-sided colorectal cancers.

Ponzio et al. [13] proposed a DL technique using CNNs to detect adenocarcinoma from healthy and benign lesions. The data used were taken from a public repository of H&E stained whole-slide images available on the website of the University of Leeds Virtual Pathology Project. The total data consist of 13,500 patches, of which 9000 were used for training and the remaining 4500 for testing. They also used a pretrained model, VGG16, and principal component analysis (PCA) was used as the feature reduction technique. A fully trained CNN model, a pretrained VGG16 model with a support vector machine (SVM) classifier, and a fine-tuned pretrained model fixing the weights of the lower-level blocks were used. The three techniques gave accuracy scores of 90.37%, 96.46%, and 96.82%.

Li et al. [14] used DL methods to classify colorectal cancer lymph node metastasis images. Different ML techniques were used for the task, and their results were compared. The data used in the study were taken from the Harbin Medical University Cancer Hospital; the total data consist of 1646 positive and 1718 negative samples. Features used for training the models were extracted by three techniques: the gray-level histogram, textural features from the gray-level cooccurrence matrix, and the scale-invariant feature transform. The pretrained CNN model AlexNet was also used. The CNN models LeNet and AlexNet were used with full training, and the classical ML models AdaBoost, Decision Tree, KNN, Logistic Regression, multilayer perceptron, Naive Bayes (NB), stochastic gradient descent (SGD), and SVM were used. The evaluation metrics used for comparing the models' performance are accuracy, AUC, sensitivity, specificity, positive predictive value, and negative predictive value. The pretrained AlexNet model outperformed the other models, giving accuracy, AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) scores of 75.83%, 79.41%, 80.04%, 79.97%, 79.92%, and 80.09%, respectively.

10.3 Artificial intelligence for colon cancer detection

DL is a very useful tool in medical science that can make a significant difference in many ways, such as giving accurate and fast predictions. Normally, detection of colon cancer from tissue slides takes a skilled doctor a lot of time to find a cancer-causing tumor, and this effort can be reduced using DL. The important techniques used in this study are explained in the following sections.
models trained for weeks by the people having the best resources can be used directly by any other person for a different problem. These pretrained models give very good results in a much shorter time because they already have the best weights and do not need to be changed [20].

The results using transfer learning are generally better than training a DL model from scratch. For computer vision tasks, CNNs outperform the other algorithms. In this study, CNN models trained from scratch, pretrained models, and pretrained models with classical ML algorithms are used, and their results are compared. Besides computer vision, DL is used in many other fields such as natural language processing, visual recognition, recommendation systems, self-driving cars, etc. [21]. Today, we have enough data resources to train these DL models, which take more data than other ML models. DL has a direct effect on people's lives. DL in medical science has made it easier for people to get accurate and faster results. From getting recommendations on social media to robotics, DL is used everywhere. Much research is going on in DL to make it better.

10.3.3 Convolutional neural networks

A CNN is a type of ANN that can take a 3D array as input and give the required results. Every image is a 3D array with each pixel as a value, so images can be given as input to a CNN. A CNN can detect the features in an image using filters. During training, the weights of these filters are changed to get the minimum possible error between the true and predicted values. Like an ANN, a CNN has an input layer, an output layer, and, in between, hidden layers. In a CNN, each neuron or unit is not connected to every neuron of the next layer; only a small portion of the layer is used for getting the value of one neuron of the next layer. In a CNN, no feature engineering is done to get good results; the CNN takes care of the features itself, whereas with classical ML a lot of work has to be done to get the important features. The initial hidden layers of a CNN are responsible for detecting the low-level features in the image, and the high-level features are detected by the later hidden layers. Feature detection in a CNN is done by the filters associated with each layer; a layer can have many filters [22]. A filter is a matrix that moves over a CNN layer, and the values of its weights are multiplied with the values of the neurons to get an output value for the next layer's neuron. This operation is called the convolution operation. The pooling layer is used for reducing the size of a layer to lower the computation cost; there are no weights associated with the filters in these layers. Max pooling and average pooling are the two types of pooling mainly used [23]. In max pooling, the maximum value from the portion of the layer on which the filter is applied is taken. In average pooling, the average of the portion on which the kernel is applied is taken. During the convolution operation, a pixel in the middle will affect more neurons of the output layer than the corner pixels. This can lead to loss of the information present in the corner pixels; to avoid this problem, padding is done. Padding is adding zeros around the convolution layer [24]. This can keep the shape of the next layer the same as the previous layer; this kind of padding is called same padding. Valid padding means no padding at all. A filter of a CNN also has the property of stride, which defines the movement of the filter after doing one convolution operation. If the stride of a filter is n, it means that after doing one convolution operation, it will move n pixels to reach its new position. All these factors are to be given by the user designing the CNN.
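To make these building blocks concrete, a minimal Keras sketch is given below; the filter counts, padding choices, and input size are arbitrary illustrations of the concepts just described, not the architectures evaluated later in this chapter.

# Minimal illustration of the CNN building blocks described above
# (filters, same/valid padding, stride, pooling); the sizes are arbitrary examples.
import tensorflow as tf
from tensorflow.keras import layers, models

cnn = models.Sequential([
    # 16 filters of size 3x3; "same" padding keeps the spatial size at 128x128
    layers.Conv2D(16, (3, 3), padding='same', activation='relu',
                  input_shape=(128, 128, 3)),
    # Max pooling halves the spatial size and has no trainable weights
    layers.MaxPooling2D(pool_size=(2, 2)),
    # "valid" padding (no padding) with stride 2 moves the filter 2 pixels per step
    layers.Conv2D(32, (3, 3), padding='valid', strides=2, activation='relu'),
    layers.Flatten(),
    # Two output neurons with softmax, as used for the two tissue classes later
    layers.Dense(2, activation='softmax'),
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])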
features from the data. The pretrained models already have the best weights, which can detect the important features in the data; there is no need to change the weights of the pretrained model. In this study, the pretrained CNN architectures VGG16 [25], VGG19 [25], ResNet50 [26], ResNet101 [26], MobileNetV2 [27], MobileNet [27], InceptionV3 [28], InceptionResNetV2 [29], DenseNet169 [30], DenseNet121 [30], and Xception [31] are used as feature extractors; the features extracted using these models are then flattened and used for training the classical ML models. The time taken to train models on the extracted features is much less than training a model from scratch, and the performance of the pretrained models is also very good.

reducing the size. After applying PCA, the data can be converted back to the original dimensions, but the result will not be the same as the original data. PCA projects the data onto a hyperplane, and the projected values become the new values of the data. This is the most widely used and oldest dimension reduction algorithm. In LDA, linear combinations of the input columns are calculated in such a way that the calculated features of a particular class are separated from the other classes of data. Neural autoencoders convert the data to a lower size by removing the noise and redundancy in it. t-SNE is a more recent dimension reduction technique. It is a nonlinear technique that maps the data to a lower dimensional space where
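As a small illustration of the PCA step just described, the sketch below projects a feature matrix onto fewer components and then maps it back, showing that the reconstruction is close but not identical to the original; the random matrix and the number of components are placeholders, not the settings used in this study.

# Sketch of PCA as described above: project onto a lower-dimensional hyperplane,
# then map back and observe that the reconstruction is lossy.
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(200, 4096)          # placeholder for flattened deep features
pca = PCA(n_components=50)
reduced = pca.fit_transform(features)         # 200 x 50 projection
restored = pca.inverse_transform(reduced)     # back to 200 x 4096, but not identical
print("reconstruction error:", np.mean((features - restored) ** 2))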
# Classifier definitions used on top of the extracted deep features.
# X_train and train_it (the training image iterator) are assumed to be defined earlier.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

def get_models():
    # Small fully connected ANN over the flattened feature vectors
    ANN = Sequential()
    ANN.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
    ANN.add(BatchNormalization())
    ANN.add(Dropout(0.2))
    ANN.add(Dense(64, activation='relu'))
    ANN.add(Dense(32, activation='relu'))
    ANN.add(Dense(16, activation='relu'))
    ANN.add(Dense(8, activation='relu'))
    ANN.add(Dense(len(train_it.class_indices), activation='softmax'))
    ANN.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                metrics=['accuracy'])
    # Classical machine learning models
    KNN = KNeighborsClassifier()
    RF = RandomForestClassifier(n_estimators=50)
    ADB = AdaBoostClassifier()
    # (the SVM and XGBoost classifiers of Fig. 10.2 are created in the same way;
    # the remainder of this listing is truncated in the excerpt)
10.4.4 Experimental data

The data used in this study are taken from images generated from an original sample of Health Insurance Portability and Accountability Act (HIPAA) compliant and validated sources, consisting of 500 total images of colon tissue and augmented to 10,000 using the Augmentor package [41]. There are two classes present in the data, colon adenocarcinoma and benign colon tissue, each with 5000 images. This dataset is available publicly on Kaggle as colon cancer histopathological images [42]. Each image is in JPEG file format, and the size of each image is 768 × 768 pixels. The images were reshaped to 128 × 128 pixels to reduce the computational cost and training time.

10.4.5 Performance evaluation measures

For measuring the model's performance on the data, various metrics can be used, as accuracy scores alone can sometimes be misleading. High accuracy scores do not always imply that the model is accurate: on imbalanced data, a model can predict a single class and still get a high accuracy score. Thus, to avoid this situation, the training accuracy score, validation accuracy score, test accuracy score, F1 score, Cohen kappa score, ROC AUC score, recall, and precision score are calculated, and based on these scores the best model is selected. 70% of the data is used for training, and 15% of the data is used for each of validation and testing.

10.4.6 Experimental results

First, the CNNs with different layers are trained from scratch, and then the pretrained CNN architectures are used for the task. The weights of the pretrained models are fixed, and a fully connected layer is attached as the last layer of the pretrained CNN; it takes the flattened output of the CNN as its input and has two neurons at the end with a softmax activation function for predicting the class an instance belongs to (Fig. 10.1). The results of the CNN models with different layers are given in Table 10.1. The results obtained by training whole CNNs from scratch are not very good; they are not able to classify the images correctly and give just random results. This is quite expected because the images are complex and simple CNNs do not have the capacity to give good results.

Table 10.2 clearly shows that the performance of the pretrained models is far better than that of the custom-designed and trained CNN models. This is because the data is very complex, and a large network with good weights is needed to classify the image classes correctly. In the table, ResNet50 gives the best results, followed by VGG16, ResNet101, and VGG19, with F1 scores of 99.8%, 99.33%, 99.26%, and 99.06%, respectively.
FIGURE 10.1 The general framework for histopathological image classification using a CNN model. CNN, Convolutional neural network.
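A minimal Keras sketch of the framework in Fig. 10.1 is given below, with the pretrained weights kept fixed and only a flattening step and a two-neuron softmax layer attached on top; the choice of ResNet50 and the 128 × 128 input are assumptions for illustration.

# Sketch of Fig. 10.1: frozen pretrained CNN + flatten + 2-neuron softmax head.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet',
                                      input_shape=(128, 128, 3))
base.trainable = False                       # keep the pretrained weights fixed

model = models.Sequential([
    base,
    layers.Flatten(),                        # flattened output of the backbone
    layers.Dense(2, activation='softmax'),   # colon adenocarcinoma vs. benign tissue
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])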
Then the pretrained models are used with classical ML algorithms. In this method, the pretrained models are used as feature extractors, and the output is flattened and fed to the classical ML models as input (Fig. 10.2).

Table 10.3 shows that VGG16 gives very good results on the dataset. The best result is given by VGG16 as feature extractor followed by SVM, with a test accuracy of 99.73%, and the worst result is given by KNN, with a test accuracy of 84.8%.

VGG19 achieved results very similar to the VGG16 model, so the scores are also nearly the same as VGG16. The best result obtained with VGG19 is with the SVM model, with a test accuracy of 99.66%, and the worst results are obtained with the KNN model, with a test accuracy of 88.26% (Table 10.4).

The results obtained using ResNet50 are very accurate; with any ML model the test accuracy is around 96%. The best result with ResNet50 is given by SVM, with a test accuracy of 99.86%, and the worst is given by KNN, with a test accuracy of 95.93%, which is similar to the previous case (Table 10.5).
FIGURE 10.2 The general framework for histopathological image classification using deep feature extraction.
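A compact sketch of the Fig. 10.2 pipeline is given below, using a pretrained network only as a feature extractor and training a classical classifier on the flattened features; the SVM stands in for any of the models in the figure, the dimension-reduction step is omitted for brevity, and the data arrays are assumed to be prepared elsewhere.

# Sketch of Fig. 10.2: deep feature extraction followed by a classical ML classifier.
import tensorflow as tf
from sklearn.svm import SVC

backbone = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                       input_shape=(128, 128, 3))
backbone.trainable = False

def flat_features(images):
    # Flatten the backbone's feature maps into one vector per image
    return backbone.predict(images).reshape(len(images), -1)

# X_train, y_train, X_test, y_test are assumed to be prepared beforehand
svm = SVC()
svm.fit(flat_features(X_train), y_train)
print('Test accuracy:', svm.score(flat_features(X_test), y_test))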
TABLE 10.3 Performance of VGG16 deep features extraction with different machine learning models.

TABLE 10.4 Performance of VGG19 deep features extraction with different machine learning models.

TABLE 10.6 Performance of ResNet101 deep features extraction with different machine learning models.

TABLE 10.7 Performance of MobileNetV2 deep features extraction with different machine learning models.
ResNet101 achieved results similar to the ResNet50 model with a deeper network. Here the best test accuracy is given by SVM, with a value of 99.7%, and the worst test accuracy is given by the KNN model, with a value of 95.06% (Table 10.6).

The performance of the MobileNetV2 model is not as good as VGG or ResNet. The best accuracy given by this model is with the ANN, with a test accuracy of 94.4%, and the worst is with AdaBoost, with a value of 87.33%, which is quite low (Table 10.7).

MobileNet is the earlier version of the MobileNetV2 model and is similar to it, so the results are also similar: this model's best test accuracy score is 95.9%, with the ANN model, and the worst accuracy score is with AdaBoost, with a test accuracy of 88.6%, which is similar to MobileNetV2 (Table 10.8).

TABLE 10.8 Performance of MobileNet deep features extraction with different machine learning models.

TABLE 10.9 Performance of InceptionV3 deep features extraction with different machine learning models.

TABLE 10.10 Performance of InceptionResNetV2 deep features extraction with different machine learning models.

InceptionV3 was developed by Google and is an upgraded version of InceptionV1. It is not well suited to the colon dataset, as the results are not very good. The best test accuracy given by InceptionV3 is with SVM, with a value of 88.86%, and the worst is with AdaBoost, with a value of 82.33% (Table 10.9).

This model is inspired by the Inception and ResNet architectures. It has 164 layers and
TABLE 10.11 Performance of DenseNet169 deep features extraction with different machine learning models.

TABLE 10.12 Performance of DenseNet121 deep features extraction with different machine learning models.

TABLE 10.13 Performance of Xception deep features extraction with different machine learning models.
11
Brain hemorrhage detection using computed
tomography images and deep learning
Abdullah Elen1, Aykut Diker1 and Abdulhamit Subasi2,3
1Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
subarachnoid (SAH), epidural (EDH), and subdural (SDH) [4]. Studies conducted on populations indicate that the incidence of ICH is estimated to be about 10-30 per 100,000 people [5,6]. Furthermore, 3 months later, more than one-third of survivors have significant disabilities [7].

Computed tomography (CT) can be used to determine the source of a hemorrhage and its localization. CT uses consecutive 2D slices and stacks them to generate a 3D image as output [8]. The types of ICH can be diagnosed by an expert with the help of their properties in the CT images, such as lesion shape, size, etc., and therefore manual diagnosis is a tedious procedure [9]. Hence, automated diagnosis and detection of hemorrhage has gained attention in the last decades [10-12]. Deep learning-based methods have shown good performance in medical image classification [12-14], medical image segmentation [15-17], and disease diagnosis [18-21]. The convolutional neural network (CNN) is employed for various classification tasks related to medical images [19-21]. CNNs are capable of extracting features from images and learning automatically. They are devised such that they can understand the image content [22]. CNNs are optimized for images; there are concepts such as weight sharing, pooling, and convolution that can be used for maintaining the essential operations of CNNs [23]. CNNs can extract meaningful features and classify them further for the diagnosis of hemorrhage from CT scan images. Sample normal brain and hemorrhage CT images are given in Fig. 11.1.

Since CNNs are popular in the field of computer vision, researchers have proposed more advanced deep neural networks to obtain higher accuracy. Deep neural network models such as residual networks (ResNet) [24], AlexNet [25], VGGNet [26], and other models have been proposed, which are usually large and specialized networks relative to classical CNNs. Each network has its own pros and cons, as mentioned in their original studies. Variants or competitors of these deep neural network models are usually employed in transfer learning [27-29]. In transfer learning, the model is trained on a dataset, and the weights of the trained model are saved for further usage. Then, most of the layers are frozen during the training phase, and only a few layers are trained. It can be seen that these networks are accepted as state-of-the-art methods in the literature and preferred by most researchers in the fields of computer vision and medical imaging.
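A minimal sketch of this freeze-most-layers transfer-learning recipe is given below; it is a Keras illustration only, not this chapter's own implementation, and the backbone, input size, and number of retrained layers are assumptions.

# Sketch of transfer learning as described: reuse pretrained weights, freeze most
# layers, and retrain only the last few on the new task (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet',
                                      input_shape=(224, 224, 3), pooling='avg')
for layer in base.layers[:-10]:      # freeze all but the last few layers
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Dense(2, activation='softmax'),   # hemorrhage vs. normal
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])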
FIGURE 11.4 (A) Basic network example. (B-D) Traditional scaling that increases only one dimension of network width, depth, or resolution. (E) EfficientNet's composite scaling method.
structure of neural networks. It optimizes both the accuracy and the efficiency, as measured in floating-point operations per second. The improved inverted bottleneck convolution (MBConv) is used in this architecture. The researchers then scaled up this baseline network to create the EfficientNets family of deep learning models [42,43]. Its architecture is given in Fig. 11.5.

after each convolution process. For nonlinear activation, the convolution output is employed. Spatial pooling is handled via five pooling layers. A 2 × 2 filter and stride 2 are used for max pooling. Following a succession of convolutional and max-pooling layers, three fully connected layers are built. The final layer [44] is the softmax layer. The architecture of VGG-16 is depicted in Fig. 11.6.
positive (FP), and false negative (FN) [46]. The performance measures are given in Eqs. (11.2)-(11.6).

\[ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11.2} \]

\[ \mathrm{SEN} = \frac{TP}{TP + FN} \tag{11.3} \]

\[ \mathrm{SPE} = \frac{TN}{TN + FP} \tag{11.4} \]

\[ \mathrm{F\text{-}score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{11.5} \]

\[ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{11.6} \]

Hereby, TP and TN represent the numbers of correctly predicted positive and negative samples, whereas FP and FN correspond to the numbers of incorrectly predicted positive and negative samples. Additionally, the ROC has been considered to evaluate the model performance. The ROC is a 2D graph in which the true-positive rate is drawn against the false-positive rate. The training of the CNN models was realized in 50 epochs, and the mini-batch size was 16, as can be seen from the given MATLAB code of each CNN model in Section 11.3.

11.4.1 Dataset

The Head CT Hemorrhage Image Dataset (HCTHID) contains a total of 200 CT images of 100 normal (healthy) and 100 hemorrhage cases [47]. The images in the dataset have different sizes and are all in PNG format. In this study, we used all images by converting them to 32 × 32 pixels. Sample images of the HCTHID are shown in Fig. 11.8; the first row of the figure depicts
FIGURE 11.8 Sample CT images of (A) normal and (B) hemorrhage brain. CT, Computed tomography.
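To make Eqs. (11.2)-(11.6) concrete, a small Python sketch of the same computations from confusion-matrix counts is given below; the counts in the example call are illustrative only, and the chapter's own experiments are implemented in MATLAB.

# Sketch: computing Eqs. (11.2)-(11.6) from confusion-matrix counts (Python analogue).
import math

def performance_measures(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)                  # Eq. (11.2)
    sen = tp / (tp + fn)                                   # Eq. (11.3)
    spe = tn / (tn + fp)                                   # Eq. (11.4)
    f_score = 2 * tp / (2 * tp + fp + fn)                  # Eq. (11.5)
    mcc = (tp * tn - fp * fn) / math.sqrt(                 # Eq. (11.6)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sen, spe, f_score, mcc

print(performance_measures(tp=82, tn=85, fp=15, fn=18))   # example counts only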
confusion matrix, ROC curves, performance metrics tables of the CNN models, and classification results obtained by using the

The ACC, SEN, SPE, F-score, and MCC values were obtained by using the 10-fold cross validation of the EfficientNet-B0 CNN model.
FIGURE 11.9 Test result of the DarkNet-19; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.
FIGURE 11.10 Test result of the EfficientNet-B0; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.
FIGURE 11.11 Test result of the ResNet-18; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.
FIGURE 11.12 Test result of the VGG-16; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.
As seen in Fig. 11.11, the accuracy rate of the confusion matrix and the mean score of the ROC curve are, respectively, 80.50% and 89.50%.

The ACC, SEN, SPE, F-score, and MCC values were obtained by using the 10-fold cross validation of the VGG-16 CNN model.

All scores of the CNN classifier architectures are reported in Table 11.5. According to Table 11.5, the best hemorrhage classification accuracy, 83.50%, was obtained with the DarkNet-19 CNN model.

11.5 Discussions

ICH is one of the most serious risks to human healthcare. Head trauma, excessive blood

As seen with its image processing analysis, it can make a big impact. Computer-aided, and especially deep learning-based, medical image methods have increased in recent years. In this chapter, four deep CNN approaches are considered for hemorrhage classification. Several machine learning models, CNN models, and our method are compared according to performance criteria, which consist of accuracy, sensitivity, specificity, and F-score, in Table 11.6.
TABLE 11.6 Classification performance of the proposed method and comparison with other studies.

References | Methods | Image datasets | Accuracy
Dawud et al. (2019) [48] | Deep learning and machine learning (AlexNet + SVM) | CT image dataset | 93%
Shahangian et al. [49] | K-NN and MLP | CT scan image dataset | 93.3%
Lee [50] | ResNet-50 | CT scan image dataset with 674,258 samples | F1 score 88%
Lee et al. [31] | Deep learning algorithm for ANN | 250 cases with 9085 CT image samples | 91.7%
Chilamkurthy et al. [51] | Deep learning | CT scan images from 4304 samples | AUC 94.19%
This study | ResNet-18, EfficientNet-B0, DarkNet-19, VGG-16 | CT image dataset with 200 samples | ACC: 83.50%, SEN: 82%, SPE: 85%, F-score: 83.20%
In Ref. [31], ICH classification and its subtype classification were performed. The aim of this study was to evaluate whether the method could be used to identify ICH and classify its subtypes without requiring a CNN. CT images were split into 10 subdivisions based on the intracranial height. For the classification of ICH into subtypes, the accuracy for subarachnoid hemorrhage was notably high at 91.7%.

Ref. [48] addressed the problem of detecting cerebral hemorrhage in the early stages of hemorrhage, which is a challenging task for radiologists. AlexNet, which is a popular CNN architecture, was used to solve this problem. Besides, the modified AlexNet was supported by the SVM classifier. With the AlexNet + SVM classifier structure used in the study, a CT image classification accuracy rate of 93% was obtained.

11.6 Conclusion

In this chapter, a pretrained CNN model that can distinguish between hemorrhage and normal brain CT images has been presented. The dataset used consists of two classes. The classification accuracy of the DarkNet-19 CNN model was superior compared to the other CNN models. The proposed model was tested by operating on the Brain CT Image database. As a result of the best classification, values of ACC 83.50%, SEN 82%, SPE 85%, F-score 83.20%, and MCC 65% were obtained with the DarkNet-19 CNN model. Considering the classification accuracies of the study, it can be said that a low accuracy rate is a disadvantage. The reason for this is that the dataset used has less data than the sources in the discussion section. Nevertheless, it can be said that promising results were obtained.

References

[1] C.J. van Asch, M.J. Luitse, G.J. Rinkel, I. van der Tweel, A. Algra, C.J. Klijn, Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis, Lancet Neurol. 9 (2) (2010).
network, Int. J. Image, Graph. Signal. Process. 8 (3) (2016) 19-27. Available from: https://doi.org/10.5815/ijigsp.2016.03.03.
[24] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Computer Soc. Conf. Computer Vis. Pattern Recognit., 2016, pp. 770-778. Available from: https://doi.org/10.1109/CVPR.2016.90.
[25] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017). Available from: https://doi.org/10.1145/3065386.
[26] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015.
[27] L. Wen, X. Li, X. Li, L. Gao, A new transfer learning based on VGG-19 network for fault diagnosis, in: Proc. 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design, CSCWD 2019, 2019, pp. 205-209. Available from: https://doi.org/10.1109/CSCWD.2019.8791884.
[28] M.A. Fayemiwo, et al., Modeling a deep transfer learning framework for the classification of COVID-19 radiology dataset, PeerJ Comput. Sci. 7 (2021) e614. Available from: https://doi.org/10.7717/peerj-cs.614.
[29] S. Lu, Z. Lu, Y.-D. Zhang, Pathological brain detection based on AlexNet and transfer learning, J. Comput. Sci. 30 (2019) 41-47. Available from: https://doi.org/10.1016/j.jocs.2018.11.008.
[30] P. Kumaravel, S. Mohan, J. Arivudaiyanambi, N. Shajil, H.N. Venkatakrishnan, A simplified framework for the detection of intracranial hemorrhage in CT brain images using deep learning, Curr. Med. Imaging 17 (10) (2021) 1226-1236. Available from: https://doi.org/10.2174/1573405617666210218100641.
[31] J.Y. Lee, J.S. Kim, T.Y. Kim, Y.S. Kim, Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm, Sci. Rep. 10 (1) (2020). Available from: https://doi.org/10.1038/s41598-020-77441-z.
[32] H. Lee, et al., An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets, Nat. Biomed. Eng. 3 (3) (2019) 173-182. Available from: https://doi.org/10.1038/s41551-018-0324-9.
[33] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255. Available from: https://doi.org/10.1109/cvpr.2009.5206848.
[34] X. Wang, et al., A deep learning algorithm for automatic detection and classification of acute intracranial hemorrhages in head CT scans, NeuroImage Clin. 32 (2021) 102785. Available from: https://doi.org/10.1016/j.nicl.2021.102785.
[35] M.F. Mushtaq, et al., BHCNet: neural network-based brain hemorrhage classification using head CT scan, IEEE Access 9 (2021) 113901-113916. Available from: https://doi.org/10.1109/ACCESS.2021.3102740.
[36] H. Ye, et al., Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network, Eur. Radiol. 29 (11) (2019) 6191-6201. Available from: https://doi.org/10.1007/s00330-019-06163-2.
[37] M.R. Arbabshirani, et al., Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration, NPJ Digit. Med. 1 (1) (2018) 9. Available from: https://doi.org/10.1038/s41746-017-0015-z.
[38] J.-L. Solorio-Ramírez, M. Saldana-Perez, M.D. Lytras, M.-A. Moreno-Ibarra, C. Yáñez-Márquez, Brain hemorrhage classification in CT scan images using minimalist machine learning, Diagnostics 11 (8) (2021) 1449. Available from: https://doi.org/10.3390/diagnostics11081449.
[39] C. Yanez-Marquez, Toward the bleaching of the black boxes: minimalist machine learning, IT Prof. 22 (4) (2020) 51-56. Available from: https://doi.org/10.1109/MITP.2020.2994188.
[40] E.L. Yuh, A.D. Gean, G.T. Manley, A.L. Callen, M. Wintermark, Computer-aided assessment of head computed tomography (CT) studies in patients with suspected traumatic brain injury, J. Neurotrauma 25 (10) (2008) 1163-1172. Available from: https://doi.org/10.1089/neu.2008.0590.
[41] S. Yune, H. Lee, S. Do, D. Ting, Case-based learning based on artificial intelligence radiology atlas: example of intracranial hemorrhage and urinary stone detection, J. Gen. Intern. Med. 33 (Supplement 1) (2018).
[42] K. Ali, Z. Shaikh, A. Khan, A. Laghari, Multiclass skin cancer classification using EfficientNets: a first step towards preventing skin cancer, Neurosci. Inform. 2 (4) (2022).
[43] V. Kumar, Implementing EfficientNet: a powerful convolutional neural network, https://analyticsindiamag.com/implementing-efficientnet-a-powerful-convolutional-neural-network/, June 19, 2020 (accessed 10.02.22).
[44] P. Saha, M.S. Sadi, O.F.M.R.R. Aranya, S. Jahan, F.-A. Islam, COV-VGX: an automated COVID-19 detection system using X-ray images and transfer learning, Inform. Med. Unlocked 26 (2021) 100741.
12
Artificial intelligence-based retinal
disease classification using optical
coherence tomography images
Sohan Patnaik1 and Abdulhamit Subasi2,3
1Department of Mechanical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
In 1990, Tanno et al. [4,5], at Yamagata University, created a technology referred to as heterodyne reflectance tomography, and in 1991, Huang et al., at the Massachusetts Institute of Technology, effectively coined the term "optical coherence tomography." From that point in time, OCT with micrometer resolution and cross-sectional imaging capacities has become an extraordinary biomedical tissue-imaging procedure that has continuously gained new technical abilities, starting from early electronic signal detection, via the utilization of broadband lasers and linear pixel arrays, to ultrafast tunable lasers, expanding its performance and sensitivity envelope. Ocular (or ophthalmic) OCT is used heavily by ophthalmologists and optometrists to obtain high-resolution images of the retina and anterior segment.

With regard to OCT's capability to show cross-sections of tissue layers with micrometer resolution, OCT provides a straightforward and effective method of assessing cellular organization, photoreceptor integrity [6-9], axonal thickness in glaucoma [10], macular degeneration [11], diabetic macular edema [12], multiple sclerosis, and other eye diseases or systemic pathologies which have ocular signs [13]. Additionally, ophthalmologists leverage OCT to assess the vascular health of the retina via a technique called OCT angiography [14]. In ophthalmological surgery, especially retinal surgery, an OCT can be mounted on the microscope. Such a system is called an intraoperative OCT (iOCT) and provides support during the surgery with clinical benefits [15].

Diagnosing diseases from the retinal cross-sectional images obtained using OCT is still a challenge when the number of images is very high. Here comes the need for an intelligent agent with better computational skills than a normal human being. Deep learning has made remarkable progress in the field of image classification. Keeping that in mind, we propose a deep learning-based diagnosis of three types of retinal diseases: drusen, diabetic macular edema (DME), and choroidal neovascularization (CNV). For the basic framework, we use a convolutional architecture followed by either a machine learning-based classification model or a fully connected neural network with softmax activation, which gives the probability of each of the diseases present in the retina. It is important to keep in mind that the retinal image might not have a disease at all, so our framework addresses that also and captures when there is no retinal disease.

12.2 Related work

The development of autograding framework algorithms for optical coherence tomography (OCT) images has undergone a long course of evolution. With the emergence of artificial intelligence, specialists have investigated the diagnostic instrument with various strategies. Prior algorithms started with an image segmentation model. Similar to the methodology that human experts use, the segmentation algorithms detected the edges of the features and made diagnoses with a binary classification algorithm [16]. As the convolutional neural network (CNN) came into the picture, it was gradually implemented in the classification model. One study endeavored to utilize a CNN to perceive the features and make classifications [17]. In recent years, some CNN models have been modified to achieve higher accuracy [18,19]. Since the start of the 21st century, OCT technology has been used more frequently in detecting the features of age-related macular degeneration (AMD) and diabetic macular edema (DME) [20-22]. With the increasing desire for OCT image autograding, many research communities have invested in this field to attempt to accomplish more accurate models. Improvement of automated image classification/grading frameworks started with an automated segmentation algorithm [16]. In 2014, Ehlers et al. [16] proposed an automated classification framework to perceive AMD and DME. They used image segmentation to detect the particular
capture certain interpretable information, we first need a feature representation of the image. This is accomplished by using the convolution operator between the image and some lower dimensional kernels. After several layers of convolution, we arrive at a small feature map that indeed captures the features of the image such as edges, colors, spatial layout, etc. Moreover, a typical CNN architecture also incorporates certain pooling layers, such as max pooling and average pooling, which just reduce the dimension of the feature maps without any learnable parameters.

For our research, we experimented with networks with 2- to 8-layered CNNs, along with some max-pooling layers embedded between the convolutional layers. The best accuracy and F1 score on the test set were obtained using the 7-layered CNN, the architecture of which is shown in Fig. 12.2. The input image was mapped to 150 × 150 × 3, where 3 is the number of channels (RGB) and 150 × 150 is the spatial layout of the image. For the classification stage, the output feature map of the convolutional network was flattened and then passed to a fully connected layer in order to get a vector representation of the image. Finally, a softmax layer with four neurons was employed to obtain the probabilities of the four classes (three diseases and normal).
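A minimal Keras sketch of a CNN of this kind is given below; the exact filter counts of the 7-layered network in Fig. 12.2 are not reproduced here, so the layer sizes are assumptions for illustration.

# Sketch of the CNN-plus-softmax classifier described above (layer sizes are illustrative).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    # Four output probabilities: CNV, DME, drusen, and normal
    layers.Dense(4, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])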
# Import libraries
import tensorflow as tf
import tensorflow_addons as tfa

# Input image shape; the text above maps images to 150 x 150 x 3
INPUT_SHAPE = (150, 150, 3)

# Download the VGG19 model pretrained on ImageNet, without its classification head
vgg19 = tf.keras.applications.VGG19(
    include_top=False,
    weights='imagenet',
    input_tensor=None,
    input_shape=INPUT_SHAPE,
    pooling=None,
    classes=1000,
)
# Plot the training curves; acc, val_acc, loss, and val_loss are assumed to come from
# the Keras History object returned by model.fit (e.g., history.history['accuracy']).
import matplotlib.pyplot as plt

epochs = range(len(acc))
plt.figure(figsize=(7, 7))
plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure(figsize=(7, 7))
plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
# Confusion matrix and classification report for the test set;
# y_pred (predicted class indices) is assumed to be computed beforehand.
from sklearn.metrics import confusion_matrix, classification_report
import pandas as pd

cm = confusion_matrix(test_generator.classes, y_pred)
df_cm = pd.DataFrame(cm, list(test_generator.class_indices.keys()),
                     list(test_generator.class_indices.keys()))
print('Classification Report\n')
target_names = list(test_generator.class_indices.keys())
print(classification_report(test_generator.classes, y_pred, target_names=target_names))
12.4.3 Deep feature extraction and machine learning

It is quite intuitive that the pretrained convolutional models indeed capture the low-level features in the image. Keeping that in mind, we first extracted the feature maps from the pretrained CNN models and flattened them to obtain a vector

normal retina. We used six machine learning models, namely, artificial neural network, K-nearest neighbors, support vector machines, random forest classifier, AdaBoost classifier, and XGBoost classifier. A basic picture of the feature extraction model, on top of which the machine learning-based classification model was employed, is shown in Fig. 12.3.
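A compact sketch of this stage is given below, fitting several of the named classifiers on flattened deep features; the feature arrays are assumed to have been extracted beforehand, and the XGBoost model (from the separate xgboost package) is omitted for brevity.

# Sketch: train several classical classifiers on flattened deep features.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# train_feats/test_feats: flattened CNN feature vectors; y_train/y_test: labels
# (assumed to be prepared by the feature-extraction step described above)
classifiers = {
    'K-NN': KNeighborsClassifier(),
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(),
    'AdaBoost': AdaBoostClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(train_feats, y_train)
    print(name, 'test accuracy:', clf.score(test_feats, y_test))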
FIGURE 12.4 Train, validation, and test accuracy (plotted against the number of convolutional layers).
The results for the test set on pretrained image recognition models, which have previously been trained on the ImageNet dataset, are shown in Table 12.2. Here, we can see that the maximum accuracy and F1 score, that is, 0.9628, is obtained for VGG19. The good results can be attributed to the fact that pretraining helps the models to understand and capture certain low-level features, such as edges, gradients of color, shapes, etc.

From Fig. 12.5, we can observe that almost all the pretrained image recognition models
neighbors, support vector machines, random forest classifier, AdaBoost classifier, and XGBoost classifier. From the tables, we can conclude that the 7-layered CNN architecture proposed by us is the best model when employed on the test set.
13
Diagnosis of breast cancer from
histopathological images with deep
learning architectures
Emrah Hancer1 and Abdulhamit Subasi2,3
1Department of Software Engineering, Mehmet Akif Ersoy University, Burdur, Turkey; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
are considered at different magnifications to investigate the cellular and tissue level variations. For instance, the tissue patterns are investigated at 100× magnification, while the size and shape of nuclei structures are investigated at 400× magnification. After obtaining information from these features, pathologists can determine a tumor slide as benign or malignant. In case of malignancy, a further analysis is carried out to grade the tumor, and based on the grade a treatment is advised to the corresponding patient. Breast cancer may come in different types, and each type has its own microscopic features [2].

Pathologists examine morphological features of H&E stained tissue samples from the related breast regions under a microscope to establish a definitive diagnosis. Any difference observed in any feature of the region of interest is regarded as abnormal, and a confirmation process is then carried out to verify it as a malignant tumor. Pathologists also need to grade the tumor to examine the degree of cancer in some cases [3]. Unfortunately, the visual analysis carried out manually by pathologists is an error-prone, tiresome, and subjective task, causing inevitable errors in decisions. To ease the workload on experts and/or pathologists and improve the efficiency of the diagnosis performance, researchers have focused on automating the diagnosis process through computer-aided diagnosis (CAD) systems in recent years. The basic steps of CAD systems are as follows: (1) preprocessing, (2) segmentation, (3) feature extraction, and (4) classification. CAD systems wrapped around traditional machine learning methods use specified classifiers over a set of handcrafted features obtained from histopathology images to predict the output labels. However, their classification performance is not competitive and is even far from what is needed. Moreover, extracting handcrafted features is a computationally intensive and complex process due to the requirement of extensive prior domain knowledge. Fortunately, the recent advancements from the perspective of machine learning and image processing have resulted in the emergence of the deep learning discipline, which has widely attracted the interest of researchers from different fields. Deep learning architectures, especially convolutional neural networks (CNNs), can extract intrinsic features from raw image data without requiring much effort. Therefore CNNs have widely been adopted in CAD systems, resulting in groundbreaking performance in biomedical applications, especially diagnostic pathology [4].

CNNs are similar to conventional neural networks in that they are built on neurons with learnable weights and biases. Each neuron takes some inputs and then performs a transformation process. The whole network still represents a differentiable function that transforms raw image data to a class score. Moreover, a loss function (e.g., softmax) is still present on the last (fully connected) layer, and all the learning procedures of conventional neural networks are still performed. So, what is the difference between CNNs and conventional neural networks? The inputs of CNNs are images, and so CNNs allow us to encode their characteristics into the architecture. The forward function can then be implemented more efficiently, thereby reducing the number of parameters in the network.

Thanks to the effectiveness of CNNs in CAD systems, the diagnosis of breast cancer from histopathological images has aroused the interest of researchers. Especially with the release of the largest publicly available datasets, studies have rapidly increased to develop CNN-based automated systems to deal with these datasets. Spanhol et al. [5] used a variant of AlexNet [6] to form an automated breast cancer diagnosis method on histopathological images, created from a set of pixel patches generated using random strategies and a sliding window. The obtained accuracy was between 81% and 89%. In another work, Spanhol et al. [7] extracted deep features using
# Train/Validation/Test splitting; X and y are assumed to hold the image patches and labels.
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train,
                                                      test_size=0.25, random_state=42)
# One-hot encode the labels for the softmax output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_valid = to_categorical(y_valid)
FIGURE 13.2 The general methodology of the CNN architecture for breast cancer diagnosis from histopathological images. CNN, Convolutional neural network.
histopathological image dataset for the training stage. In this stage, we should first deal with data origin problems such as noise, artefacts, inconsistency, and class imbalance. Then, we split the preprocessed data into the train and the test image sets. In the training stage, a CNN model is built on the train image set. Finally, the performance of the trained model is evaluated on the test image set to verify the diagnosis performance of the model on the unseen dataset. A sample CNN model with 6 layers is shown below.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 50, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(2, activation="softmax"))   # benign vs. malignant
model.summary()
13.2.2.2 Pretrained convolutional neural network-based diagnosis method

A number of pretrained CNN architectures have been introduced, motivated by new datasets such as MNIST and CIFAR-10 and by competitions such as ImageNet. Some of these architectures are given as follows.

1. VGG [18], which is treated as one of the most successful CNN architectures, was first proposed for the ImageNet competition in 2014. VGG does not have a complex structure; that is, VGG16 applies convolutional layers with 3 × 3 sized filters and a stride of 1, and uses same padding and max pooling with 2 × 2 sized filters and a stride of 2. The convolution and max pooling layers are arranged consistently over the whole architecture. At the end of the partially connected layers, two fully connected layers are leveraged. The number 16 in VGG16 indicates that the architecture has 16 layers. A fixed-size RGB image is used to train the model, and the only preprocessing that takes place at the training stage is subtracting the mean RGB values computed for each pixel on the training set. Proportionally to the network depth, the training process of VGG slows down and the parameter size becomes quite large. VGG is easy to explain and works properly for classical classification problems. However, the large number of parameters causes a high computational cost. A sample VGG16 model with 7 CNN layers is shown below.
# VGG16 Model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation

# Pretrained VGG16 backbone without its classification head
base_model = tf.keras.applications.VGG16(input_shape=(50, 50, 3),
                                         include_top=False, weights="imagenet")
model = Sequential()
model.add(base_model)
model.add(Dropout(0.5))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(2048, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
# Six identical blocks of Dense(1024) -> BatchNormalization -> ReLU -> Dropout
for _ in range(6):
    model.add(Dense(1024, kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.summary()
2. ResNet [19] is an exotic type of architecture that uses a network of microarchitecture modules, unlike conventional sequential pretrained architectures such as AlexNet and VGG. Microarchitecture represents the building blocks used to build the entire network. ResNet has a much deeper architecture than VGG but a smaller number of actual weighting parameters. This is because ResNet leverages global average pooling rather than fully connected layers. ResNet addresses vanishing gradients, accelerates the training speed, provides higher accuracy in classification problems, and detects redundant extracted features. On the other hand, ResNet has an increasingly complex architecture. Moreover, the skip connections between layers cause extra dimensionality. The ResNet50 model is generated as follows (see the sketch below).

3. DenseNet [20] connects each layer to every other layer. For L layers, the total number of direct connections is L(L + 1)/2. Each layer takes as inputs the feature maps generated by the preceding layers. In other words, the input of a layer in a DenseNet architecture is the concatenation of the feature maps of the previous layers. The architecture in DenseNet is divided into DenseBlocks, in which the dimensionality of a feature map remains constant but the number of filters changes between them. Convolution and pooling are the first layers in DenseNet. Then there is a dense block followed by a transition layer, and finally a dense block followed by a classification layer. DenseNet addresses vanishing gradients, enhances feature reuse, and reduces the
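The ResNet50 listing referred to in item 2 is not reproduced in this excerpt; a minimal Keras sketch, following the same conventions as the VGG16 listing above, might look like this.

# Sketch (not the chapter's original listing): pretrained ResNet50 backbone with a small head.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

base_model = tf.keras.applications.ResNet50(input_shape=(50, 50, 3),
                                            include_top=False, weights="imagenet")
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.summary()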
4. MobileNet [21] applies the same convolution as CNNs to filter images, but its method differs from that of a CNN in that it performs depthwise convolution and pointwise convolution, different from the conventional convolution done by CNNs. Accordingly, the efficiency of the CNN is increased, and so it is possible to integrate MobileNet in mobile systems. In other words, it is possible to obtain a better response in a short time due to the time efficiency. The MobileNet model is generated as follows.

rate of a deep learning model based on the loss function value. Accordingly, it helps to reduce the overall loss and increase the accuracy. There are a variety of optimizers, such as gradient descent, stochastic gradient descent, AdaGrad, root mean square propagation (RMSProp), and Adam. As a deep learning model generally consists of millions of parameters, choosing the best weights for it can be a daunting task. Therefore it is necessary to choose a suitable optimizer for your application.
2. A loss function: The function is used to quantify how good or bad the model
FIGURE 13.3 The general methodology of the pretrained architecture for breast cancer diagnosis.
divided into the following groups: regression metrics (e.g., mean absolute error, mean squared error, and root mean squared error) and classification metrics (e.g., accuracy, confusion matrix, F1-score, and the receiver operating characteristic [ROC] curve).

In this study, the compilation stage is implemented as follows.
# F1-score metric (adapted from an older Keras source) and the metric set used at compilation.
import tensorflow as tf
from tensorflow.keras import backend as K

def f1_score(y_true, y_pred):  # taken from old keras source code
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    f1_val = 2 * (precision * recall) / (precision + recall + K.epsilon())
    return f1_val

METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='precision'),
    tf.keras.metrics.Recall(name='recall'),
    tf.keras.metrics.AUC(name='auc'),
    f1_score,
]

model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=METRICS)
history = model.fit(X_train, y_train, validation_data=(X_valid, y_valid), verbose=1,
                    epochs=20, callbacks=[lrd, mcp, es])
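The fit call above references three callbacks (lrd, mcp, es) that are not defined in this excerpt. One plausible definition, assuming they stand for learning-rate reduction, model checkpointing, and early stopping, is sketched below; the monitored quantity and patience values are assumptions.

# Assumed definitions for the callbacks referenced above (lrd, mcp, es);
# the exact settings of the original study are not shown in this excerpt.
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping

lrd = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)
mcp = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)
es = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)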
13.4 Conclusion

In this chapter, we leveraged deep learning architectures to carry out a comparative study on detecting breast cancer from histopathological images. According to the results, we obtained 92.2% test accuracy and a 0.934 F1-score with VGG16 on the IDC dataset. It can therefore be concluded that deep learning architectures, especially pretrained models, are really good alternatives for the breast cancer diagnosis process. We would also like to note that increasing the depth of an architecture does not guarantee better diagnosis performance if the architecture is not well designed and well prepared. In the future, we would like to work on transfer learning to improve the effectiveness of the diagnosis process. To achieve this, we will first design a deep architecture by integrating well-designed modules with each other. Moreover, we will propose a diagnosis methodology based on a feature selection method to select the most appropriate features from the feature set extracted by deep architectures for classification.

References
[1] S. Boumaraf, et al., A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images, Biomed. Signal Process. Control 63 (2021) 102192.
[2] R. Rashmi, K. Prasad, C.B.K. Udupa, Breast histopathological image analysis using image processing techniques for diagnostic purposes: a methodological review, J. Med. Syst. 46 (1) (2021) 7.
[3] X. Zhou, et al., A comprehensive review for breast histopathology image analysis using classical and deep neural networks, IEEE Access 8 (2020) 90931-90956.
[4] F.A. Zeiser, et al., Breast cancer intelligent analysis of histopathological data: a systematic review, Appl. Soft Comput. 113 (2021) 107886.
[5] F.A. Spanhol, et al., Breast cancer histopathological image classification using convolutional neural networks, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016.
[6] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84-90.
[7] F.A. Spanhol, et al., Deep features for breast cancer histopathological image classification, in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017.
[8] N. Bayramoglu, J. Kannala, J. Heikkilä, Deep learning for magnification independent breast cancer histopathology image classification, in: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
[9] B. Wei, et al., Deep learning model based breast cancer histopathological image classification, in: 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), 2017.
[10] S. Pratiher, S. Chattoraj, Manifold learning & stacked sparse autoencoder for robust breast cancer classification from histopathological images, arXiv:1806.06876, 2018.
[11] A.-A. Nahid, Y. Kong, Histopathological breast-image classification using local and frequency domains by convolutional neural network, Information 9 (1) (2018).
[12] A.-A. Nahid, M.A. Mehrabi, Y. Kong, Histopathological breast cancer image classification by deep neural network techniques guided by local clustering, BioMed Res. Int. 2018 (2018) 2362108.
[13] D. Bardou, K. Zhang, S.M. Ahmad, Classification of breast cancer based on histology images using convolutional neural networks, IEEE Access 6 (2018) 24680-24693.
[14] Shallu, R. Mehra, Breast cancer histology images classification: training from scratch or transfer learning?, ICT Express 4 (4) (2018) 247-254.
[15] Z. Xiang, et al., Breast cancer diagnosis from histopathological image based on deep learning, in: 2019 Chinese Control and Decision Conference (CCDC), 2019.
[16] A. Janowczyk, A. Madabhushi, Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases, J. Pathol. Inform. 7 (1) (2016) 29. Available from: https://doi.org/10.4103/2153-3539.186902.
[17] IBM Cloud Education, Convolutional neural networks, https://www.ibm.com/cloud/learn/convolutional-neural-networks, 2020.
[18] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: 3rd International Conference on Learning Representations (ICLR 2015), 2015.
[19] K. He, et al., Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[20] G. Huang, et al., Densely connected convolutional networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017.
[21] A.G. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, CoRR abs/1704.04861 (2017).
14
Artificial intelligence based Alzheimer's disease detection using deep feature extraction
Manav Nitin Kapadnis¹, Abhijit Bhattacharyya² and Abdulhamit Subasi³,⁴
¹Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; ²Department of Electronics and Communication Engineering, National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India; ³Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; ⁴Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
worldwide the total number of AD patients can reach up to 152 million [1], and 1 out of 85 persons will be affected by AD [2]. In contrast to other brain disorders, AD has a significantly high mortality rate, which is rising each year.

AD has multiple stages, such as the predementia, early, middle, and advanced stages. In the predementia stage, the common symptoms include mild cognitive impairment (MCI) with forgetfulness that mimics the natural aging process. In the early AD stage, the person suffers from impairment of executive functions, learning, and memory, often leading to language difficulty. In the middle stage, the patient experiences more speech difficulty, and reading and writing skills are largely impeded. In the advanced stage, AD patients exhibit apathy and fail to perform even simple tasks independently. Eventually, the patients become immobilized and death occurs [3,4].

The early diagnosis of this disease can reduce its surging pervasiveness and high mortality. To date, AD has no curative treatment, and only after early identification of the disorder can the disease progression be slowed with cognition-enhancing drugs, physical exercise, and proper lifestyle management. The diagnosis of AD is commonly carried out based on the patient's illness history and also using physiological and neurological features. The patient's medical history can be obtained from relatives and by assessing the patient's behavior [5].

In recent times, for the diagnosis of AD, clinicians generally rely on the analysis of the patient's brain images, as current imaging systems provide a wide range of information about the subject's health condition. Commonly used imaging techniques are computed tomography (CT), positron emission tomography-CT (PET-CT), and magnetic resonance imaging (MRI). A CT image can be used for diagnosing dementia by inspecting the sizes of different brain regions, including the frontal lobe, temporal lobe, and hippocampus. PET-CT provides information about different types of brain functions: (1) fluorodeoxyglucose (FDG) PET-CT is generally used for measuring glucose levels in the brain, (2) amyloid PET-CT scans provide insight about the beta-amyloid protein level, and (3) tau PET-CT is utilized for detection of tau (the protein responsible for the formation of neurofibrillary tangles in the nerve cells) [6,7]. For example, a higher level of beta-amyloid protein can be confirmed by a positive amyloid PET-CT scan, which can assist in confirming AD. Other conditions contributing to dementia, such as head trauma, stroke, and tumors, can also be ruled out using the aforementioned diagnostic methods [8]. The MRI scan measures the volume alteration at characteristic positions to analyze AD, which can provide up to 87% analytical accuracy [9,10]. More often the appraisal is performed on temporoparietal cortical atrophy and mesial temporal lobe atrophy (i.e., entorhinal cortex and hippocampus). Direct estimation is quantified by measuring the volume loss of hippocampal or parahippocampal tissue, whereas indirect estimation is based on the magnification of the parahippocampal fissure.

Recently, the accessibility of advanced noninvasive imaging methods has gained significant attention from researchers developing reliable and precise tools for brain condition monitoring. In fact, many computer-assisted systems for AD diagnosis and severity analysis have been proposed and implemented in the literature [11-13]. In general, the primary steps involved in a computer-aided AD diagnosis system are as follows: (1) brain imagery collection using a recommended and standard imaging technique, (2) enhancement of the brain images with a suitable image enhancement tool, (3) extraction of automatic/handcrafted discriminatory features from the enhanced images, (4) selection of dominant features using feature selection techniques, (5) building a classification model for brain image categorization, and (6) validation of the built system using new test images; a schematic sketch of these steps is given below.

In this work, we aim to build an AI-driven AD diagnosis method using brain MRI images. More specifically, we propose different transfer learning (TL) models [pretrained deep neural networks (DNNs)] for automatic learning and deep feature extraction.
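To make the six primary steps listed above concrete, the following is a minimal, self-contained sketch of such a pipeline. The synthetic feature vectors stand in for steps (1)-(3), and the particular choices (SelectKBest, an SVM classifier, the array sizes) are illustrative assumptions rather than the settings used in this chapter.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-ins for steps (1)-(3): in a real system these vectors would be deep or
# handcrafted features extracted from enhanced brain MRI scans.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))   # 200 images, 512 features each
labels = rng.integers(0, 2, size=200)    # toy labels: 0 = healthy, 1 = AD

X_tr, X_te, y_tr, y_te = train_test_split(features, labels,
                                          test_size=0.2, random_state=0)

# Step (4): selection of dominant features.
selector = SelectKBest(f_classif, k=64).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Step (5): building a classification model for brain image categorization.
clf = SVC().fit(X_tr_sel, y_tr)

# Step (6): validation of the built system on unseen test images.
print('test accuracy:', accuracy_score(y_te, clf.predict(X_te_sel)))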
Several features were extracted, including wavelet transform-based SS features, wavelet entropy, and wavelet orientation. The authors employed a multilayer perceptron with a biogeography-based optimization algorithm for the classification of the extracted features, which outperformed the existing methods.

Zhang et al. [24] introduced an AD detection framework using MRI scans based on an undersampling method. Principal component analysis along with singular value decomposition methods were used for feature computation and discriminatory feature selection. Further, decision tree (DT) and SVM classifiers were utilized for achieving significant detection performance.

Zhang and Wang [25] computed the displacement field (DF) in MRI for detecting abnormalities present in the normal brain for AD detection. Discrete wavelet transform (DWT)-based features were computed, and the feature dimensionality was reduced using PCA. The final set of selected features was categorized using three different classifiers, namely, SVM, twin SVM (TSVM), and generalized eigenvalue proximal SVM. The authors concluded that DF is useful in AD diagnosis when MRI scans are utilized. El-Dahshan et al. [26] described an MRI-based AD classification method using a hybrid approach. Their framework includes three basic steps: feature extraction, dimensionality reduction, and classification. They extracted DWT-based MRI features, and then feature reduction was carried out by removing irrelevant points using PCA. Finally, two classification algorithms, namely, ANN and KNN, were employed for the classification of normal and AD MRI scans. Wang et al. [27] introduced a novel AD classification algorithm using Zernike moment (ZM) features, followed by a linear regression (LR) classifier. The ZM was used to extract features with lengths between 10 and 256 from each MRI image. The computed features were classified with LR, which achieved an accuracy of 97.51%, outperforming the existing methods [28]. In another work, Wang et al. [29] presented a 3D DF estimation-based method for discrimination of AD and healthy subjects. They extracted features using the 3D DF method and selected statistically significant features using the Bhattacharyya distance, Welch's t-test (WTT), and Student's t-test. Finally, the selected features were classified using SVM and TSVM classifiers.

Zhang et al. [30] presented a computer-aided diagnostic (CAD) system for AD detection with MRI images. The authors used maximum interclass variance for selecting key slices of the 3D MRI. Afterward, an eigenbrain was generated for each slice set. Then, significant eigenbrains were selected using WTT and fed to an SVM classifier with different kernels. Further, the prediction accuracy was notably improved using the PSO algorithm. Zhang et al. [31] introduced a CAD system to detect AD from MRI scans. From each MRI scan, wavelet entropy and Hu moment invariant features were extracted. Then the extracted features were classified using the computation of generalized eigenvalues with SVM. Hett et al. [32] presented a multi-textural (MTL) pipeline for feature computation from MRI scans. The AD structural information was estimated via the MTL approach. Further, an adaptive fusion method was applied for fusing texture grading features computed using a 3D Gabor filter. Their work achieved significant performance improvement over existing biomarker methods. Gao et al. [33] employed deep learning for obtaining early-stage AD information and classification. They fused 2D and 3D CNNs, which provided significant performance with a softmax layer.

Ayadi et al. [34] described a hybrid method for extracting features from brain MRI scans and proposed a classification system. Initially, DWT-based features were extracted from the test images, and further, a Bag-of-Words method was adopted for grouping key image features. Finally, several ML techniques, such as random forest (RF), AdaBoost, KNN, and SVM, were employed for classification.
The inventor and major maintainer of Keras, François Chollet, offered the name Xception. The Xception architecture is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.

One of the challenges that image analysts encounter is that labeled training data may not be suitable for a particular purpose. Consider the following scenario: you have a set of images that must be combined to obtain a single image. Although retrieval applications do not use labels, semantic compatibility between features is critical. In other cases, you might want to classify a dataset using a specific collection of labels that are not available in large numbers and are not part of a larger resource such as ImageNet. This causes issues, since neural networks require a lot of training data to be built from scratch. However, the most important thing to remember about image data is that the characteristics retrieved from an image are just that: features, and features learned on one dataset are particularly reusable across data sources [35].

The following code snippet gives us a template for replicating our model architecture for feature extraction using a TL model (importing of libraries is skipped):
# Requires Dropout, Flatten, BatchNormalization, Dense, Activation (tensorflow.keras.layers)
# and Model (tensorflow.keras.models); base_model is the pretrained TL backbone
# loaded earlier (e.g., with include_top=False).
x = base_model.output
x = Dropout(0.5)(x)
x = Flatten()(x)
x = BatchNormalization()(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)

# Feature extractor: maps an input image to the deep features built above (x).
model_feat = Model(inputs=base_model.input, outputs=x)
train_features = model_feat.predict(x_train)
test_features = model_feat.predict(x_test)
# Small fully connected (ANN) classifier built from scratch on flattened
# 224 x 224 x 3 inputs; Sequential, Flatten, Dense, and Activation come from
# tensorflow.keras (imports omitted here).
model = Sequential()
model.add(Flatten(input_shape=(224, 224, 3)))
model.add(Dense(units=4, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=4))
model.add(Activation('softmax'))  # four output classes (dementia stages)

# BinaryAccuracy is used here; for a four-class softmax output,
# CategoricalAccuracy would be the matching accuracy metric.
METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='precision'),
    tf.keras.metrics.Recall(name='recall'),
    tf.keras.metrics.AUC(name='auc'),
    f1_score,
]

# exponential_decay is assumed to be a helper defined earlier that returns a
# per-epoch learning-rate schedule for the LearningRateScheduler callback.
exponential_decay_fn = exponential_decay(0.01, 5)
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(exponential_decay_fn)

model.compile(optimizer=tf.keras.optimizers.Adam(
                  learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-07, amsgrad=False, name='Adam'),
              loss='categorical_crossentropy', metrics=METRICS)
# K-nearest neighbors classifier trained on the extracted deep features.
from sklearn.neighbors import KNeighborsClassifier

Model = KNeighborsClassifier()
Model_trained = Model.fit(X_train, y_train)
y_pred_train = Model_trained.predict(X_train)
y_pred_val = Model_trained.predict(X_val)
y_pred_test = Model_trained.predict(X_test)
14.3.2.3 Support vector machine
The SVM is a supervised ML approach for classifying and predicting data. It is, however, mainly used to tackle categorization problems. Each data item is represented as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Then, to complete the classification, we find the hyperplane that clearly divides the two categories. Rabeh et al. [46] implemented SVM for AD detection along with feature extraction from the image, such as Gaussian blur and edge detection features, among others. They achieved impressive results for early Alzheimer's detection on MRI images. Alam et al. [47] suggested an innovative method for identifying AD from healthy controls by employing dual-tree complex wavelet transforms, principal coefficients from MRI transaxial slices, linear discriminant analysis, and TSVM. The following code snippet gives the details of the model architecture used for SVM:
# Support vector machine classifier on the same extracted features.
from sklearn.svm import SVC

Model = SVC()
Model_trained = Model.fit(X_train, y_train)
y_pred_train = Model_trained.predict(X_train)
y_pred_val = Model_trained.predict(X_val)
y_pred_test = Model_trained.predict(X_test)
# Random forest classifier.
from sklearn.ensemble import RandomForestClassifier

Model = RandomForestClassifier()
Model_trained = Model.fit(X_train, y_train)
y_pred_train = Model_trained.predict(X_train)
y_pred_val = Model_trained.predict(X_val)
y_pred_test = Model_trained.predict(X_test)

# AdaBoost classifier.
from sklearn.ensemble import AdaBoostClassifier

Model = AdaBoostClassifier()
Model_trained = Model.fit(X_train, y_train)
y_pred_train = Model_trained.predict(X_train)
y_pred_val = Model_trained.predict(X_val)
y_pred_test = Model_trained.predict(X_test)

# XGBoost classifier.
from xgboost import XGBClassifier

Model = XGBClassifier()
Model_trained = Model.fit(X_train, y_train)
y_pred_train = Model_trained.predict(X_train)
y_pred_val = Model_trained.predict(X_val)
y_pred_test = Model_trained.predict(X_test)
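Since all five classical classifiers above follow the same fit/predict template, they can equally be trained in one loop; the following is a minimal sketch that reuses the classes imported in the snippets above and the same feature arrays.
# Train each classifier on the extracted features and collect test predictions.
classifiers = {
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'XGBoost': XGBClassifier(),
}
test_predictions = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    test_predictions[name] = clf.predict(X_test)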
FIGURE 14.1 General framework for Alzheimer’s disease detection utilizing artificial intelligence methods.
FIGURE 14.2 MRI scans for different subjects: (A) nondemented, (B) mild demented, (C) very mild demented, and (D) moderate demented. MRI, Magnetic resonance imaging.
includes a total of 5480 images. We divided this dataset into training, validation, and test sets. The training and test sets are provided by the creators of the datasets themselves. The training set is further split into training and validation sets, 80% and 20%, respectively. The training, validation, and testing sets contain around 4097, 1024, and 1279 images, respectively. Table 14.1 gives a description of the class distribution of the four classes in each of the three datasets.

14.4.2 Performance evaluation measures

The performance measurements for the training set and the test set cannot be presumed to be identical. Although the training set contains the majority of the instances, the test set is intended to be more realistic. A lack of data is a significant challenge for ConvNets. In this example, the methodologies utilized to evaluate the performance measurements are still debatable. In the case of the training set, the classifier is fine-tuned to produce the best performance measurements. The most essential thing to remember when training is that no instances from the test set must be included in the classifier construction. As a result, the performance measurements from the test set may be expected to be similar to the performance measures from the control set. A classifier assigns a categorization to an image. It is believed to be successful if the category matches the specified category. If there is a discrepancy, it is presumed to be a mistake. The majority of performance metrics are dependent on a classifier's error rate. A large number of examples are included in the training set. To have the best performance measurements, the validation dataset should have a distribution that is nearly identical to that of the test set. The validation set should be used to fine-tune parameters, while the test set should be used to determine the final values of performance measures [57].

A confusion matrix is a table that is often employed to illustrate the performance of a classification model on a set of test data for which the true values are known. All the metrics we have used to evaluate the performance of the model can be assessed using the confusion matrix (see Fig. 14.3). The observations that are accurately predicted, and hence represented in green, are true positives (TP) and true negatives (TN). We wish to reduce false positives (FP) and false negatives (FN); thus they are presented in red.

TP are correctly predicted positive values, demonstrating that the value of the actual class and the value of the predicted class are both yes. For instance, the actual class value shows that this patient survived, and the anticipated class also suggests the same. TN are correctly predicted negative values, implying that the value of the actual class is zero and the value of the predicted class is zero as well. For instance, the real class states the patient did not survive, and the predicted class says the same.

FP and FN occur when the actual class is different from the predicted class. FP are cases in which the actual class is no but the predicted class is yes. For instance, the actual class implies that this patient did not survive, but the forecast class implies that this patient will. FN are situations in which the real class is yes but the expected class is no. For instance, the patient's actual class value indicates that he or she survived, while the predicted class value indicates that the person would die.

Accuracy is the most straightforward performance metric, because it is simply the ratio of correctly predicted observations to total observations.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.

Precision = TP / (TP + FP)

Recall is the ratio of correctly predicted positive observations to all observations in the actual class.

Recall = TP / (TP + FN)

F1-score is the weighted average of precision and recall. Hence, this score takes both FP and FN into account. Naturally, it is not as simple to understand as accuracy.
                           Predicted class
                           Class = Yes        Class = No
Actual class  Class = Yes  True Positive      False Negative
              Class = No   False Positive     True Negative

F1-score = 2 × (Recall × Precision) / (Recall + Precision)
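The train, validation, and test metrics printed below can be computed from the classifier predictions with scikit-learn. This is a minimal sketch for the test split (the validation and training splits follow the same pattern); the macro averaging and the percentage scaling of accuracy are assumptions chosen to match the format of the tables that follow.
from sklearn import metrics  # module import avoids clashing with the Keras f1_score above

# Test-set metrics from true labels (y_test) and predictions (y_pred_test).
test_accuracy = 100 * metrics.accuracy_score(y_test, y_pred_test)
test_F1 = metrics.f1_score(y_test, y_pred_test, average='macro')
test_recall = metrics.recall_score(y_test, y_pred_test, average='macro')
test_precision = metrics.precision_score(y_test, y_pred_test, average='macro')
test_confusion_matrix = metrics.confusion_matrix(y_test, y_pred_test)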
print()
print('------------------------ Train Set Metrics---------------------')
print()
print("accuracy : {}%".format(train_accuracy))
print("F1_score : {}".format(train_F1))
print("Recall : {}".format(train_recall))
print("Precision : {}".format(train_precision))
print("Confusion Matrix :\n {}".format(train_confusion_matrix))
print()
print('------------------------ Validation Set Metrics----------------')
print()
print("accuracy : {}%".format(val_accuracy))
print("F1_score : {}".format(val_F1))
print("Recall : {}".format(val_recall))
print("Precision : {}".format(val_precision))
print("Confusion Matrix :\n {}".format(val_confusion_matrix))
print()
print('------------------------ Test Set Metrics-----------------------')
print()
print("accuracy : {}%".format(test_accuracy))
print("F1_score : {}".format(test_F1))
print("Recall : {}".format(test_recall))
print("Precision : {}".format(test_precision))
print("Confusion Matrix : {}".format(test_confusion_matrix))
Model          Training accuracy  Validation accuracy  Test accuracy  F1-score  Recall  Precision
ANN            74.44              75.05                75.02          0.0164    0.0086  0.5238
KNN            60.35              42.19                42.73          0.4099    0.4273  0.4036
SVM            50                 50                   50             0.3333    0.5     0.25
Random Forest  100                43.65                46.02          0.4321    0.4602  0.4219
AdaBoost       49.58              49.51                49.45          0.3866    0.4945  0.3925
XGBoost        91.89              47.56                46.64          0.4257    0.4664  0.4201

Model          Training accuracy  Validation accuracy  Test accuracy  F1-score  Recall  Precision
ANN            75.82              76.61                77.56          0.4869    0.4292  0.5677
KNN            63.21              45.8                 44.69          0.4273    0.4469  0.4246
SVM            51.86              51.56                51.09          0.3758    0.5109  0.4307
Random Forest  100                49.51                46.09          0.431     0.4609  0.4183
AdaBoost       50.56              51.37                50             0.3952    0.5     0.4534
XGBoost        89.97              49.41                48.52          0.4509    0.4852  0.4457

Model          Training accuracy  Validation accuracy  Test accuracy  F1-score  Recall  Precision
ANN            76.38              77.49                76.11          0.2971    0.2048  0.561
KNN            64.62              46.88                48.59          0.4658    0.4859  0.4603
SVM            50                 50                   50             0.333     0.5     0.25
Random Forest  100                48.73                48.67          0.4664    0.4867  0.4598
AdaBoost       51.51              50.88                48.91          0.4424    0.4891  0.4437
XGBoost        90.23              49.32                48.05          0.4573    0.4805  0.4508

All of the classifiers are overfitting in this case, except ANN, which shows a consistent accuracy of around 75%. Here ANN can be considered a good classifier for AD classification.

InceptionV3
Model          Training accuracy  Validation accuracy  Test accuracy  F1-score  Recall  Precision
ANN            74.76              75.07                75.04          0.003     0.0016  1
[32] K. Hett, V.-T. Ta, J.V. Manjón, P. Coupé, Adaptive fusion of texture-based grading for Alzheimer's disease classification, Comput. Med. Imaging Graph. 70 (2018) 8-16. Available from: https://doi.org/10.1016/j.compmedimag.2018.08.002.
[33] X.W. Gao, R. Hui, Z. Tian, Classification of CT brain images based on deep learning networks, Comput. Methods Prog. Biomed. 138 (2017) 49-56. Available from: https://doi.org/10.1016/j.cmpb.2016.10.007.
[34] W. Ayadi, W. Elhamzi, I. Charfi, M. Atri, A hybrid feature extraction approach for brain MRI classification based on bag-of-words, Biomed. Signal Process. Control 48 (2019) 144-152. Available from: https://doi.org/10.1016/j.bspc.2018.10.010.
[35] C.C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, Cham, Switzerland, 2018. Available from: https://link.springer.com/book/10.1007/978-3-319-94463-0.
[36] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (Oct. 2010) 1345-1359. Available from: https://doi.org/10.1109/TKDE.2009.191.
[37] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, https://arxiv.org/abs/1409.1556, Sep. 2014.
[38] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, arXiv:1512.03385, http://arxiv.org/abs/1512.03385, Dec. 2015 (accessed 01.03.22).
[39] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1-127. Available from: https://doi.org/10.1561/2200000006.
[40] E. Alpaydin, Introduction to Machine Learning, third edition, The MIT Press, Cambridge, Massachusetts, 2014.
[41] A. Subasi, Use of artificial intelligence in Alzheimer's disease detection, in: Artificial Intelligence in Precision Health, Elsevier, 2020, pp. 257-278. Available from: https://doi.org/10.1016/B978-0-12-817133-2.00011-2.
[42] S. Vieira, W.H.L. Pinaya, A. Mechelli, Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications, Neurosci. Biobehav. Rev. 74 (2017) 58-75. Available from: https://doi.org/10.1016/j.neubiorev.2017.01.002.
[43] M. Deepika Nair, M.S. Sinta, M. Vidya, A study on various deep learning algorithms to diagnose Alzheimer's disease, in: D. Pandian, X. Fernando, Z. Baig, F. Shi (Eds.), Proc. International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB), vol. 30, Springer International Publishing, Cham, 2019, pp. 1705-1710. Available from: https://doi.org/10.1007/978-3-030-00665-5_157.
[44] S. Aruchamy, V. Mounya, A. Verma, Alzheimer's disease classification in brain MRI using modified kNN algorithm, in: 2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), Gunupur, Odisha, India, Dec. 2020, pp. 1-6. Available from: https://doi.org/10.1109/iSSSC50941.2020.9358867.
[45] A.J. Dinu, R. Ganesan, Early detection of Alzheimer's disease using predictive k-NN instance based approach and T-test method, Int. J. Adv. Trends Comput. Sci. Eng. 8 (1.4) (Sep. 2019) 29-37. Available from: https://doi.org/10.30534/ijatcse/2019/0581.42019.
[46] A.B. Rabeh, F. Benzarti, H. Amiri, Diagnosis of Alzheimer diseases in early step using SVM (support vector machine), in: 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV), Beni Mellal, Morocco, Mar. 2016, pp. 364-367. Available from: https://doi.org/10.1109/CGiV.2016.76.
[47] S. Alam, G.-R. Kwon, J.-I. Kim, C.-S. Park, Twin SVM-based classification of Alzheimer's disease using complex dual-tree wavelet principal coefficients and LDA, J. Healthc. Eng. 2017 (2017) 8750506. Available from: https://doi.org/10.1155/2017/8750506.
[48] P.J. Moore, T.J. Lyons, J. Gallacher, Alzheimer's Disease Neuroimaging Initiative, Random forest prediction of Alzheimer's disease using pairwise selection from time series data, PLoS One 14 (2) (2019) e0211558. Available from: https://doi.org/10.1371/journal.pone.0211558.
[49] A. Sarica, A. Cerasa, A. Quattrone, Random Forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review, Front. Aging Neurosci. 9 (329) (2017). Available from: https://doi.org/10.3389/fnagi.2017.00329.
[50] A.V. Lebedev, et al., Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness, NeuroImage Clin. 6 (2014) 115-125. Available from: https://doi.org/10.1016/j.nicl.2014.08.023.
[51] M.S. Ali, Md. K. Islam, J. Haque, A.A. Das, D.S. Duranta, M.A. Islam, Alzheimer's disease detection using m-Random Forest algorithm with optimum features extraction, in: 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, Apr. 2021, pp. 1-6. Available from: https://doi.org/10.1109/CAIDA51941.2021.9425212.
[52] A. Savio, M. García-Sebastián, M. Graña, J. Villanúa, Results of an Adaboost approach on Alzheimer's disease detection on MRI, in: J. Mira, J.M. Ferrández, J.R. Álvarez, F. de la Paz, F.J. Toledo (Eds.), Bioinspired Applications in Artificial and Natural Computation, vol. 5602, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 114-123. Available from: https://doi.org/10.1007/978-3-642-02267-8_13.
[53] J.H. Morra, Z. Tu, L.G. Apostolova, A.E. Green, A.W. Toga, P.M. Thompson, Comparison of AdaBoost and