
APPLICATIONS OF ARTIFICIAL

INTELLIGENCE IN MEDICAL IMAGING


Artificial Intelligence Applications in Healthcare and Medicine

APPLICATIONS OF
ARTIFICIAL
INTELLIGENCE IN
MEDICAL IMAGING
Edited by

Abdulhamit Subasi
Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia;
Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2023 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-443-18450-5

For Information on all Academic Press publications


visit our website at https://www.elsevier.com/books-and-journals

Publisher: Stacy Masucci


Acquisitions Editor: Rafael E. Teixeira
Editorial Project Manager: Pat Gonzalez
Production Project Manager: Fahmida Sultana
Cover Designer: Greg Harris
Typeset by MPS Limited, Chennai, India
Dedication

A huge thanks to my parents for every time expecting me to do my best, and telling me I could accomplish anything, no matter what it was.

To my wife, Rahime, for her patience and support.

To my wonderful children, Seyma Nur, Tuba Nur, and Muhammed Enes. You are always in my heart and the joys of my life.

To those who read this book and appreciate the work that goes into it. If you have any feedback, please let me know.

Abdulhamit Subasi
Contents

List of contributors xi
Series preface xiii
Preface xv
Acknowledgments xvii

1. Introduction to artificial intelligence techniques for medical image analysis 1
ABDULHAMIT SUBASI
1.1 Introduction 1
1.2 Artificial intelligence for image classification 3
1.3 Unsupervised learning (clustering) 5
    1.3.1 Image segmentation with clustering 6
1.4 Supervised learning 7
    1.4.1 K-nearest neighbor 7
    1.4.2 Decision tree 9
    1.4.3 Random forest 9
    1.4.4 Bagging 10
    1.4.5 Boosting 11
    1.4.6 AdaBoost 11
    1.4.7 XGBoost 12
    1.4.8 Artificial neural networks 13
    1.4.9 Deep learning 14
    1.4.10 The overfitting problem in neural network training 15
    1.4.11 Convolutional neural networks 16
    1.4.12 Recurrent neural networks 23
    1.4.13 Long short-term memory 24
    1.4.14 Data augmentation 24
    1.4.15 Generative adversarial networks 25
    1.4.16 Transfer learning 31
References 48

2. Lung cancer detection from histopathological lung tissue images using deep learning 51
AAYUSH RAJPUT AND ABDULHAMIT SUBASI
2.1 Introduction 51
2.2 Literature review 53
2.3 Artificial intelligence models 55
    2.3.1 Artificial neural networks 55
    2.3.2 Deep learning 56
    2.3.3 Convolutional neural networks 56
2.4 Lung cancer detection using artificial intelligence 58
    2.4.1 Feature extraction using deep learning 58
    2.4.2 Dimension reduction 59
    2.4.3 Prediction and classification 59
    2.4.4 Experimental data 61
    2.4.5 Performance evaluation measures 61
    2.4.6 Experimental results 62
2.5 Discussion 72
2.6 Conclusion 72
References 72

3. Magnetic resonance imaging-based automated brain tumor detection using deep learning techniques 75
ABHRANTA PANIGRAHI AND ABDULHAMIT SUBASI
3.1 Introduction 75
3.2 Literature survey 76
3.3 Deep learning for disease detection 78
    3.3.1 Artificial neural networks 78
    3.3.2 Deep learning 79
    3.3.3 Convolutional neural networks 80
3.4 Disease detection using artificial intelligence 82
    3.4.1 Feature extraction 82
    3.4.2 Transfer learning 85
    3.4.3 Prediction and classification 88
    3.4.4 Experimental data 88
    3.4.5 Experimental setup 88
    3.4.6 Performance evaluation metrics 92
    3.4.7 Experimental results 93
3.5 Discussion 95
3.6 Conclusion 106
References 106

4. Breast cancer detection from mammograms using artificial intelligence 109
ABDULHAMIT SUBASI, AAYUSH DINESH KANDPAL, KOLLA ANANT RAJ AND ULAS BAGCI
4.1 Introduction 109
4.2 Background and literature review 111
4.3 Artificial intelligence techniques 112
    4.3.1 Artificial neural networks 112
    4.3.2 Deep learning 112
    4.3.3 Convolutional neural networks 114
4.4 Breast cancer detection using artificial intelligence 115
    4.4.1 Feature extraction using deep learning 115
    4.4.2 Prediction and classification 116
    4.4.3 Experimental data 119
    4.4.4 Performance evaluation measures 123
    4.4.5 Experimental results 124
4.5 Discussion 133
4.6 Conclusion 134
References 135

5. Breast tumor detection in ultrasound images using artificial intelligence 137
OMKAR MODI AND ABDULHAMIT SUBASI
5.1 Introduction 137
5.2 Background/literature review 138
5.3 Artificial intelligence techniques 139
    5.3.1 Artificial neural networks 139
    5.3.2 Deep learning 140
    5.3.3 Convolutional neural networks 140
5.4 Breast tumor detection using artificial intelligence 149
    5.4.1 Feature extraction using deep learning 149
    5.4.2 Prediction and classification 151
    5.4.3 Experimental data 165
    5.4.4 Performance evaluation measures 166
    5.4.5 Experimental results 168
5.5 Discussion 178
5.6 Conclusion 180
References 180

6. Artificial intelligence-based skin cancer diagnosis 183
ABDULHAMIT SUBASI AND SAQIB AHMED QURESHI
6.1 Introduction 183
6.2 Literature review 186
6.3 Machine learning techniques 187
    6.3.1 Artificial neural network 187
    6.3.2 k-nearest neighbor 189
    6.3.3 Support vector machine 189
    6.3.4 Random Forest 189
    6.3.5 XGBoost 190
    6.3.6 AdaBoost 190
    6.3.7 Bagging 191
    6.3.8 Long short-term memory 191
    6.3.9 Bidirectional long short-term memory 192
    6.3.10 Convolutional neural network 193
    6.3.11 Transfer learning 195
6.4 Results and discussions 196
    6.4.1 Dataset 196
    6.4.2 Experimental setup 196
    6.4.3 Performance metrics 196
    6.4.4 Experimental results 197
    6.4.5 Discussion 201
6.5 Conclusion 203
References 203

7. Brain stroke detection from computed tomography images using deep learning algorithms 207
AYKUT DIKER, ABDULLAH ELEN AND ABDULHAMIT SUBASI
7.1 Introduction 207
7.2 Literature survey in brain stroke detection 209
7.3 Deep learning methods 210
    7.3.1 AlexNet 210
    7.3.2 GoogleNet 212
    7.3.3 Residual convolutional neural network 212
    7.3.4 VGG-16 214
    7.3.5 VGG-19 215
7.4 Experimental results 216
    7.4.1 Dataset 217
7.5 Conclusion 221
References 221

8. A deep learning approach for COVID-19 detection from computed tomography scans 223
ASHUTOSH VARSHNEY AND ABDULHAMIT SUBASI
8.1 Introduction 224
8.2 Literature review 224
8.3 Subjects and data acquisition 225
8.4 Proposed architecture and transfer learning 225
    8.4.1 ResNet 227
    8.4.2 DenseNet 227
    8.4.3 MobileNet 228
    8.4.4 Xception 228
    8.4.5 Visual geometry group (VGG) 229
    8.4.6 Inception/GoogLeNet 229
8.5 COVID-19 detection with deep feature extraction 230
    8.5.1 K-nearest neighbors 230
    8.5.2 Support vector machine 231
    8.5.3 Random Forests 231
    8.5.4 Bagging 232
    8.5.5 AdaBoost 232
    8.5.6 XGBoost 232
8.6 Results and discussions 233
    8.6.1 Performance evaluation measures 233
    8.6.2 Experimental results 234
    8.6.3 Discussion 237
8.7 Conclusion 238
References 238

9. Detection and classification of diabetic retinopathy lesions using deep learning 241
SIDDHESH SHELKE AND ABDULHAMIT SUBASI
9.1 Introduction 241
9.2 Literature survey on diabetic retinopathy detection 244
    9.2.1 Traditional diabetic retinopathy detection approach 244
    9.2.2 Binary and multilevel classification 245
    9.2.3 Datasets 246
9.3 Deep learning methods for diabetic retinopathy detection 247
    9.3.1 Deep neural networks 247
    9.3.2 Convolutional neural networks 249
    9.3.3 Transfer learning 250
9.4 Diabetic retinopathy detection using deep learning 251
    9.4.1 Prediction and classification 252
    9.4.2 Performance evaluation metrics 253
    9.4.3 Experimental results 254
9.5 Discussion 261
9.6 Conclusion 262
References 262
Further reading 264

10. Automated detection of colon cancer using deep learning 265
AAYUSH RAJPUT AND ABDULHAMIT SUBASI
10.1 Introduction 265
10.2 Literature review 267
10.3 Artificial intelligence for colon cancer detection 268
    10.3.1 Artificial neural networks 269
    10.3.2 Deep learning 269
    10.3.3 Convolutional neural networks 270
10.4 Disease detection using artificial intelligence 271
    10.4.1 Feature extraction using deep learning 271
    10.4.2 Dimension reduction 272
    10.4.3 Prediction and classification 272
    10.4.4 Experimental data 274
    10.4.5 Performance evaluation measures 274
    10.4.6 Experimental results 274
10.5 Discussion 280
10.6 Conclusion 280
References 280

11. Brain hemorrhage detection using computed tomography images and deep learning 283
ABDULLAH ELEN, AYKUT DIKER AND ABDULHAMIT SUBASI
11.1 Introduction 283
11.2 Literature survey in brain hemorrhage detection 285
11.3 Deep learning methods 286
    11.3.1 ResNet-18 286
    11.3.2 EfficientNet-B0 287
    11.3.3 VGG-16 288
    11.3.4 DarkNet-19 288
11.4 Experimental results 289
    11.4.1 Dataset 290
11.5 Discussions 299
11.6 Conclusion 300
References 300

12. Artificial intelligence-based retinal disease classification using optical coherence tomography images 305
SOHAN PATNAIK AND ABDULHAMIT SUBASI
12.1 Introduction 305
12.2 Related work 306
12.3 Dataset 307
12.4 Implementation details 307
    12.4.1 Convolutional neural network-based classification 307
    12.4.2 Transfer learning-based classification 309
    12.4.3 Deep feature extraction and machine learning 312
12.5 Results and discussions 312
12.6 Discussion 318
12.7 Conclusion 318
References 319

13. Diagnosis of breast cancer from histopathological images with deep learning architectures 321
EMRAH HANCER AND ABDULHAMIT SUBASI
13.1 Introduction 321
13.2 Materials and methods 323
    13.2.1 Dataset 323
    13.2.2 Methods 324
13.3 Results and discussions 329
    13.3.1 Experimental setup 329
    13.3.2 Experimental results 331
13.4 Conclusion 332
References 332

14. Artificial intelligence-based Alzheimer's disease detection using deep feature extraction 333
MANAV NITIN KAPADNIS, ABHIJIT BHATTACHARYYA AND ABDULHAMIT SUBASI
14.1 Introduction 333
14.2 Background/literature review 335
14.3 Artificial intelligence models 337
    14.3.1 Deep feature extraction techniques 337
    14.3.2 Classification techniques 339
14.4 Alzheimer's disease detection using artificial intelligence 344
    14.4.1 Experimental data 344
    14.4.2 Performance evaluation measures 345
    14.4.3 Experimental results 348
14.5 Discussion 350
14.6 Conclusion 352
References 352

Index 357
List of contributors

Ulas Bagci Northwestern University, Chicago, IL, United States

Abhijit Bhattacharyya Department of Electronics and Communication Engineering, National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India

Aykut Diker Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey

Abdullah Elen Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey

Emrah Hancer Department of Software Engineering, Mehmet Akif Ersoy University, Burdur, Turkey

Aayush Dinesh Kandpal Department of Metallurgical and Materials Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India

Manav Nitin Kapadnis Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

Omkar Modi Indian Institute of Technology, Kharagpur, West Bengal, India

Abhranta Panigrahi Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India

Sohan Patnaik Department of Mechanical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India

Saqib Ahmed Qureshi Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

Kolla Anant Raj Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

Aayush Rajput Indian Institute of Technology, Kharagpur, West Bengal, India

Siddhesh Shelke Indian Institute of Technology, Indore, Madhya Pradesh, India

Abdulhamit Subasi Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

Ashutosh Varshney Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Series preface

Artificial intelligence (AI) is a concept which allows for the optimization of performance criteria utilizing a set of data and some experience. The learning process is essentially the execution of model parameter optimization with a training dataset or past experience. Models can be predictive, for predicting the future; descriptive, for extracting knowledge from input data; or both. Two fundamental activities are accomplished in machine learning: (1) processing massive amounts of data and optimizing the model and (2) testing the model and efficiently displaying the solution.

The process of applying AI methods to a large dataset is called data mining. The logic behind data mining is that a large volume of raw data is processed and an efficient prediction model with high predictive accuracy is constructed. AI applications are present in different areas: in finance, for credit scoring, fraud detection, or stock market prediction; in manufacturing, for optimization, control, and troubleshooting; in medicine, for efficient medical diagnosis; and in telecommunication, for network and quality of service optimization. Furthermore, AI enables algorithms to learn and adapt to changes in a variety of situations. Artificial neural networks, fuzzy logic, support vector machines, decision tree algorithms, and deep learning algorithms are all examples of AI techniques.

AI tools have been employed in different areas for several years. New AI methods such as deep learning have helped uncover information that has entirely altered the approach in different areas. AI has reached a certain maturity as an academic subject, and there are many useful books related to this subject. Since AI is an interdisciplinary subject, it should be implemented in different ways depending on the application field. Nowadays, there is great interest in AI applications in several disciplines. This book series presents how AI and machine learning methods can be used in the analysis of different kinds of data, covering AI applications in fields including biomedical engineering, electrical engineering, computer science, information technology, medical science, healthcare, finance, and economy.

This book series will consist of numerous volumes, each of which will cover an application of AI techniques in a different field. The series will benefit a wide range of readers, including academicians, professionals, graduate students, and researchers from a variety of fields who are exploring AI applications. It will provide an in-depth account of recent research in this emerging topic, and the principles discussed here will spark additional research in this multidisciplinary field.

The target audience is widespread, since the series covers several areas of application of AI, machine learning, and deep learning. Hence, the audience includes computer scientists, biomedical engineers and healthcare scientists, financial engineers, economists, and researchers and consultants in electrical and computer engineering and science, finance, economy, and security. AI is now one of the hottest topics in data analysis worldwide, so this book series has many application areas and a broad audience. The series will include many new AI trends in several application fields, among which the most important are healthcare, cybersecurity, and finance.

Abdulhamit Subasi
Preface

Artificial intelligence (AI) plays an important role in the field of medical image analysis, including computer-aided diagnosis, image-guided therapy, image registration, image segmentation, image annotation, image fusion, and retrieval from image databases. With advances in medical imaging, new imaging methods and techniques are needed, such as cone-beam/multislice CT, MRI, positron emission tomography (PET)/CT, 3D ultrasound imaging, diffuse optical tomography, and electrical impedance tomography, as well as new AI algorithms and applications. To provide adequate results, the single-sample evidence given by a patient's imaging data is often not sufficient. It is usually difficult to derive analytical solutions or simple equations to describe objects such as lesions and anatomy in medical images, due to their wide variations and complexity. Tasks in medical image analysis therefore require learning from examples for correct image recognition (IR), together with prior knowledge. This book offers advanced, up-to-date medical image analysis methods through the use of algorithms and techniques from AI, machine learning (ML), and IR. A picture or image is worth a thousand words, indicating that IR may play a critical role in medical imaging and diagnostics. Data and information can be learned through AI, IR, and ML in the form of an image, that is, a collection of pixels, as it is impossible to recruit experts for big data.

AI tools have been employed in different areas for several years. New AI methods such as deep learning have helped uncover information that has entirely altered the approach in different areas. AI has reached a certain maturity as an academic subject, and there are many useful books related to this subject. Since AI is an interdisciplinary subject, it must be implemented in different ways depending on the application field. Nowadays, there is great interest in AI applications in several disciplines. This edited book presents how AI and ML methods can be used in medical image analysis, with applications spanning biomedical engineering, electrical engineering, computer science, information technology, medical science, and healthcare.

This book provides descriptions of various biomedical image analyses for the detection of several diseases using AI, and it can therefore be used to incorporate knowledge obtained from different medical imaging devices such as CT, X-ray, PET, and ultrasound. In this way, more integrated and, thus, more holistic research on biomedical image analysis may contribute significantly to the successful enhancement of a single patient's clinical knowledge.

This book includes several medical image analysis techniques using AI approaches, including deep learning. Deep learning algorithms such as convolutional neural networks and transfer learning techniques are widely used in medical imaging. Medical image analysis using AI is widely employed in the areas of medical image classification, segmentation, and detection, and AI applications in medical imaging are widely used as decision support systems for physicians. AI can be used in the diagnosis of different types of cancers, including cervical cancer, ovarian cancer, breast cancer, prostate cancer, lung cancer, and liver cancer.

The author of this book has extensive hands-on experience using Python and MATLAB to solve real-world problems in the context of the ML ecosystem. Applications of Artificial Intelligence in Medical Imaging aims to provide readers of various skill levels with the knowledge and experience necessary to develop useful AI solutions. Additionally, this book serves as a solution manual for creating sophisticated real-world systems, providing a structured framework with guidelines, instructions, real-world examples, and code. The book also supplies the crucial knowledge that its readers require to comprehend and resolve a variety of ML difficulties.

The book covers different subjects involving cancer diagnosis, including lung cancer, prostate cancer, breast cancer, and skin cancer; COVID-19 detection; histopathological image classification; classification of diabetic retinopathy lesions using CT, MRI, X-ray, and ultrasound; and pathological medical imaging. This book consists of 14 chapters. Chapter 1 presents topics relevant to the numerous AI methodologies, such as supervised and unsupervised learning; hence, the key AI algorithms are discussed briefly in this chapter, and relevant Python programming codes and routines are provided in each section. Chapter 2 provides lung cancer detection from histopathological lung tissue images using AI. Chapter 3 provides MRI-based automated brain tumor detection by means of AI. Chapter 4 presents breast cancer detection from mammograms using AI. Chapter 5 includes AI-based breast tumor detection using ultrasound images. Chapter 6 includes AI-based skin cancer diagnosis. Chapter 7 presents brain stroke detection from CT images using deep learning algorithms. Chapter 8 provides a deep learning approach for COVID-19 detection from CT scans. Chapter 9 includes detection and classification of diabetic retinopathy lesions using deep learning. Chapter 10 presents automated detection of colon cancer using histopathological images. Chapter 11 includes brain hemorrhage detection in CT images utilizing deep learning. Chapter 12 presents AI-based retinal disease classification using OCT images. Chapter 13 presents diagnosis of breast cancer from histopathological images with deep learning architectures. Chapter 14 includes AI-based Alzheimer's disease detection using MRI images.

Abdulhamit Subasi
Acknowledgments

First of all, I would like to thank my publisher, Elsevier, and its team of dedicated professionals, who have made this book-writing journey simple and effortless, and the many who have worked in the background to make this book a success.

I would like to thank Rafael Teixeira, Linda Versteeg-Buschman, and Pat Gonzalez, who provided excellent support and did a lot of work for this book. Additionally, I would like to thank Fahmida Sultana for being patient in getting everything necessary completed for this book.

Abdulhamit Subasi
C H A P T E R

1
Introduction to artificial intelligence
techniques for medical image analysis
Abdulhamit Subasi¹,²

¹Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; ²Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

1.1 Introduction 1
1.2 Artificial intelligence for image classification 3
1.3 Unsupervised learning (clustering) 5
    1.3.1 Image segmentation with clustering 6
1.4 Supervised learning 7
    1.4.1 K-nearest neighbor 7
    1.4.2 Decision tree 9
    1.4.3 Random forest 9
    1.4.4 Bagging 10
    1.4.5 Boosting 11
    1.4.6 AdaBoost 11
    1.4.7 XGBoost 12
    1.4.8 Artificial neural networks 13
    1.4.9 Deep learning 14
    1.4.10 The overfitting problem in neural network training 15
    1.4.11 Convolutional neural networks 16
    1.4.12 Recurrent neural networks 23
    1.4.13 Long short-term memory 24
    1.4.14 Data augmentation 24
    1.4.15 Generative adversarial networks 25
    1.4.16 Transfer learning 31
References 48

1.1 Introduction

An artificial intelligence (AI) model is defined by a set of parameters that are optimized using training data or previous experience to generate a computer program. AI generates models using statistical analysis, since the main purpose is to make inferences from a training sample. In some circumstances, the training algorithm's efficacy is just as important as its classification accuracy. AI techniques are employed as a decision assistance system in a variety of fields [1,2].
Learning is a multidisciplinary phenomenon that includes parts of statistics, mathematics, computer science, physics, economics, and biomedicine. Surprisingly, not all human tasks are linked to intelligence; therefore, there are certain situations in which a computer can do better. There are some intelligent activities that humans are incapable of performing and that machines can perform better than humans. Classical machine learning (ML) techniques in complicated systems cannot provide the essential intelligent response, since important activities and decision-making make it vital to comprehend the model response and components for effective decision-making. Every behavior, activity, or decision has a systemic understanding: one activity might be the outcome of another event or set of events from a systematic standpoint, and these connections are convoluted and difficult to grasp. As system models and robots are expected to perform intelligently even in nonpredictive scenarios, learning ideas and models must be seen through the lens of new expectations. These expectations necessitate ongoing learning from many sources of knowledge. The analysis and customization of data for these techniques, as well as their effective use, are crucial [2,3].

Medical imaging is concerned with many sorts of images utilized in medical applications. Medical images include X-ray images, magnetic resonance imaging (MRI), computed tomography (CT) images, ultrasound (US) images, and others that radiologists utilize in the diagnostic process to detect and analyze anomalies in the human body [4]. The image data may be deteriorated owing to a variety of circumstances, including natural events. Various devices are utilized to capture the images; this equipment is not always flawless and might degrade the quality of the images captured, and some issues that arise throughout the acquisition process can reduce image quality. Modern medical imaging technologies are more precise and give high-quality medical images, but these images might also be contaminated by environmental noise [5,6]. Medical images are impacted by numerous noise signals, such as Gaussian noise [7], speckle, and so on, and the image quality might be poor as a result; consequently, radiologists may make incorrect interpretations. Therefore, before studying these images, noise signals must be suppressed. Denoising, a preprocessing activity in the discipline of image processing, is performed not only to reduce noise signals but also to retain relevant image information such as edges, texture details, fine structures, and so on [8]. Numerous approaches for image denoising have been developed, but none of them produce efficient results for all sorts of noise issues. As a result, a framework must be designed to eliminate noise signals while preserving image data [9,10].
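As an illustration of the denoising step described above, a minimal sketch using scikit-image is given below; the filters shown (median, Gaussian, and nonlocal means) and the file name noisy.png are illustrative assumptions, not choices made in the text.

# A minimal denoising sketch (filter choices are illustrative assumptions)
from skimage import io, img_as_float
from skimage.filters import median, gaussian
from skimage.restoration import denoise_nl_means

# Load a (hypothetical) noisy medical image as grayscale floats
img = img_as_float(io.imread('noisy.png', as_gray=True))

# Median filtering suppresses impulsive/speckle-like noise and keeps edges
den_median = median(img)

# Gaussian smoothing suppresses Gaussian noise at the cost of fine detail
den_gauss = gaussian(img, sigma=1.0)

# Nonlocal means tends to preserve texture details better
den_nlm = denoise_nl_means(img, h=0.05)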
Medical image processing is a powerful tool for disease prognosis and diagnosis. Nevertheless, as the volume of digital data increases, so does the requirement for accuracy and efficacy in medical image processing procedures. Researchers can construct a computer-aided diagnostic (CAD) system for disease categorization using medical imaging and current advances in AI. Physicians can now view hidden features in medical images thanks to the advancement of the most modern imaging technology. As a result, computer assistance is not only helpful but indispensable in the physician's diagnostic procedure. Automated approaches can be used to help clinicians in the early detection of diseases, reducing the need for invasive diagnostic procedures [11,12]. However, there are a number of drawbacks to using MRI and CT. CT scans are ideal for bone fractures, chest disorders, and the identification of abdominal malignancies; on the other hand, MRI is appropriate for assessing brain tumors and soft tissues. A CT scan takes around 5 minutes, whereas an MRI can take up to 30 minutes. MRI does not utilize
ionizing radiation, but there is a potential risk of radiation exposure with CT. MRI frequently creates claustrophobia, but a CT scan does not. Furthermore, CT scans are less expensive than MRIs [13].

The use of recent developments in healthcare technologies has considerably enhanced human health. Various medical imaging technologies, such as computer-assisted image processing, help in the early detection of disorders. The rising volume of medical imaging data, such as CT, MRI, and US, places a significant diagnostic strain on radiologists. In this context, automated diagnostic systems will improve diagnostic accuracy while also lowering costs and increasing efficiency. Recently, digital imaging data has grown exponentially, exceeding the availability of radiologists. This increase in workload has a direct influence on the performance of radiologists, and human analysis is out of sync with the volume of data to be processed. As a result, computer-assisted image classification and analysis structures are needed to help radiologists deal with such massive amounts of data. Furthermore, because the gray-level intensities in nature overlap, it is difficult to build a CAD system utilizing texture information alone [14-19]. The researchers' main challenge is creating an effective, accurate, and robust CAD framework for the classification of tissues, since representations of tissues overlap in nature. Furthermore, the accuracy is insufficient for commercial usage of these diagnostic devices. As a result, there is an urgent need to develop a diagnostic framework capable of properly and rapidly characterizing tissues such as tumors, cysts, stones, and normal tissues [13]. The building blocks of the CAD framework are shown in Fig. 1.1.

FIGURE 1.1 The building blocks of the CAD framework, from an input image to a normal/abnormal decision. CAD, Computer-aided diagnostic.
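To make the building blocks of Fig. 1.1 concrete, a minimal sketch of a CAD-style pipeline (preprocessing, feature extraction, and normal/abnormal classification) is given below; the random placeholder features and the choice of classifier are assumptions for illustration only, not the book's prescribed design.

# A minimal CAD-style pipeline sketch (placeholder data; illustrative choices)
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X: one row of image-derived features per scan; y: 0 = normal, 1 = abnormal
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, 200)

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3)

cad = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200))
cad.fit(Xtrain, ytrain)
print('Accuracy:', cad.score(Xtest, ytest))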

1.2 Artificial intelligence for image classification

The classification of distinct objects in images is referred to as image classification. The various items or areas of an image must be recognized and classified, and the accuracy of the outcome is determined by the classification algorithm. Classification is frequently based on a single image or on a collection of images. When image sets are employed, the set will comprise many images of the same object from various angles and under various conditions. Compared with classifying single images, this is more successful, since the algorithm can adapt to variable situations such as changes in backdrop, lighting, or appearance; it is also insensitive to image rotation and other transformations. The algorithm is fed image pixels, which have numeric characteristics, as input. The result is a single value or a series of values representing the class: the algorithm is a mapping function that converts pixel data into the proper class. The classification process might be either unsupervised or supervised. A set of training data containing class information is provided in supervised classification, and the number of classes is known; this is similar to learning from a trainer. On the other hand, the number of classes is unknown in unsupervised classification, and no training data are supplied. It is necessary to learn the link (or mapping) between the data to be categorized and the distinct classifications; this is similar to learning without a trainer. Unsupervised and supervised approaches can be coupled to generate semisupervised methods if some information about the mapping of data to classes is known. The most essential factors connected with the input data that are used to classify the data are known as features. Defining specific qualities of an item, called features, is critical in classification, and features extracted from visual objects are employed for classification [20].

ML is a branch of AI that allows computer systems to learn from input/training data. There are three types of learning: unsupervised, supervised, and semisupervised. Unsupervised learning occurs when learning occurs with unknown input data; in supervised learning, the algorithm is trained with a set of training data for a specific objective; semisupervised learning falls somewhere in the middle of these two groups. The training inputs and intended outputs are provided in supervised learning, and the algorithm learns the relationship between input and output; the mapping between input and output is already established. Only the inputs are provided in unsupervised learning, and the algorithm learns to uncover patterns or characteristics to create the output; the method does not need to know the number of outputs ahead of time. The training inputs and intended outputs are only partially provided in semisupervised learning, and the algorithm learns to uncover the missing patterns and relations [20].

Image classification is one of the most difficult problems for an algorithm to learn and complete. When numerous images are provided, the human brain learns and classifies both existing and new images with near-perfect accuracy. AI algorithms are created to precisely imitate the activity of the human brain. When the images are taken in various circumstances, such as changing the lighting, rotating or translating the items in the image, or having hidden or incomplete objects, the task becomes more complex. Such circumstances result in hundreds of distinct images containing the same item, further complicating the classification/recognition task. Image categorization can be done pixel by pixel or object by object. The properties of each pixel are retrieved in pixel-based classification to designate it as belonging to a certain class. In object-based classification, segmentation is used to extract areas or objects in an image and assess their properties. In order to do classification, features or properties must be retrieved. The algorithm's efficiency is determined by the number of features employed in the process; this raises the issue of the "curse of dimensionality." Dimension reduction, which equates to feature reduction, is required to lower the computational complexity. More processing and data storage are required as the number of characteristics increases, which raises the algorithm's time complexity. More efficient algorithms identify things with the fewest characteristics and in the least amount of time [20].

Since AI algorithms have the potential to learn, they are being employed for image classification. A collection of training data is provided, and the network must be fed the correlations between training inputs and desired outputs. The network is trained using known data to detect and classify new input. AI algorithms were created to facilitate learning, with the logic being learned by the algorithm during training. Inputs include patterns or data, outputs are defined, and the algorithm learns to determine the relation between inputs and outputs. When the issue is difficult, such as image classification, additional hidden layers are needed, causing the neural network to become "deep." Hundreds of hidden layers enhance classification accuracy, and the learning becomes "deep learning" [20].
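As one concrete example of the dimension reduction mentioned above, a minimal sketch using principal component analysis (PCA) is given below; PCA is one common choice and an assumption here, not a method prescribed by the text.

# Reducing flattened image features with PCA (an assumed, common technique)
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 4096)   # e.g., 500 images flattened from 64 x 64 pixels
pca = PCA(n_components=50)      # keep 50 components instead of 4096 features
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (500, 50)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained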
1.3 Unsupervised learning (clustering)

Clustering is one of the most frequently utilized approaches for analyzing experimental data. People strive to achieve an initial understanding of their results in various areas, from social sciences to computer science to biology, by creating expressive groups among the data points. Companies, for instance, cluster clients based on their customer profiles for focused marketing, astronomers cluster stars based on their proximity, and bioinformaticians cluster genes based on similarities in their expression. Clustering is intuitively defined as the act of grouping a collection of objects so that similar objects are grouped in the same class and dissimilar ones are divided into distinct classes. This definition is certainly vague and maybe ambiguous, and it is difficult to find a more precise one. There are various causes for this problem. One major issue is that the two aims expressed in the definition frequently contradict one another: closeness (or similarity) is not a transitive relation, whereas cluster membership is an equivalence relation, and specifically a transitive one. For example, an input of points spread along two horizontal lines may be clustered into those two lines by a clustering method that emphasizes not separating close-by points [2,21].

Another basic issue with clustering is a lack of "ground truth," which is a typical issue with unsupervised learning. So far, the book has mostly dealt with supervised learning. The goal of supervised learning is straightforward: we would like to train a classifier to predict the labels of future samples as accurately as possible. A supervised learner can also quantify the quality of hypotheses by calculating the empirical loss using labeled training data. On the other hand, clustering is an unsupervised learning problem in which no labels are predicted; rather, we would like to find a realistic approach to arrange the data. As a result, there is no simple approach for assessing clustering performance. Furthermore, even with complete understanding of the underlying data distribution, it is unclear what the "right" clustering is for that data or how to evaluate a proposed clustering [2,21].

Clustering is a method that groups together similar things, and one of several different kinds of inputs may be utilized. The input to the algorithm in similarity-based clustering is a dissimilarity matrix or distance matrix D. Similarity-based clustering provides the benefit of easily incorporating domain-specific similarity or kernel functions; the advantage of feature-based clustering is that it may use "raw" data that is potentially noisy. Aside from the two input kinds, there exist two possible output types: hierarchical clustering, in which a nested partition tree is produced, and flat clustering, also known as partition clustering, in which the objects are divided into disjoint sets. Some methods require that D be a true distance matrix, whereas others do not. If we have a similarity matrix S, we may transform it into a dissimilarity matrix by using any monotonically decreasing function. The most frequent technique to describe item dissimilarity is through the dissimilarity of their attributes. Some typical attribute dissimilarity functions include the Hamming distance, city block distance, squared (Euclidean) distance, and correlation coefficient [2,22].

Clustering is one of the simple techniques employed by humans to accommodate the massive quantity of information they receive every day; handling each piece of information as a separate object would be tough. As a result, humans appear to group things into clusters, and each cluster then characterizes the precise qualities of the entities that form it. As with supervised learning, it is assumed that all patterns are described in terms of features that constitute one-dimensional feature vectors. In a number of circumstances, a stage known as clustering tendency assessment should be present: it covers a few tests that determine whether there is a clustering pattern in the provided data or not. For example, if the dataset is completely random, attempting to untangle clusters is futile. Different feature options and proximity measurements are available, and different clustering criteria and clustering methods may provide wildly differing clustering results [2,23].
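The attribute dissimilarity functions listed above are all available in SciPy; a minimal sketch is given below (the sample vectors are arbitrary).

# Typical attribute dissimilarity functions (SciPy implementations)
import numpy as np
from scipy.spatial.distance import hamming, cityblock, sqeuclidean, correlation

x = np.array([1.0, 0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 0.0, 3.0])

print(hamming(x, y))      # fraction of components that disagree
print(cityblock(x, y))    # city block (Manhattan) distance
print(sqeuclidean(x, y))  # squared (Euclidean) distance
print(correlation(x, y))  # 1 minus the correlation coefficient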
1.3.1 Image segmentation with clustering

Images are widely recognized as one of the most important means of delivering information. An example would be the usage of images for robotic navigation; other uses, such as removing cancerous tissues from body scans, are an important aspect of medical diagnostics. One of the initial steps in image recognition is to segment images and discover distinct objects inside them. This may be accomplished through the use of features such as frequency-domain transformations and histogram plots [2,24].

Image segmentation is a critical preprocessing step in computer vision and image recognition. Image segmentation, which is the breakdown of an image into a number of nonoverlapping relevant sections with the same qualities, is a critical method in digital image processing, and segmentation accuracy has a direct impact on the efficacy of subsequent activities [25]. Because image segmentation is critical in many image processing applications, various image segmentation algorithms have been built during the last few decades. However, new methods are always being sought, since image segmentation is a difficult issue that necessitates a better solution for the successive image processing stages. Although the clustering approach was not designed specifically for image processing, it is utilized for image segmentation by the computer vision community. The k-means clustering method, for example, requires prior knowledge of the number of clusters (k) into which the pixels are to be categorized. Every pixel in the picture is iteratively and repeatedly assigned to the cluster whose centroid is closest to the pixel, and the centroid of each cluster is identified based on the pixels assigned to that cluster. Both the selection of pixel membership in the clusters and the computation of the centroids are based on distance calculations. Because it is straightforward to compute, the Euclidean distance is the most utilized; however, the utilization of Euclidean distance produces error in the final image segmentation [2,26]. A simple k-means clustering Python code for image segmentation is given below.

import numpy as np
import matplotlib.pyplot as plt
import cv2

# Read the image and convert from OpenCV's BGR channel order to RGB
img = cv2.imread(filepath)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Reshape to one float32 color vector (R, G, B) per pixel
vectorized = img.reshape((-1, 3)).astype(np.float32)

# Stop after 10 iterations or when centers move by less than 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

K = 3
attempts = 10
ret, label, center = cv2.kmeans(vectorized, K, None, criteria, attempts,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with its cluster center and restore the image shape
center = np.uint8(center)
res = center[label.flatten()]
result_image = res.reshape((img.shape))

plt.imshow(result_image)
plt.show()
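Since k must be supplied in advance, a common heuristic (assumed here; the text does not prescribe one) is to rerun the clustering above for several values of k and compare the compactness returned by cv2.kmeans, looking for an "elbow" where the improvement flattens.

# Sketch of the "elbow" heuristic for choosing k (continues the code above)
compactness_per_k = {}
for k in range(2, 8):
    compactness, _, _ = cv2.kmeans(vectorized, k, None, criteria, attempts,
                                   cv2.KMEANS_RANDOM_CENTERS)
    compactness_per_k[k] = compactness  # total within-cluster squared error
print(compactness_per_k)                # improvement flattens past a good k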

1.4 Supervised learning

1.4.1 K-nearest neighbor

The k-nearest neighbor (k-NN) approach is labor intensive when given big training sets and did not gain popularity until more computing power became available; it is now extensively utilized in pattern recognition. k-NN classifiers are based on learning by analogy, which involves comparing a particular test instance to training instances that are similar to it. A k-NN classifier explores the pattern space for the k training instances that are closest to the unknown instance; these k training instances are the k "nearest neighbors" of the unknown instance. A distance metric, such as the Euclidean distance, is employed to define "closeness." We usually normalize the values of each attribute to prevent attributes with large initial ranges from outweighing attributes with smaller initial ranges. In k-NN classification, the unknown instance is allocated to the most frequent class among its k nearest neighbors. When k = 1, the unknown instance is allocated to the class of the training instance that is closest to it in pattern space. Moreover, k-NN classifiers can be utilized for numeric prediction, in which a given unknown instance is assigned a real-valued prediction: in this scenario, the classifier returns the average of the real-valued labels related to the unknown instance's k nearest neighbors. A test set is employed to assess the error rate of the classifier starting with k = 1; this procedure is repeated as many times as needed, increasing k by one additional neighbor each time, and the k value that generates the lowest error rate is selected. Distance-based assessments are utilized by k-NN classifiers, which assign equal weight to each attribute; as a result, when given noisy or irrelevant attributes, they may suffer from low accuracy. However, the approach has been tweaked to include attribute weighting as well as the pruning of noisy data instances. The distance measure chosen can have a big impact, and other distance metrics, such as the Manhattan (city block) distance, may also be used. When classifying test instances, nearest-neighbor classifiers can be incredibly slow. The use of partial distance computations and editing the stored instances are two further methods for reducing classification time. In the partial distance technique, the distance is computed using a subset of the n attributes; if the distance surpasses a certain threshold, the process stops working on the current stored instance and goes on to the next one. The editing approach gets rid of any training instances that are not useful; because it minimizes the overall number of instances saved, this strategy is also known as pruning [27]. A simple Python code for k-NN is given below.

#Import k-NN Model
from sklearn.neighbors import KNeighborsClassifier
#Create the Model
clf = KNeighborsClassifier(n_neighbors = 5)
#Train the model with Training Dataset
clf.fit(Xtrain,ytrain)
#Test the model with Testset
ypred = clf.predict(Xtest)
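The paragraph above recommends normalizing attribute values and selecting k by comparing error rates; a minimal sketch of both steps using cross-validation is given below (the grid of k values is an assumption).

# Normalizing features and choosing k by cross-validated error (a sketch)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier())
param_grid = {'kneighborsclassifier__n_neighbors': [1, 3, 5, 7, 9]}
grid = GridSearchCV(knn, param_grid, cv=5)
grid.fit(Xtrain, ytrain)
print(grid.best_params_)  # the k with the lowest cross-validated error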

1.4.1.1 Support vector machine

Support vector machine (SVM) is a classification technique for both linear and nonlinear data. SVM transforms the original training data into a higher dimension via a nonlinear mapping and seeks the optimal linear separating hyperplane in this new dimension, employing support vectors and margins to find that hyperplane. Despite the fact that even the quickest SVMs can have long training times, they are remarkably accurate due to their ability to model complicated nonlinear decision boundaries, and they have a lower risk of overfitting than other approaches. The discovered support vectors also serve as a concise representation of the learned model. SVMs may be used for both classification and numeric prediction. Medical imaging, object recognition, handwritten digit recognition, and speaker identification are just a few of the domains where they have been used [27].

Separating lines can be drawn in an infinite number of ways. We want to find the "best" one, which will (hopefully) have the least amount of classification error on previously unseen data. When we generalize to n dimensions, we want to find the optimal hyperplane; regardless of the number of input features, we refer to the decision boundary we are looking for as a "hyperplane." To put it another way, how can we determine the optimum hyperplane? An SVM solves this problem by searching for the maximum marginal hyperplane: we anticipate the hyperplane with the bigger margin to be more accurate than the hyperplane with the smaller margin in classifying future data instances, so the largest margin creates the most significant distinction between categories [27]. A simple Python code for SVM is given below.

#Import SVM Model
from sklearn import svm
C = 10.0 # SVM regularization parameter
#Create the Model
clf = svm.SVC(kernel = 'linear', C = C)
#Train the model with Training Dataset
clf.fit(Xtrain,ytrain)
#Test the model with Testset
ypred = clf.predict(Xtest)

1.4.2 Decision tree

Decision tree is a nonparametric technique that generates binary trees from data with both discrete and continuous attributes. Impurity reduction criteria are used to assess split attributes, with impurity specified as the so-called Gini (diversity) index; it is also feasible to utilize entropy or any other measure of impurity instead of the Gini index. Breiman et al. [28] also suggested twoing, a strategy for dealing with multiclass issues using two-class criteria. Instead of analyzing the original classes, twoing entails dividing the classes into two superclasses and doing two-class analysis on the groupings. Naturally, if one tries to examine all potential groupings when the number of classes is big, the number of alternative groupings might create a combinatorial explosion, so Breiman et al. [28] suggested an efficient approach for determining ideal superclasses for each potential split instead of examining all splits for all possible class groupings; the approach works for two-class impurity criteria (compatible with the Gini index). There is also the option of taking misclassification costs into account while making judgments. Surrogate splits are used to deal with missing data values: when a data item lacks a value required for a tree node test, it is forwarded to another test, which uses a different attribute to build a split that is as close to the original as feasible. For data with missing values, many surrogate splits can be discovered and utilized in the correct sequence [29]. A simple Python code for decision tree is given below.

#Import Decision Tree Model
from sklearn import tree
#Create the Model
clf = tree.DecisionTreeClassifier()
#Train the Model with Training dataset
clf.fit(Xtrain,ytrain)
#Test the Model with Testing dataset
ypred = clf.predict(Xtest)
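For reference, the Gini (diversity) index used above measures node impurity from the class proportions p_i; a minimal sketch of the standard definition is given below.

# Gini (diversity) index of a set of class labels (standard definition)
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)  # 0 for a pure node; larger means more mixed

print(gini(np.array([0, 0, 1, 1, 1])))  # 0.48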

Applications of Artificial Intelligence in Medical Imaging


10 1. Introduction to artificial intelligence techniques for medical image analysis

trimmed. Forest-RI is a name given to RFs gener- 1.4.4 Bagging


ated in this manner using random input selec-
tion. Forest-RC, another type of RF, employs Combination of the decisions of many models
random linear combinations of the input charac- entails combining the numerous outputs into a sin-
teristics. Rather than picking a subset of the char- gle forecast. In the case of classification, the sim-
acteristics at random, it generates new features plest method is to have a vote; in the case of
that are a linear combination of the current fea- numerical prediction, the simplest method is to
tures. To put it another way, a feature is created compute the average. Assume that three training
by providing L, the number of original features datasets of the same size are chosen at random
to be merged. At each node, L characteristics are from the problem area to introduce bagging by
chosen at random and appended utilizing coeffi- employing a specific ML approach to create a deci-
cients that are uniform random integers on [1,1]. sion tree for every dataset. You may anticipate
F linear combinations are created, and the opti- these trees to be almost identical and to predict the
mal split is found by searching through them. same thing for each new test case. Remarkably,
When there are only a few characteristics avail- this assumption is frequently incorrect, especially
able, this type of RF is beneficial for reducing the when the training datasets are small. This is a
correlation between individual classifiers. RFs really alarming truth that appears to put a shadow
have the same accuracy as AdaBoost but are on the entire enterprise! Since decision tree induc-
more resistant to errors and outliers. As the num- tion is an unstable method, small changes in the
ber of trees in a forest grows vast, the generaliza- training data can simply achieve a new feature
tion error converges. As a result, overfitting is not being selected at a certain node, with substantial
an issue. The accuracy of an RF is defined by the consequences for the subtree below that node.
strength of the individual classifiers as well as a Consider the experts to be decision trees in their
measure of their interdependence. The ideal situ- own right. By allowing them to vote on each test
ation is to keep individual classifiers strong while case, we can merge the trees. If one class obtains
decreasing their correlation. The number of char- more votes than the others, that class is assumed
acteristics chosen for consideration at each split to be accurate. The more the better, in general: As
has no effect on RFs. Typically, values up to more votes are counted, the predictions generated
log2d 1 1 are used. RFs are efficient on very big by voting grow more reliable [30].
databases because they examine many fewer fea- By recreating the process outlined earlier
tures for each split. They have the potential to be with a specific training set, bagging seeks to off-
faster than either bagging or boosting. Internal set the volatility of learning methods. Rather
estimations of variable significance are provided than selecting a new, independent training
by RFs [27]. A simple Python code for RF is given dataset every time, the old training data is
below. modified by removing certain occurrences and

#Import Random Forest Ensemble Model


from sklearn.ensemble import RandomForestClassifier
#Create the Model
clf = RandomForestClassifier(n_estimators = 200)
#Train the model with Training set
clf.fit(Xtrain,ytrain)
#Test the model with Test set
ypred = clf.predict(Xtest)
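The rule of thumb mentioned above for the number of features examined at each split maps directly onto scikit-learn's max_features argument. A minimal sketch, assuming a synthetic dataset created purely for illustration:

# A minimal sketch tying the log2(d) + 1 rule of thumb to scikit-learn's
# max_features argument; the synthetic dataset is for illustration only
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=500, n_features=64, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)
# max_features='log2' examines about log2(d) features at each split
clf = RandomForestClassifier(n_estimators=200, max_features='log2', random_state=0)
clf.fit(Xtrain, ytrain)
print('Test accuracy:', clf.score(Xtest, ytest))
# internal estimates of variable importance mentioned in the text
print('Most important feature:', clf.feature_importances_.argmax())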

1.4.4 Bagging

Combining the decisions of many models entails merging the numerous outputs into a single prediction. In the case of classification, the simplest method is to take a vote; in the case of numerical prediction, the simplest method is to compute the average. To introduce bagging, assume that three training datasets of the same size are chosen at random from the problem domain and that a specific ML approach is used to create a decision tree for each dataset. You might anticipate these trees to be almost identical and to predict the same thing for each new test case. Remarkably, this assumption is frequently incorrect, especially when the training datasets are small. This is a really alarming truth that appears to cast a shadow over the entire enterprise! Since decision tree induction is an unstable method, small changes in the training data can easily cause a different feature to be selected at a certain node, with substantial consequences for the subtree below that node. Consider the experts to be decision trees in their own right. By allowing them to vote on each test case, we can merge the trees. If one class obtains more votes than the others, that class is taken to be correct. The more the better, in general: as more votes are counted, the predictions generated by voting grow more reliable [30].

By recreating the process outlined earlier with a specific training set, bagging seeks to offset the volatility of learning methods. Rather than selecting a new, independent training dataset every time, the old training data is modified by removing certain occurrences and
reproducing others. Examples from the original dataset are randomly picked, with replacement, to generate a new dataset of the same size. This sampling approach invariably duplicates some instances while eliminating others. If this concept strikes a chord, it is because it underlies the bootstrap technique for assessing the generalization error of a learning system; in fact, the phrase bagging stands for bootstrap aggregating. Bagging applies a learning method to each of these artificially produced datasets, and the resulting classifiers vote for the class to be predicted. The way the training datasets are generated differs between bagging and the idealized technique outlined above. Bagging simply resamples the existing training data rather than generating independent datasets from the domain. The resampled datasets are distinct from one another, yet they are not independent because they are all based on the same dataset [30]. A simple Python code for Bagging is given below.

#Import Bagging Ensemble Model
from sklearn.ensemble import BaggingClassifier
#Import tree for the base estimator
from sklearn import tree
#Create a Bagging Ensemble Classifier
bagging = BaggingClassifier(tree.DecisionTreeClassifier(), max_samples=0.5,
max_features=0.5)
#Train the model using the training set
bagging.fit(Xtrain,ytrain)
#Predict the response for test set
ypred = bagging.predict(Xtest)
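The bootstrap resampling step described above is easy to illustrate directly. A minimal NumPy sketch with a toy dataset (hypothetical, for illustration only):

# A minimal NumPy sketch of bootstrap resampling: sampling with
# replacement duplicates some instances and leaves others out
import numpy as np
rng = np.random.default_rng(0)
X = np.arange(10)                      # toy dataset of 10 instances
idx = rng.integers(0, len(X), len(X))  # indices sampled with replacement
bootstrap = X[idx]
print('Bootstrap sample:', bootstrap)
# on average about 1/e (roughly 37%) of the instances are left out
print('Left-out instances:', sorted(set(X) - set(bootstrap)))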

1.4.5 Boosting

The boosting approach to merging several models takes advantage of this observation by explicitly seeking models that complement one another. Boosting, like bagging, combines the output of individual models by voting for classification or averaging for numeric prediction. It combines models of the same type, such as decision trees, in the same way as bagging does. Boosting, on the other hand, is an iterative process: whereas individual models are produced independently in bagging, in boosting the performance of prior models influences the performance of subsequent models. By giving more weight to examples handled improperly by previous models, boosting encourages subsequent models to become experts for those situations. Finally, rather than assigning equal weight to all models, boosting weighs the contribution of a model by its confidence [31].

1.4.6 AdaBoost

The technique of boosting has several variations. The most widely known is the AdaBoost approach, which is a frequently used classification method. It may be applied to any classification learning method, just like bagging. To make things easier, we will suppose that the learning algorithm can manage weighted instances, in which each instance's weight is a positive number. When instance weights are included, the error of a classifier is determined as the sum of the weights of the misclassified instances divided by the total weight of all instances, rather than the proportion of misclassified instances. By weighting instances, the learning algorithm may be pushed to focus on a certain group of examples, notably those with a high weight. Because there is a higher incentive to classify such cases correctly, they become much more essential [31].

The boosting process starts by giving all samples in the training data the same weight. The learning process is then used to create a classifier for this data, and every sample is reweighted based on the classifier's output. The weight of correctly classified samples is reduced, while the weight of incorrectly classified samples increases. This results in a collection of "easy" instances with low weight and a collection of "hard" samples with high weight. In the following iteration (and all future ones), a classifier is developed for the reweighted data, which consequently concentrates on correctly classifying the difficult instances. The weights of the instances are then increased or decreased based on the output of this new classifier. Hence, certain difficult cases may become even more difficult, and easier ones even easier; on the other hand, some difficult cases may turn out to be easier, and easier ones harder. After each iteration, the weights reflect how often the samples have been misclassified by the classifiers developed thus far. By retaining a measure of "hardness" with each instance, this approach produces a sequence of experts that complement one another. Boosting frequently results in classifiers that are substantially more accurate on new data than those produced by bagging. Unlike bagging, however, boosting may occasionally fail in practice [31]. A simple Python code for AdaBoost is given below.

#Import Adaboost ensemble model
from sklearn.ensemble import AdaBoostClassifier
#Import tree for the base estimator
from sklearn import tree
#Create an Adaboost Ensemble Classifier
clf = AdaBoostClassifier(tree.DecisionTreeClassifier(), n_estimators=10,
algorithm='SAMME', learning_rate=0.5)
#Train the model using the training set
clf.fit(Xtrain,ytrain)
#Predict the response for test set
ypred = clf.predict(Xtest)
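For intuition, the weighted error that drives this reweighting can be computed directly. A minimal NumPy sketch with hypothetical labels, predictions, and instance weights:

# A minimal NumPy sketch of the weighted error used by boosting: the sum
# of the weights of the misclassified instances divided by the total weight
import numpy as np
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])       # two mistakes, at positions 2 and 4
w = np.array([0.1, 0.1, 0.3, 0.2, 0.3])  # hypothetical instance weights
weighted_error = w[y_true != y_pred].sum() / w.sum()
print(weighted_error)  # 0.6: the mistakes fall on heavily weighted instances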

1.4.7 XGBoost

XGBoost [32] is an efficient gradient tree boosting technique which generates decision trees sequentially. It has the capacity to perform the relevant computations quickly on all computer platforms. As a result, XGBoost is extensively employed for its ability to model new features and classify labels. With its implementations on tabular and structured datasets, the XGBoost algorithm has attracted a considerable amount of interest. The XGBoost method began with a decision tree-based technique, in which graphical representations of possible decision solutions are produced based on particular conditions. Then, an ensemble meta-algorithm called "bagging" was devised to aggregate the forecasts from numerous decision trees using a majority voting mechanism. This bagging method was extended further to create a forest, or aggregation of decision trees, by randomly choosing attributes. The performance of the models was improved by decreasing the errors associated with the construction of sequential models. As an additional enhancement, the gradient descent approach was used to eliminate errors in the sequential model. Finally, the XGBoost method was recognized as a useful way of optimizing the gradient boosting algorithm by handling missing data and avoiding overfitting concerns through parallel processing [33]. A simple Python code for XGBoost is given below.

%pip install xgboost
#Import XGBoost ensemble model
from xgboost import XGBClassifier
# Create XGB model
model = XGBClassifier()
#Train the model using the training set
model.fit(Xtrain, ytrain)
# make predictions for test data
ypred = model.predict(Xtest)
1.4.8 Artificial neural networks

Artificial neural networks (ANNs) are computational models modeled after the biological neural networks that make up the human brain. An ANN is made up of artificial neurons, which are a collection of linked units. Each neuronal connection has the ability to send a signal from one neuron to another. The receiving neuron can process the signal(s) and then transfer them to downstream neurons linked to it. Neurons have states that are normally characterized by real numbers, ranging from 0 to 1. Besides, neurons contain weights, which change as they learn, allowing them to adjust the intensity of the signal they transmit downstream to other neurons. They may also have a threshold, with the downstream signal only being transmitted if the combined output is below (or above) that level. Neurons are generally arranged in layers, and different layers conduct various transformations on their inputs [20].

Neural networks have become popular in applications that are tough to represent using standard rule-based programming. The initial aim of the neural network approach was to solve problems in the same manner as a human brain would. Then, interest shifted to matching specific mental abilities, leading to deviations from biology such as backpropagation, in which the network is modified to convey feedback information. Computer (machine) vision, image classification, medical diagnosis, pattern/object identification, machine translation, speech recognition, social network filtering, and video game playing, among many other disciplines, have all used neural networks [20]. A simple Python code for ANN is given below.

#Import ANN model


from sklearn.neural_network import MLPClassifier
#Create the Model
mlp = MLPClassifier(hidden_layer_sizes = (100, ), learning_rate_init = 0.001,
alpha = 1, momentum = 0.9,max_iter = 1000)
#Train the Model with Training set
mlp.fit(Xtrain,ytrain)
#Test the Model with Test set
ypred = mlp.predict(Xtest)

The selection of an activation function is a crucial part of the neural network design process. The necessity to forecast a binary class label drives the selection of the sign activation function in the case of the perceptron. Other scenarios, in which other kinds of target variables are predicted, are also possible. Nonlinear functions such as the hyperbolic tangent, sigmoid, or sign can be utilized in different layers. The identity or linear activation is the simplest fundamental activation function Φ(.), represented by a linear function:

Φ(v) = v

When the target is a real value, the linear activation function is frequently utilized at the output node. It is also employed when a smoothed surrogate loss function is required for discrete outputs. The hyperbolic tangent, sigmoid, and sign functions were the classic activation functions utilized early in the construction of neural networks. Because of its nondifferentiability, the sign activation cannot be utilized to create the loss function at training time, although it may be utilized to map to binary outputs at prediction time. The sigmoid activation produces a result in the range (0, 1), which is useful for computations that should be interpreted as probabilities. The tanh function is shaped similarly to the sigmoid function, with the exception that it is horizontally and vertically rescaled to [−1, 1]:

tanh(v) = 2 sigmoid(2v) − 1

When the outputs of the calculations must be both positive and negative, the tanh function is preferred to the sigmoid. It is also easier to train owing to its mean-centering and bigger gradient (due to stretching) as compared to the sigmoid. The tanh and sigmoid functions have been widely used for introducing nonlinearity into neural networks. However, recently, a variety of piecewise linear activation functions have gained popularity [34]:

Φ(v) = max{v, 0} (Rectified Linear Unit [ReLU])
Φ(v) = max{min[v, 1], −1} (hard tanh)
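These activation functions are straightforward to implement. A minimal NumPy sketch, which also checks the tanh identity stated above:

# A minimal NumPy sketch of the activation functions discussed above
import numpy as np
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))
def relu(v):
    return np.maximum(v, 0.0)                     # Phi(v) = max{v, 0}
def hard_tanh(v):
    return np.maximum(np.minimum(v, 1.0), -1.0)   # max{min[v, 1], -1}
v = np.linspace(-3.0, 3.0, 7)
# verify tanh(v) = 2 sigmoid(2v) - 1, as stated in the text
print(np.allclose(np.tanh(v), 2.0 * sigmoid(2.0 * v) - 1.0))  # True
print(relu(v))
print(hard_tanh(v))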
1.4.9 Deep learning

It has been discovered that deep neural networks are best fitted for image processing applications. Conventional neural networks contain one input layer, two or three hidden layers, and one output layer. Deep neural networks contain one input layer, many hidden layers, and one output layer. The greater the number of hidden layers, the deeper the network. The layers are linked, with the previous layer's output becoming the next layer's input. The network's performance is determined by the weights of its inputs and outputs. Training the network involves finding the right weights for its numerous layers. Deep networks need more computational speed, processing capacity, a huge database, and the right parallel processing software [20].

Deep learning is an AI area that focuses on building huge neural network models capable of making correct data-driven decisions. Deep learning is best suited to scenarios where the data is complicated and huge datasets are accessible. Deep learning is used in the health-care industry to interpret medical images (X-rays, MRI scans, and CT scans) to diagnose health issues [34]. Based on the given data, ML algorithms build their own logic. The algorithm learns on its own, thus no coding is required to tackle every problem. A large number of medical images must be fed into the algorithm in order for it to learn to classify. If the images have previously been categorized and fed in, it is supervised learning; otherwise, it is unsupervised learning. The most basic application is categorizing a pattern into two groups, such as determining whether a medical image belongs to tumor tissue or not [20]. A simple Python code for a deep neural network is given below.

from keras.models import Sequential
from keras.layers import Dense
# define the keras model
model = Sequential()
model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
model.add(Dense(10, activation='relu', name='fc2'))
model.add(Dense(3, activation='softmax', name='output'))
# compile the keras model (categorical_crossentropy matches the 3-class softmax output)
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])

# Train the model


model.fit(Xtrain, ytrain, verbose=2, batch_size=10, epochs=200)
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

1.4.10 The overfitting problem in neural network training

Despite neural networks' impressive reputation as universal function approximators, important obstacles such as overfitting remain in training neural networks. The overfitting problem refers to the fact that fitting a model to a specific training dataset does not ensure that it will perform well on unknown test data. When the models are sophisticated and the dataset is small, there is always a performance gap between training and test data [34]. If we applied our solution to unknown test data, we would most likely get extremely poor results, since the learnt parameters are erroneously inferred and are not likely to generalize well to new situations. This form of erroneous inference is caused by a lack of training data, which allows random details to be encoded into the model. Hence, the solution fails to generalize effectively to previously unknown test data. Increasing the number of training examples enhances the model's generalization power, while increasing the model's complexity decreases it. Simultaneously, when there is a large amount of training data available, a too simple model is not likely to capture complicated correlations between characteristics and targets. A decent rule of thumb is that the total number of training data points should be at least two to three times greater than the number of neural network parameters. In ML, the concept of overfitting is frequently viewed in terms of the trade-off between bias and variance. In general, even when a significant quantity of data is available, neural networks need a rational strategy to limit the detrimental consequences of overfitting [34].

1.4.10.1 Regularization

Because a higher number of parameters leads to overfitting, constraining the model to utilize fewer nonzero parameters is an obvious solution. When the amount of data accessible is small, regularization is very critical. Regularization, according to one biological explanation, relates to progressive forgetting, where "less important" patterns are erased. Generally, more complicated models with regularization are preferable to simpler models without regularization [34].

1.4.11 Convolutional neural networks

Convolutional neural network (CNN) is a deep learning network which has grown in popularity for image categorization. Fig. 1.2 depicts the CNN architecture. It is made up of an input layer as well as hundreds of feature detection layers. Feature detection layers conduct one of three operations: convolution, pooling, or rectified linear unit (ReLU) activation [20]. CNNs are biologically inspired networks utilized in computer vision to classify images and recognize objects. A convolution process is specified for the convolution layers, where a filter is employed to transfer the activations from one layer to the next. A convolution operation employs a three-dimensional weighted filter with the same depth as the current layer but a smaller spatial extent. The dot product of all the weights in the filter and any choice of spatial region in a layer defines the value of the hidden state in the subsequent layer. The interaction between the filter and the spatial area in a layer is performed at every available location to create the next layer, in which the activations keep their spatial relationships from the prior layer. Since every activation in a specific layer is a function of just a small spatial region in the preceding layer, the connections in a CNN are relatively sparse. Except for the final two or three layers, all layers retain their spatial structure. As a result, it is feasible to visualize which elements of a picture impact which portions of the activations in a layer. Lower level layers capture lines or other primitive forms, while higher level layers capture more complex shapes. As a result, subsequent layers can recognize complex objects by assembling the shapes found in these intuitive characteristics. Furthermore, a subsampling layer simply averages the data in local areas of size 2 × 2 to reduce the spatial footprints of the layers by a factor of 2. Historically, CNNs have been the most effective of all forms of neural networks. They are commonly employed in image identification, object detection and localization, and even language processing [35].

FIGURE 1.2 CNN architecture (convolution and pooling layers repeated, followed by a fully connected layer). CNN, Convolutional neural network.

Convolution processes an image by passing it through convolution filters, which trigger on particular characteristics in the image. Pooling uses nonlinear downsampling to decrease the amount of data that must be processed. ReLU keeps positive values, while negative values are set to zero. The classification layer is located just before the output layer. It is a fully connected layer with an N-dimensional output, where N is the number of categories to be classified. This layer produces an N-dimensional vector, with each member representing the likelihood that the input image belongs to one of the N classes. The categorized output is provided by the final output layer, which employs a softmax function. Each layer of the network processes data, which is then passed on to the next layer. CNNs are based on the biological anatomy of the visual cortex. The simple and complex cells of the visual cortex are activated by subregions of the visual field, known as receptive fields. Instead of being totally connected, the neurons of a layer in a CNN are linked only to subregions of the preceding layer; other subregions have no effect on those neurons. Unlike traditional neural nets, the subregions are permitted to overlap and thus produce spatially related results. This is the primary distinction between CNNs and other neural networks [20].

1.4.11.1 Functioning of convolutional neural network

In the operation of a CNN, an image containing one or more objects to be categorized will be fed into the CNN. The number of input values will be determined by the image size and pixel depth. These are merely integers that must be interpreted or recognized as objects of a particular type. A CNN attempts to mimic the human visual cortex, in which tiny clusters of cells possess sensitivity to specific areas of the visual field. Some neurons in the brain respond to specific information in visual images, such as curves, edges, and so on. The convolution layer is the first layer of a CNN, and it conducts spatial convolution of a predefined mask with pixel values. This is analogous to a linear filtering procedure, and the convolution result is determined by the predefined mask. Another feature map, at a higher level than the first, is created when the first layer's activation output (feature map) is fed into the second hidden layer and the convolution procedure is repeated. This approach generates distinct activation maps for various image characteristics, including complex features. At the network's end, there is a fully connected layer that accepts input from the preceding layer and creates an N-dimensional output. For an N-class problem, it may return N alternative probability values, each expressing the likelihood that the object belongs to that class. For the classification to be accurate, the CNN must be trained on millions of pictures using the backpropagation technique. The procedure is nonlinear, since each convolution layer is followed by a ReLU activation layer: the negative activation values are set to zero, while the positive activation values are kept. Following the ReLU layers are pooling layers, the most well-known of which is the max-pooling layer, which takes the activation inputs and creates a downsampled output. During the network's training phase, there are additional dropout layers that remove specific activation outputs. Overfitting is avoided as a result of this [20].
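The softmax mentioned above converts the N raw output scores into class probabilities. A minimal NumPy sketch with hypothetical scores for N = 3 classes:

# A minimal NumPy sketch of the softmax output: N raw scores become N
# probabilities that sum to one
import numpy as np
def softmax(scores):
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()
scores = np.array([2.0, 1.0, 0.1])     # hypothetical raw outputs, N = 3 classes
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0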
The states in every layer of a CNN are organized according to a spatial grid pattern. The spatial relationships are passed down from one layer to the next, since each feature value is dependent on a tiny local spatial region in the preceding layer. It is critical to retain these spatial relationships among the grid cells, as the convolution operation and the transition to the next layer are heavily based on these linkages. A layer's depth in a CNN must not be confused with the network's overall depth.

The CNN works similarly to a standard feed-forward neural network, with the exception that the operations in its layers are spatially structured and the links between layers are sparse (and carefully planned). Convolution, pooling, and ReLU are the three types of layers that are typically seen in a CNN. The ReLU activation is the same as in a standard neural network. Furthermore, a final set of layers is frequently fully connected and maps to a set of output nodes in an application-specific manner. The CNN's input data is structured into a two-dimensional grid, with pixels representing the values of individual grid points; as a result, each pixel in the picture corresponds to a certain spatial position. However, a multidimensional array of values at each grid position is required to represent the specific hue of a pixel: in the RGB color scheme, we have an intensity for each of the three primary hues, red, green, and blue [35].

1.4.11.2 Padding

One thing to keep in mind is that the convolution procedure shrinks the (q + 1)th layer in contrast to the qth layer. In general, this style of size reduction is undesirable, since it tends to lose some information near the image's edges. Padding can be used to remedy this problem. To preserve the spatial footprint, padding adds (Fq − 1)/2 "pixels" all around the edges of the feature map. In the case of padded hidden layers, these pixels are really feature values. Regardless of whether the input or hidden layers are padded, the value of each of these padded feature values is set to 0. As a result, the input volume's spatial height and breadth will both grow by (Fq − 1), which is exactly the amount by which they would otherwise decrease after the convolution. Because their values are set to 0, the padded parts have no effect on the final dot product. Padding allows the convolution operation to be performed with a piece of the filter "sticking out" from the layer's boundaries, with the dot product computed just over the area of the layer where the values are defined [35].

1.4.11.3 Strides

Convolution can also be used to reduce the image's spatial footprint in other ways. The method described above executes convolution at every spatial location in the feature map. However, the convolution does not have to be performed at every spatial position in the layer. The concept of strides may be used to decrease the granularity of the convolution. A stride of one is the most usual, and a stride of two is occasionally utilized; under normal conditions, strides of more than two are uncommon. Larger strides can help with memory constraints, or decrease overfitting if the spatial resolution is excessively high. A bigger receptive field is beneficial for detecting a complex feature in a greater spatial region of the image. The hierarchical feature engineering of a CNN captures increasingly complicated shapes in later layers. Historically, another process known as max-pooling has been used to enlarge the receptive fields [35].
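The effect of padding and strides on a layer's spatial size can be checked directly in Keras. A minimal sketch, assuming a hypothetical 32 × 32 × 3 input:

# A minimal Keras sketch of padding and strides; the 32 x 32 x 3 input
# shape is hypothetical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
model = Sequential()
# 'valid' (no padding) shrinks the 32 x 32 input to 30 x 30
model.add(Conv2D(16, (3, 3), padding='valid', input_shape=(32, 32, 3)))
# 'same' pads the borders so the 30 x 30 spatial size is preserved
model.add(Conv2D(16, (3, 3), padding='same'))
# a stride of 2 halves the spatial footprint to 15 x 15
model.add(Conv2D(32, (3, 3), strides=2, padding='same'))
model.summary()  # the printed output shapes show the effect of each choice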
1.4.11.4 The Rectified Linear Unit layer

The pooling and ReLU operations are interleaved with the convolution operations. The ReLU activation is applied just as in a standard neural network. Because the ReLU is a simple one-to-one mapping of activation values, it has no effect on the layer's dimensions. In classic neural networks, the activation function is paired with a linear transformation by a matrix of weights to produce the next layer of activations. Accordingly, a ReLU layer is frequently not explicitly shown in graphical representations of convolutional neural network designs; it typically follows a convolution operation. It is worth noting that the ReLU activation function is a relatively recent addition to neural network architecture.
Saturating activation functions like tanh and sigmoid were employed in the past. In terms of speed and accuracy, the ReLU has a significant advantage over these activation functions. Improved speed is also linked to precision, since it permits the adoption of more complex models and the training of them for longer periods of time. The ReLU activation function has progressively substituted the other activation functions in CNN architecture in recent years, to the point that the ReLU is now used as the default activation function [35].

1.4.11.5 Pooling

The pooling operation, on the other hand, is quite different. Unlike filters, the pooling operation works on tiny grid patches of size Pq × Pq in each layer and creates a new layer of the same depth. For each square region of size Pq × Pq in each of the dq activation maps, the maximum of the values is returned. This method is known as max-pooling. The spatial dimensions of each activation map are dramatically reduced when they are pooled. Unlike convolution operations, pooling takes place at the level of each activation map; as a result, the pooling operation has no effect on the number of feature maps. In other words, the depth of the pooled layer is the same as the depth of the layer on which the pooling operation was conducted. The usual size Pq of the region across which pooling is performed is 2 × 2. At a stride of 2, there would be no overlap between the several zones being pooled, and this kind of arrangement is extremely frequent. However, it has been argued that having at least some overlap among the spatial units at which the pooling is conducted is advantageous, since it reduces the likelihood of overfitting [35].
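Max-pooling with a 2 × 2 region and stride 2 halves each spatial dimension while leaving the depth unchanged, as described above. A minimal Keras sketch with a hypothetical input shape:

# A minimal Keras sketch of 2 x 2 max-pooling with stride 2: the spatial
# size is halved while the number of activation maps (depth) is unchanged
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 3)))  # 28 x 28 x 32
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))                    # 14 x 14 x 32
model.summary()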
1.4.11.6 Fully connected layers

Every hidden state in the first fully connected layer is coupled to every feature in the final spatial layer. This layer performs the same duties as a standard feed-forward network. In most circumstances, more than one fully connected layer may be used toward the end to enhance computing capacity. The connections between these layers are arranged in the same way as in a traditional feed-forward network. The fully connected layers contain many parameters because they are densely connected: the fully connected layers generally contain more connections than the convolutional layers, despite the fact that the convolutional layers have more activations. Based on the characteristics of the application, one may utilize a logistic, softmax, or linear activation. One option for reducing each of the final activation maps to a single value before the fully connected layers is to apply average pooling throughout the whole spatial region of the final set of activation maps [35].

1.4.11.7 Training a convolutional network

The backpropagation technique is used to train a CNN. The convolution, ReLU, and max-pooling layers are the three main types of layers. Because the ReLU behaves as in a standard neural network, it is quite simple to backpropagate through it. To backpropagate through a max-pool with no overlap across pools, all that is required is to figure out which unit holds the pool's maximum value: the partial derivative of the loss with respect to the pooled state flows back to the unit with the highest value, and all other entries in the grid are given a value of 0. ReLU operations and backpropagation through max-pooling are therefore similar to those used in conventional neural networks [35].
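The gradient routing described above can be made concrete: for a single pool, only the position holding the maximum receives the upstream derivative. A minimal NumPy sketch for one 2 × 2 pool:

# A minimal NumPy sketch of backpropagation through one 2 x 2 max-pool:
# the upstream derivative flows only to the position of the maximum value
import numpy as np
pool = np.array([[1.0, 3.0],
                 [2.0, 0.5]])
upstream_grad = 0.7  # hypothetical dLoss / d(pooled output)
grad = np.zeros_like(pool)
grad[np.unravel_index(pool.argmax(), pool.shape)] = upstream_grad
print(grad)  # only the entry that held the maximum (3.0) receives 0.7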
1.4.11.8 Dropout

Dropout is a method for creating a neural network ensemble that employs node sampling rather than edge sampling. When a node is removed, all incoming and outgoing connections to and from that node must be removed as well. Nodes are sampled only from the network's input and hidden layers. Dropout is a technique of combining node sampling with weight sharing. Backpropagation is then used by the training procedure to update the weights of the sampled network utilizing a single sampled example. Dropout has the major consequence of incorporating regularization into the learning process. By dropping both input and hidden units, dropout effectively introduces noise into both the input data and the hidden representations, and regularization is a form of noise addition. Dropout prevents hidden units from adapting to one another's features, a phenomenon known as feature coadaptation. Because the effect of dropout is a masking noise that eliminates some of the hidden units, this method imposes a degree of redundancy between the features learnt at the various hidden units, and increased resilience is the result of this form of redundancy. Dropout is efficient because each of the sampled subnetworks is trained with a small number of sampled examples, so just the additional labor of sampling the hidden units is required. However, because dropout is a regularization approach, it limits the network's expressive ability; thus, in order to fully benefit from dropout, one must employ larger models and more units. As a result, there is a hidden computational overhead. Moreover, if the initial training dataset is big enough to limit the chance of overfitting, the computational benefits of dropout may be minor but still noticeable [35].

1.4.11.9 Early stopping

Early stopping is a popular type of regularization, where the gradient descent is stopped after just a few iterations. One technique to determine the stopping point is to keep a portion of the training data held out and assess the model's error on this hold-out set; when the error on the hold-out set starts to grow, the gradient-descent strategy is stopped. When early stopping is used, the size of the parameter space is effectively reduced to a smaller neighborhood around the starting values of the parameters. Early stopping therefore functions as a regularizer, since it effectively limits the parameter space [34].

1.4.11.10 Batch normalization

Batch normalization is a relatively new technique for dealing with the vanishing and exploding gradient issues, which cause activation gradients in consecutive layers to either decrease or increase in magnitude. Internal covariate shift is another significant issue in deep network training: throughout training, the parameters are adjusted, and hence the hidden variable activations are adjusted as well. The goal of batch normalization is to create features with identical variance by adding extra "normalization layers" between hidden layers, which resist this sort of shift. These extra nodes must be taken into account by the backpropagation algorithm in order to guarantee that the loss derivative of layers before the batch normalization layer accounts for the transformation entailed by the new nodes. Batch normalization has the unique characteristic of also acting as a regularizer. It is worth noting that the same data point might result in slightly different updates depending on which batch it is in; this effect can be viewed as a form of noise added to the update process, and adding a small amount of noise to the training data is a common way to accomplish regularization. Although there is no perfect agreement on this topic, it has been empirically noted that regularization approaches such as dropout do not enhance performance once batch normalization is applied [35]. A simple Python code for CNN is given below.

from tensorflow.keras.preprocessing.image import load_img ,img_to_array
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D,AveragePooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

#Create the CNN Model


model = Sequential()
#1st Convolutional Layer
model.add(Conv2D(32, (3, 3), padding='valid', strides=(1, 1),input_shape=img_shape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(1,1), padding='same'))

#2nd Convolutional Layer


model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=(2,2), padding='same'))
model.add(Dropout(0.2))

#3rd Convolutional Layer


model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

#4th Convolutional Layer


model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))


#Passing it to a Fully Connected layer


model.add(Flatten())
# 1st Fully Connected Layer
model.add(Dense(64, input_shape=(32,32,3,)))
model.add(BatchNormalization())
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))

#2nd Fully Connected Layer


model.add(Dense(32))
model.add(BatchNormalization())
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))

#Output Layer
model.add(Dense(3))
model.add(BatchNormalization())
model.add(Activation('softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.show()

1.4.12 Recurrent neural networks

All neural networks are built to handle multidimensional data whose features are largely independent of each other. Particular data types, such as biological data, text, and time series, do, nevertheless, have sequential relationships between their attributes. The following are some examples of such dependencies:

1. The values at successive time stamps in a time-series data collection are tightly tied to one another. If the values at these time stamps are treated as separate features, important information about the relationships between them is lost.
2. Even though text is frequently processed as a bag of words, the sequencing of the words might provide superior semantic insights. In such instances, it is critical to build models that account for the sequencing information. The most typical use of recurrent neural networks (RNNs) is text data.
3. Sequences are often seen in biological data, where the symbols may relate to the nucleobases or amino acids that comprise DNA's building blocks.

Particular values in a sequence might be real or symbolic in nature. Time series is another term for real-valued sequences. RNNs may be employed for either form of data, although symbolic values are more common in practical applications. The vanishing and exploding gradient problems are among the most crucial problems in this field, and they are especially prominent in deep networks such as RNNs. Hence, a variety of RNN types have been developed, including the gated recurrent unit and long short-term memory (LSTM). Image captioning, sequence-to-sequence learning, sentiment analysis, and machine translation are just a few of the areas where RNNs and their derivatives have been applied [35]. A simple Python code for RNN is given below.

from keras.layers import Dense


from keras.models import Sequential
from keras.layers import SimpleRNN
#Create Model
model = Sequential()
model.add(SimpleRNN(32))
model.add(Dense(1, activation = 'sigmoid'))
# Compile model
model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy', metrics
= ['acc'])
# Fit model
history = model.fit(Xtrain, ytrain,epochs = 100, batch_size = 20, validation_split=0.2)
# Evaluate the model
_, train_acc = model.evaluate(Xtrain, ytrain, verbose = 0)
_, test_acc = model.evaluate(Xtest, ytest, verbose = 0)
print('Training Accuracy: %.3f, Testing Accuracy: %.3f' % (train_acc,test_acc))

1.4.13 Long short-term memory

Problems with vanishing and exploding gradients affect RNNs. This is a typical dilemma in neural network updates, in which repeated multiplication by the weight matrix is intrinsically unstable: the gradient either vanishes during backpropagation or blows up to huge values in an unstable manner. Consecutive multiplication with the weight matrix at various time stamps causes this form of instability. One way to look at this issue is that a neural network which only employs multiplicative updates is only good at learning over short sequences and, hence, has good short-term memory but poor long-term memory by default. One approach to solving this challenge is to change the recurrence equation for the hidden vector by using the LSTM with long-term memory. The operations of the LSTM are designed to give the user fine-grained control over the data kept in this long-term memory [35]. A simple Python code for LSTM is given below.

from keras.layers import Dense


from keras.models import Sequential
from keras.layers import LSTM
#Create Model
model = Sequential()
model.add(LSTM(32))
model.add(Dense(1, activation = 'sigmoid'))
# Compile model
model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy', metrics= ['acc'])
# Fit model
history = model.fit(Xtrain, ytrain,epochs = 100,batch_size = 10, validation_split = 0.2)
# Evaluate the model
_, train_acc = model.evaluate(Xtrain, ytrain, verbose = 0)
_, test_acc = model.evaluate(Xtest, ytest, verbose = 0)
print('Training Accuracy: %.3f, Testing Accuracy: %.3f' % (train_acc, test_acc))

1.4.14 Data augmentation

Data augmentation is a popular approach to reduce overfitting in CNNs. In data augmentation, new training patterns are created by applying transformations to the existing samples. Data augmentation works particularly effectively in the field of image processing, because numerous transformations, such as rotation, patch extraction, reflection, and translation, do not fundamentally alter the attributes of the objects in an image. Moreover, they improve the model's generalization ability when it is trained with the augmented dataset. Because many of these types of data augmentation require relatively little work, the augmented images do not need to be produced explicitly beforehand; they can instead be generated during the training process, when an image is being processed. In many circumstances, employing image patches to create the training dataset can be beneficial [35]. A simple Python code for data augmentation is given below. For detailed examples, check the machinelearningmastery.com website1.

#Example of horizontal shift image augmentation from machinelearningmastery.com


from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('img.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(width_shift_range=[-200,200])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
# define subplot
pyplot.subplot(330 + 1 + i)
# generate batch of images
batch = it.next()
# convert to unsigned integers for viewing
image = batch[0].astype('uint8')
# plot raw pixel data
pyplot.imshow(image)
# show the figure
pyplot.show()

1.4.15 Generative adversarial networks

In generative adversarial networks (GANs), two neural network models are used at the same time. First, a generative model generates synthetic instances of images which are comparable to those found in a real repository. The objective is to produce synthetic images that are so genuine that a trained observer will be unable to tell whether an item is produced synthetically or belongs to the original dataset. The created objects are frequently used to generate vast volumes of synthetic data for AI algorithms, and they may also be used to augment data.

1
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-
networks/.

Furthermore, by providing context, this technique may be used to generate objects with different characteristics. The parameters of the generator and the discriminator are updated simultaneously throughout the training phase of a GAN. The discriminator and the generator are both neural networks. The generator can be compared to the decoder part of a variational autoencoder; however, the training procedure differs significantly from that of a variational autoencoder. GANs are frequently used to generate images in a variety of contexts. The image setting is, without a doubt, the most prevalent use of GANs, and the generator in the image setting is known as a deconvolutional network. As a result, the matching GAN is also known as a deep convolutional generative adversarial network (DCGAN) [35]. A simple Python code for GAN is given below. For details, check the related KERAS web page2.

import matplotlib.pyplot as plt


import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers

# DCGAN generator
def get_generator():
noise_input = keras.Input(shape=(noise_size,))
x = layers.Dense(4 * 4 * width, use_bias=False)(noise_input)
x = layers.BatchNormalization(scale=False)(x)
x = layers.ReLU()(x)
x = layers.Reshape(target_shape=(4, 4, width))(x)
for _ in range(depth - 1):
x = layers.Conv2DTranspose(
width, kernel_size=4, strides=2, padding="same", use_bias=False,
)(x)
x = layers.BatchNormalization(scale=False)(x)
x = layers.ReLU()(x)
image_output = layers.Conv2DTranspose(
3, kernel_size=4, strides=2, padding="same", activation="sigmoid",
)(x)

return keras.Model(noise_input, image_output, name="generator")

2
https://keras.io/examples/generative/gan_ada/.

# DCGAN discriminator
def get_discriminator():
image_input = keras.Input(shape=(image_size, image_size, 3))
x = image_input
for _ in range(depth):
x = layers.Conv2D(
width, kernel_size=4, strides=2, padding="same", use_bias=False,
)(x)
x = layers.BatchNormalization(scale=False)(x)
x = layers.LeakyReLU(alpha=leaky_relu_slope)(x)
x = layers.Flatten()(x)
x = layers.Dropout(dropout_rate)(x)
output_score = layers.Dense(1)(x)

return keras.Model(image_input, output_score, name="discriminator")

class GAN_ADA(keras.Model):
def __init__(self):
super().__init__()

self.augmenter = AdaptiveAugmenter()
self.generator = get_generator()
self.ema_generator = keras.models.clone_model(self.generator)
self.discriminator = get_discriminator()

self.generator.summary()
self.discriminator.summary()

def compile(self, generator_optimizer, discriminator_optimizer, **kwargs):


super().compile(**kwargs)

# separate optimizers for the two networks


self.generator_optimizer = generator_optimizer
self.discriminator_optimizer = discriminator_optimizer

self.generator_loss_tracker = keras.metrics.Mean(name="g_loss")
self.discriminator_loss_tracker = keras.metrics.Mean(name="d_loss")
self.real_accuracy = keras.metrics.BinaryAccuracy(name="real_acc")
self.generated_accuracy = keras.metrics.BinaryAccuracy(name="gen_acc")
self.augmentation_probability_tracker = keras.metrics.Mean(name="aug_p")
self.kid = KID()


@property
def metrics(self):
return [
self.generator_loss_tracker,
self.discriminator_loss_tracker,
self.real_accuracy,
self.generated_accuracy,
self.augmentation_probability_tracker,
self.kid,
]

def generate(self, batch_size, training):


latent_samples = tf.random.normal(shape=(batch_size, noise_size))
# use ema_generator during inference
if training:
generated_images = self.generator(latent_samples, training)
else:
generated_images = self.ema_generator(latent_samples, training)
return generated_images

def adversarial_loss(self, real_logits, generated_logits):


# this is usually called the non-saturating GAN loss

real_labels = tf.ones(shape=(batch_size, 1))


generated_labels = tf.zeros(shape=(batch_size, 1))

# the generator tries to produce images that the discriminator considers as real
generator_loss = keras.losses.binary_crossentropy(
real_labels, generated_logits, from_logits=True
)
# the discriminator tries to determine if images are real or generated

discriminator_loss = keras.losses.binary_crossentropy(
tf.concat([real_labels, generated_labels], axis=0),
tf.concat([real_logits, generated_logits], axis=0),
from_logits=True,
)

return tf.reduce_mean(generator_loss), tf.reduce_mean(discriminator_loss)

def train_step(self, real_images):


real_images = self.augmenter(real_images, training=True)

# use persistent gradient tape because gradients will be calculated twice
with tf.GradientTape(persistent=True) as tape:
generated_images = self.generate(batch_size, training=True)
# gradient is calculated through the image augmentation
generated_images = self.augmenter(generated_images, training=True)

# separate forward passes for the real and generated images, meaning
# that batch normalization is applied separately
real_logits = self.discriminator(real_images, training=True)
generated_logits = self.discriminator(generated_images, training=True)

generator_loss, discriminator_loss = self.adversarial_loss(


real_logits, generated_logits
)

# calculate gradients and update weights


generator_gradients = tape.gradient(
generator_loss, self.generator.trainable_weights
)
discriminator_gradients = tape.gradient(
discriminator_loss, self.discriminator.trainable_weights
)
self.generator_optimizer.apply_gradients(
zip(generator_gradients, self.generator.trainable_weights)
)
self.discriminator_optimizer.apply_gradients(
zip(discriminator_gradients, self.discriminator.trainable_weights)
)

# update the augmentation probability based on the discriminator's performance

self.augmenter.update(real_logits)
self.generator_loss_tracker.update_state(generator_loss)
self.discriminator_loss_tracker.update_state(discriminator_loss)
self.real_accuracy.update_state(1.0, step(real_logits))
self.generated_accuracy.update_state(0.0, step(generated_logits))
self.augmentation_probability_tracker.update_state(self.augmenter.probability)


# track the exponential moving average of the generator's weights to decrease


# variance in the generation quality
for weight, ema_weight in zip(
self.generator.weights, self.ema_generator.weights
):
ema_weight.assign(ema * ema_weight + (1 - ema) * weight)

# KID is not measured during the training phase for computational efficiency
return {m.name: m.result() for m in self.metrics[:-1]}

def test_step(self, real_images):


generated_images = self.generate(batch_size, training=False)

self.kid.update_state(real_images, generated_images)

# only KID is measured during the evaluation phase for computational efficiency
return {self.kid.name: self.kid.result()}

def plot_images(self, epoch=None, logs=None, num_rows=3, num_cols=6, interval=5):


# plot random generated images for visual evaluation of generation quality
if epoch is None or (epoch + 1) % interval == 0:
num_images = num_rows * num_cols
generated_images = self.generate(num_images, training=False)

plt.figure(figsize=(num_cols * 2.0, num_rows * 2.0))


for row in range(num_rows):
for col in range(num_cols):
index = row * num_cols + col
plt.subplot(num_rows, num_cols, index + 1)
plt.imshow(generated_images[index])
plt.axis("off")
plt.tight_layout()
plt.show()
plt.close()

# create and compile the model
model = GAN_ADA()
model.compile(
generator_optimizer=keras.optimizers.Adam(learning_rate, beta_1),
discriminator_optimizer=keras.optimizers.Adam(learning_rate, beta_1),
)

# save the best model based on the validation KID metric


checkpoint_path = "gan_model"
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
save_weights_only=True,
monitor="val_kid",
mode="min",
save_best_only=True,
)

# run training and plot generated images periodically


model.fit(
train_dataset,
epochs=num_epochs,
validation_data=val_dataset,
callbacks=[
keras.callbacks.LambdaCallback(on_epoch_end=model.plot_images),
checkpoint_callback,
],
)

# load the best model and generate images


model.load_weights(checkpoint_path)
model.plot_images()

1.4.16 Transfer learning

One of the obstacles of image analysis is that labeled training data might not be accessible for a specific use. Consider the situation where one has a set of images which need to be employed for image retrieval. Since labels do not exist in retrieval applications, it is crucial for the features to be semantically consistent. The ImageNet dataset includes more than a million images taken from 1000 classes encountered in daily life. The idea behind the selection of 1000 classes is that they can be used to derive image features for general-purpose settings. For instance, the features extracted from the ImageNet data can be employed to describe an entirely separate image dataset by passing it through a pretrained CNN and extracting the multidimensional features from the fully connected layers. This new representation can also be used for a completely different purpose, such as clustering or retrieval. This approach is so widely used that CNNs are seldom trained from scratch. Because a shared resource such as ImageNet can be employed to extract features in situations when appropriate training data is not available to address various issues, this kind of off-the-shelf feature extraction technique can be regarded as a type of transfer learning. If any extra training data is accessible, it can be utilized to fine-tune only the deeper layers that are closer to the output layer, while the weights of the early layers (closer to the input) are fixed. The purpose of training only the deeper layers while holding the early layers fixed is that the earlier layers capture only simple features such as edges, whereas the deeper layers capture more complex features. The simple features do not change much across applications, while the deeper features might be sensitive to the application at hand [35].
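This freeze-the-early-layers strategy is easy to express in Keras. A minimal sketch using an ImageNet-pretrained backbone with a new classification head; VGG16 and the three-class head are illustrative choices, not prescribed by the text:

# A minimal Keras transfer-learning sketch: an ImageNet-pretrained backbone
# is frozen and only a new classification head is trained; VGG16 and the
# 3-class head are hypothetical choices for illustration
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # fix the early layers that capture simple features
model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))  # new deeper layers to be trained
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# model.fit(Xtrain, ytrain, ...)  # Xtrain/ytrain assumed as in earlier examples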


1.4.16.1 AlexNet

It is worth noting that the original AlexNet architecture had two parallel processing pipelines, handled by two GPUs cooperating to train the model at a faster speed with memory sharing. After each convolutional layer, the ReLU activation function was used, followed by normalization and max-pooling. The second convolutional layer filters the first convolutional layer's response-normalized and pooled output. The third, fourth, and fifth convolutional layers have no intervening pooling or normalization layers. The fully connected layers contain 4096 neurons. To accomplish classification, AlexNet's final layer employs a 1000-way softmax. It is worth noting that the last layer of 4096 activations is frequently utilized to generate a flat 4096-dimensional representation of an image for purposes other than classification. These features may be extracted from any out-of-sample image by simply feeding it through the trained neural network, and they frequently transfer well to various tasks and datasets. In most CNNs today, the activation function is virtually entirely centered on the ReLU, which was not the case prior to AlexNet. In order to enhance generalization, dropout with L2-weight decay was utilized [35]. A simple Python code for AlexNet is given below.

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D,AveragePooling2D,BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True,verbose=1)

AlexNet = Sequential()

#1st Convolutional Layer


AlexNet.add(Conv2D(filters=96, input_shape=img_shape, kernel_size=(11,11),
strides=(4,4),padding='same'))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#2nd Convolutional Layer


AlexNet.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1,1), padding='same'))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))
#3rd Convolutional Layer
AlexNet.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same'))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))

#4th Convolutional Layer


AlexNet.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same'))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))

#5th Convolutional Layer


AlexNet.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same'))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
AlexNet.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#Passing it to a Fully Connected layer


AlexNet.add(Flatten())
# 1st Fully Connected Layer
AlexNet.add(Dense(4096))  # input_shape is unnecessary here; the layer follows Flatten
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
# Add Dropout to prevent overfitting
AlexNet.add(Dropout(0.2))

#2nd Fully Connected Layer


AlexNet.add(Dense(4096))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
#Add Dropout
AlexNet.add(Dropout(0.2))

#3rd Fully Connected Layer


AlexNet.add(Dense(1000))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('relu'))
#Add Dropout
AlexNet.add(Dropout(0.2))


#Output Layer
AlexNet.add(Dense(3))
AlexNet.add(BatchNormalization())
AlexNet.add(Activation('softmax'))

AlexNet.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

# Train the model


history = AlexNet.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = AlexNet.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
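The text above notes that AlexNet combined dropout with L2-weight decay, which the sketch omits. A minimal, hedged illustration of declaring a fully connected layer with L2-weight decay in Keras (the coefficient 5e-4 is an illustrative assumption, not a value from this chapter):

from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import Dense

# A fully connected layer whose weights are penalized with L2-weight decay;
# combining such layers with dropout mirrors the regularization described above.
fc_layer = Dense(4096, activation='relu', kernel_regularizer=l2(5e-4))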

1.4.16.2 Visual geometry group

Visual geometry group (VGG) [36] also noted the rising trend of increased network depth. The studied networks were built in a variety of topologies with depths ranging from 11 to 19 layers, with 16 or more layers being the most effective. VGG's significant breakthrough was that it lowered filter sizes while increasing depth. It is critical to recognize that a smaller filter size needs a greater depth: a tiny filter can only capture a limited part of the image if the network is not deep. VGG always employs filters with a spatial footprint of 3 × 3 and a pooling size of 2 × 2. The convolution is performed using stride 1 and padding of 1. Stride 2 is applied for pooling. Another intriguing feature of VGG's architecture is that the number of filters is frequently doubled after each max-pooling. The aim is to double the depth whenever the spatial footprint decreases by a factor of two. This design idea leads to some degree of balance in the processing effort between layers and is utilized by certain subsequent topologies such as ResNet. The usage of deep configurations has the disadvantage of increased sensitivity to initialization, which is known to create instability. Pretraining was used to tackle this challenge, where a shallow architecture was initially trained and then more layers were added [35]. A simple Python code for VGG16 is given below.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import Flatten, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow as tf  # needed below for tf.keras.applications.VGG16
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.VGG16(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)

for l in base_model.layers:
l.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
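The snippet above keeps the entire VGG16 base frozen. To fine-tune only the deeper layers, as described at the beginning of this section, the last few layers of the base can be unfrozen. A minimal sketch, in which the number of unfrozen layers and the learning rate are illustrative assumptions:

# Unfreeze only the last few layers of the base; the earlier layers stay
# fixed because they capture generic features such as edges.
for layer in base_model.layers[-4:]:
    layer.trainable = True

# Recompile with a small learning rate so the pretrained weights are
# adjusted gently rather than overwritten.
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['acc'])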

1.4.16.3 ResNet

ResNet [37] included 152 layers, which is nearly an order of magnitude more than existing architectures. Training a 152-layered architecture is often not achievable unless some significant advances are used. The fundamental problem with training such deep networks is that the enormous number of operations in deep layers hinders the gradient flow between layers and might change the magnitude of the gradients. Increased depth causes issues such as vanishing and exploding gradients. Even though some deep networks show substantial differences between training and test error, many deep networks have high error on both training and test data. This means that the optimization process has not progressed sufficiently. In neural network learning, layer-wise implementations need all ideas in the image to be at the same level of abstraction, while hierarchical feature engineering is the gold standard. Some concepts need fine-grained connections, whereas others can be learned employing shallow networks [35]. A simple Python code for ResNet101 is given below.

from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D, BatchNormalization
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import Flatten, Dropout, SpatialDropout2D, GlobalAveragePooling2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
                               restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.ResNet101(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history

from matplotlib import pyplot


pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
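The skip connections that make this depth trainable are straightforward to express in the Keras functional API. A minimal sketch of a single residual block, assuming the input tensor already has the same number of channels as the block (the filter count is an illustrative choice):

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def residual_block(x, filters=64):
    # Residual branch: two stacked 3 x 3 convolutions.
    shortcut = x
    y = Conv2D(filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    # Identity shortcut: the addition lets gradients flow directly to
    # earlier layers, easing the vanishing-gradient problem described above.
    y = Add()([shortcut, y])
    return Activation('relu')(y)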

1.4.16.4 MobileNet architecture

The MobileNet model is developed on depthwise separable convolutions, a type of factorized convolution that divides a regular convolution into a depthwise convolution and a 1 × 1 convolution known as a pointwise convolution. In MobileNets, the depthwise convolution applies a single filter to each input channel; the pointwise convolution then applies a 1 × 1 convolution to integrate the depthwise convolution outputs. A standard convolution both filters and combines inputs in a single step to generate a new set of outputs; the depthwise separable convolution divides this into two layers, one for filtering and another for combining. MobileNet employs depthwise separable convolutions in order to break the interaction between the kernel size and the number of output channels. The ordinary convolution includes the effect of filtering and merging data based on the convolutional kernels to build a new representation. For a substantial reduction in computational cost, the filtering and combining stages can be split into two parts by means of factorized convolutions called depthwise separable convolutions. Depthwise separable convolutions are composed of two layers: depthwise and pointwise convolutions. The output of the depthwise layer is then linearly combined by means of pointwise convolution, a basic 1 × 1 convolution. MobileNets utilize batchnorm and ReLU nonlinearities for both layers. Depthwise convolution is far more efficient than ordinary convolution. It does, however, simply filter input channels and does not merge them to generate new features. An extra layer, which computes a linear combination of the output of the depthwise convolution with a 1 × 1 convolution, is essential in order to generate these new features. Downsampling is handled with strided convolution in the depthwise convolutions as well as in the first layer. A final average pooling reduces the spatial resolution to 1 before the fully connected layer. MobileNet includes 28 layers if depthwise and pointwise convolutions are counted as independent layers [38]. A simple Python code for MobileNet is given below.

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.MobileNet(
alpha = 0.75,
include_top=False,
weights="imagenet",

input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
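The depthwise separable structure described above can be written with standard Keras layers, with batchnorm and ReLU after both the depthwise and pointwise steps as in the text. A minimal sketch; the filter count and strides are illustrative assumptions:

from tensorflow.keras.layers import DepthwiseConv2D, Conv2D, BatchNormalization, Activation

def depthwise_separable_block(x, filters=64, strides=(1, 1)):
    # Depthwise convolution: a single 3 x 3 filter per input channel (filtering step).
    x = DepthwiseConv2D((3, 3), strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    # Pointwise 1 x 1 convolution: linearly combines the channels (combining step).
    x = Conv2D(filters, (1, 1), padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)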

1.4.16.5 Inception-v4 and Inception-ResNet

The Inception architecture is highly adjustable, which means that the number of filters in the various layers can be changed without affecting the quality of the fully trained network. To balance the computation among the numerous model subnetworks and improve training speed, the layer sizes are adjusted carefully. For Inception-v4, identical selections are made for the Inception blocks across all grid sizes in order to get rid of unnecessary baggage. Cheaper Inception blocks are employed for the residual versions of the Inception networks than for the original Inception. Each Inception block is followed by a filter-expansion layer that is used to increase the dimensionality of the filter bank so that it matches the depth of the input before the addition. This is required to compensate for the reduction in dimensionality caused by the Inception block. The residual version of Inception is tested with many variations. Only two of them are discussed in detail here. The first, "Inception-ResNet-v1," essentially corresponds to the computational cost of Inception-v3, while "Inception-ResNet-v2" corresponds to the raw cost of the recently proposed Inception-v4 network. Another minor technical difference between residual and nonresidual Inception variations is that, in the case of Inception-ResNet, batch normalization is utilized only on top of the standard layers, not on top of the summations. Although it is rational to predict that extensive usage of batch normalization would be beneficial, the aim was to keep each model replica trainable on a single GPU. The overall number of Inception blocks could be significantly increased by eliminating the batch normalization on top of those layers. As more computational resources become available, the need for this trade-off will be eliminated [39]. A simple Python code for Inception-ResNet-V2 is given below.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.InceptionResNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
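The filter-expansion idea described above can be sketched in a few lines: parallel branches are concatenated, and a linear 1 × 1 convolution restores the channel depth before the residual addition. The branch sizes below are illustrative assumptions, not those of the published network:

from tensorflow.keras.layers import Conv2D, Concatenate, Add, Activation

def residual_inception_block(x, channels=256):
    # Two cheap parallel branches form the Inception part of the block;
    # the input x is assumed to have `channels` channels.
    b1 = Conv2D(32, (1, 1), padding='same', activation='relu')(x)
    b2 = Conv2D(32, (1, 1), padding='same', activation='relu')(x)
    b2 = Conv2D(48, (3, 3), padding='same', activation='relu')(b2)
    merged = Concatenate()([b1, b2])
    # Filter-expansion layer: a linear 1 x 1 convolution that matches the
    # input depth so that the residual addition is well defined.
    expanded = Conv2D(channels, (1, 1), padding='same')(merged)
    return Activation('relu')(Add()([x, expanded]))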

1.4.16.6 Xception

Chollet [40] suggested a depthwise separable CNN model based purely on convolution layers. In fact, the mapping of cross-channel and spatial correlations in CNN feature maps may be completely dissociated. As this theory is a more powerful version of the hypothesis that underpins the Inception design, the suggested architecture is termed Xception (short for "Extreme Inception"). Based on this stronger premise, an "extreme" version of an Inception module would first utilize a 1 × 1 convolution to map cross-channel correlations and then individually map the spatial correlations of each output channel. This extreme version of an Inception module is almost equivalent to a depthwise separable convolution, a neural network building block that has gained popularity since its adoption in the TensorFlow framework. In deep learning frameworks such as TensorFlow and Keras, a depthwise separable convolution consists of a depthwise convolution followed by a pointwise convolution. The Xception architecture contains 36 convolutional layers that serve as the network's feature extraction foundation. Because image classification is the only focus of the experimental assessment, the suggested convolutional base is followed by a logistic regression layer. Before the logistic regression layer, fully connected layers can be added if desired. The 36 convolutional layers are divided into 14 modules, each of which is surrounded by linear residual connections, except for the first and last modules. In a nutshell, the Xception architecture is a linear stack of depthwise separable convolution layers with residual links. This makes the architecture very simple to define and modify; it only takes 30–40 lines of code using a high-level library such as Keras or TensorFlow, similar to architectures such as VGG-16 but unlike architectures such as Inception V2 or V3, which are far more complicated to describe [40]. A simple Python code for Xception is given below.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.Xception(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))

model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,
verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
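Keras exposes the depthwise separable convolution described above as a single SeparableConv2D layer. A minimal sketch of an Xception-style module with a linear residual connection; the layer sizes are illustrative assumptions:

from tensorflow.keras.layers import SeparableConv2D, BatchNormalization, Activation, Conv2D, Add

def xception_style_module(x, filters=128):
    # Two depthwise separable convolutions form the body of the module.
    y = Activation('relu')(x)
    y = SeparableConv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = SeparableConv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    # Linear residual connection; a 1 x 1 convolution matches the depth if needed.
    shortcut = Conv2D(filters, (1, 1), padding='same')(x)
    return Add()([shortcut, y])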

1.4.16.7 Densely connected convolutional networks

One benefit of ResNets is that the gradient can flow straight via the identity function from later layers to earlier ones. Yet the identity function and the output of Hℓ are merged through summation, which might restrict information flow in the network. A novel connectivity pattern is therefore suggested, with direct connections from any layer to all following layers, in order to increase information flow between layers even further. This network design is known as a dense convolutional network (DenseNet) due to its dense connectivity. Downsampling layers, which vary the size of feature maps, are an important component of convolutional networks. In this architecture, the network is partitioned into numerous densely connected dense blocks to allow downsampling. Layers between blocks are referred to as transition layers because they perform convolution and pooling. In the experiments, a batch normalization layer is employed with a 1 × 1 convolutional layer and a 2 × 2 average pooling layer as the transition layers. DenseNet can have very narrow layers, which distinguishes it from conventional network topologies [41]. A simple Python code for DenseNet121 is given below.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.layers import Flatten,Dropout,SpatialDropout2D
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
early_stopping = EarlyStopping(monitor='val_loss', mode='min', patience=10,
restore_best_weights=True, verbose=1)

base_model = tf.keras.applications.DenseNet121(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

model = Sequential()
model.add(base_model)

model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',
metrics=['acc'])


# Train the model


history = model.fit(
Xtrain,
ytrain,
epochs=200,
validation_split=0.25,
batch_size=10,

verbose=2,
callbacks=[early_stopping])
# Test on unseen data
results = model.evaluate(Xtest, ytest)
# Print the results
print('Final test set loss: {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

# plot training history


from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
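The dense connectivity and the transition layers described above can be sketched with the Keras functional API. The number of layers and the growth rate below are illustrative assumptions:

from tensorflow.keras.layers import Conv2D, Concatenate, BatchNormalization, Activation, AveragePooling2D

def dense_block(x, num_layers=4, growth_rate=12):
    # Each layer receives the concatenation of all preceding feature maps.
    for _ in range(num_layers):
        y = BatchNormalization()(x)
        y = Activation('relu')(y)
        y = Conv2D(growth_rate, (3, 3), padding='same')(y)
        x = Concatenate()([x, y])
    return x

def transition_layer(x, filters=64):
    # Between blocks: batch normalization, a 1 x 1 convolution,
    # and 2 x 2 average pooling, as described in the text.
    x = BatchNormalization()(x)
    x = Conv2D(filters, (1, 1), padding='same')(x)
    return AveragePooling2D((2, 2))(x)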

1.4.16.8 Feature extraction with pretrained models

Pretrained CNNs from publicly available resources such as ImageNet are frequently available for usage in different applications and datasets. This is accomplished by retaining the majority of the pretrained weights in the neural network, with the exception of the final classification layer. The final classification layer's weights are determined by the dataset at hand. The last layer must be trained since the class labels in a given situation may differ from those in ImageNet. Nonetheless, the weights in the early layers are still valuable, since they learn different sorts of shapes in images which can be used for nearly any type of classification application. In addition, feature activations in the penultimate layer can be employed for unsupervised applications. It is worth noting that the usage of pretrained convolutional networks is so widespread that training is almost never initiated from scratch [35].

Many deeper architectures with feed-forward topologies include numerous layers where consecutive transformations of the preceding layer's inputs lead to progressively complex data representations. In the output layer, properly transformed feature representations are more amenable to simple types of predictions. The nonlinear activations in intermediate layers are responsible for this level of sophistication. Tanh and sigmoid activation functions were traditionally the most common options in the hidden layers, but the ReLU activation has gained popularity in recent years due to an attractive attribute: it is better at avoiding the vanishing and exploding gradient problems. One way to look at the division of work between the hidden layers and the final prediction layer is that the early layers provide a feature representation that is more appropriate for the task at hand. This learnt feature representation is subsequently used by the final layer. The characteristics learned in the hidden layers are mostly generalizable to different datasets and problem settings in the same domain. This characteristic may be used in a variety of ways, by simply replacing a pretrained network's output node(s) with a new application-specific output layer for the problem and dataset at hand. Each hidden layer produces a transformed feature representation of the data, the dimensionality of which is specified by the number of units in that layer. This approach may be viewed as a type of hierarchical feature engineering, with features in earlier layers representing rudimentary data qualities and those in later layers representing complicated data attributes having semantic meaning to the class labels. Transfer learning is another term for the method of employing pretrained models [35]. A DenseNet121 deep feature extraction example is given below. For the details, check the machinelearningmastery.com web page.3

# Example of using the DenseNet121 model as a feature extraction model
# (Adapted from machinelearningmastery.com)
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.densenet import preprocess_input
from keras.applications.densenet import decode_predictions
from keras.applications.densenet import DenseNet121
from keras.models import Model
from pickle import dump
# load an image from file
image = load_img('BrainMRI.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the DenseNet121 model
image = preprocess_input(image)
# load model
model = DenseNet121()
# remove the output layer
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
# get extracted features
features = model.predict(image)
print(features.shape)

3 https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/.
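The extracted feature vector can then serve as input to any downstream model. A hedged sketch of fitting a classical classifier on features extracted this way for a whole image set; the arrays Xfeatures, ylabels, Xtest_features, and ytest_labels are assumed to have been prepared beforehand and are not defined in this chapter:

# Fit a support vector machine on the deep features extracted above
# (for DenseNet121, each feature vector has 1024 dimensions).
from sklearn.svm import SVC
clf = SVC(kernel='rbf')
clf.fit(Xfeatures, ylabels)
print('Test accuracy:', clf.score(Xtest_features, ytest_labels))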


References

[1] E. Alpaydin, Introduction to Machine Learning, MIT Press, 2014.
[2] A. Subasi, Practical Machine Learning for Data Analysis Using Python, Academic Press, 2020.
[3] P. Kulkarni, Reinforcement and Systemic Machine Learning for Decision Making, vol. 1, John Wiley & Sons, 2012.
[4] B.C. Patel, G. Sinha, Abnormality detection and classification in computer-aided diagnosis (CAD) of breast cancer images, J. Med. Imaging Health Inform. 4 (6) (2014) 881–885.
[5] V.N.P. Raj, T. Venkateswarlu, Denoising of medical images using undecimated wavelet transform, 2011, pp. 483–488.
[6] D.-H. Trinh, M. Luong, F. Dibos, J.-M. Rocchisani, C.-D. Pham, T.Q. Nguyen, Novel example-based method for super-resolution and denoising of medical images, IEEE Trans. Image Process. 23 (4) (2014) 1882–1895.
[7] J.E. Fowler, The redundant discrete wavelet transform and additive noise, IEEE Signal Process. Lett. 12 (9) (2005) 629–632.
[8] A. Choubey, G. Sinha, S. Choubey, A hybrid filtering technique in medical image denoising: blending of neural network and fuzzy inference, in: IEEE 3rd International Conference on Electronics Computer Technology, Kanyakumari, India, 2011, vol. 1, pp. 170–177.
[9] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Process. 15 (2) (2006) 430–444.
[10] D. Bhonsle, G. Sinha, V.K. Chandra, Medical image de-noising using combined Bayes shrink and total variation techniques, Artificial Intelligence and Machine Learning in 2D/3D Medical Image Processing, CRC Press, 2020, pp. 31–52.
[11] B.I. Choi, The current status of imaging diagnosis of hepatocellular carcinoma, Liver Transpl. 10 (S2) (2004) S20–S25.
[12] R.B. Freeman, et al., Optimizing staging for hepatocellular carcinoma before liver transplantation: a retrospective analysis of the UNOS/OPTN database, Liver Transpl. 12 (10) (2006) 1504–1511.
[13] K. Rawal, G. Sethi, D. Ghai, Medical imaging in healthcare applications, Artificial Intelligence and Machine Learning in 2D/3D Medical Image Processing, CRC Press, 2020, pp. 97–106.
[14] G. Sethi, B.S. Saini, D. Singh, Segmentation of cancerous regions in liver using an edge-based and phase congruent region enhancement method, Comput. Electr. Eng. 53 (2016) 244–262. Available from: https://doi.org/10.1016/j.compeleceng.2015.06.025.
[15] G. Sethi, B. Saini, Segmentation of abdomen diseases using active contour models in CT images, Biomed. Eng. Appl. Basis Commun. 27 (05) (2015) 1550047.
[16] A. Quatrehomme, I. Millet, D. Hoa, G. Subsol, W. Puech, Assessing the classification of liver focal lesions by using multi-phase computer tomography scans, Medical Content-Based Retrieval for Clinical Decision Support, Springer, 2012, pp. 80–91.
[17] S.G. Mougiakakou, I.K. Valavanis, N.A. Mouravliansky, A. Nikita, K.S. Nikita, DIAGNOSIS: a telematics-enabled system for medical image archiving, management, and diagnosis assistance, IEEE Trans. Instrum. Meas. 58 (7) (2009) 2113–2120.
[18] B. Gopinath, N. Shanthi, Computer-aided diagnosis system for classifying benign and malignant thyroid nodules in multi-stained FNAB cytological images, Australas. Phys. Eng. Sci. Med. 36 (2) (2013) 219–230.
[19] S.S. Kumar, R.S. Moni, J. Rajeesh, An automatic computer-aided diagnosis system for liver tumours on computed tomography images, Comput. Electr. Eng. 39 (5) (2013) 1516–1526. Available from: https://doi.org/10.1016/j.compeleceng.2013.02.008.
[20] A. Vasuki, S. Govindaraju, Deep neural networks for image classification, Deep Learning for Image Processing Applications, vol. 31, IOS Press, 2017, p. 27.
[21] S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.
[22] K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
[23] S. Theodoridis, A. Pikrakis, K. Koutroumbas, D. Cavouras, Introduction to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.
[24] S. Tatiraju, A. Mehta, Image segmentation using k-means clustering, EM and normalized cuts, Dep. EECS 1 (2008) 1–7.
[25] X. Zheng, Q. Lei, R. Yao, Y. Gong, Q. Yin, Image segmentation based on adaptive K-means algorithm, EURASIP J. Image Video Process. 2018 (1) (2018) 68.
[26] J. Gaura, E. Sojka, M. Krumnikl, Image segmentation based on k-means clustering and energy-transfer proximity, in: ISVC'11: Proc. 7th International Conference on Advances in Visual Computing - Volume Part II, 2011, pp. 567–577.
[27] J. Han, J. Pei, M. Kamber, Data Mining: Concepts and Techniques, Elsevier, 2011.
[28] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Chapman & Hall/CRC, Boca Raton, 1984.
[29] K. Grąbczewski, Meta-Learning in Decision Tree Induction, Springer, 2014.
[30] I.H. Witten, E. Frank, M.A. Hall, C.J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016.
[31] M. Hall, I. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Burlington, 2011.
[32] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, 2016, pp. 785–794.
[33] S. Bhattacharya, et al., A novel PCA-firefly based XGBoost classification model for intrusion detection in networks using GPU, Electronics 9 (2) (2020) 219.
[34] J.D. Kelleher, Deep Learning, MIT Press, 2019.
[35] C.C. Aggarwal, Neural Networks and Deep Learning, Springer, 2018.
[36] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[37] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[38] A.G. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).
[39] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, 2017, vol. 31, no. 1.
[40] F. Chollet, Xception: deep learning with depthwise separable convolutions, 2017, pp. 1251–1258.
[41] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, 2017, pp. 4700–4708.

C H A P T E R

2

Lung cancer detection from histopathological lung tissue images using deep learning

Aayush Rajput1 and Abdulhamit Subasi2,3

1 Indian Institute of Technology, Kharagpur, India; 2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

2.1 Introduction
2.2 Literature review
2.3 Artificial intelligence models
    2.3.1 Artificial neural networks
    2.3.2 Deep learning
    2.3.3 Convolutional neural networks
2.4 Lung cancer detection using artificial intelligence
    2.4.1 Feature extraction using deep learning
    2.4.2 Dimension reduction
    2.4.3 Prediction and classification
    2.4.4 Experimental data
    2.4.5 Performance evaluation measures
    2.4.6 Experimental results
2.5 Discussion
2.6 Conclusion
References

2.1 Introduction

Cancer is the term used for a collection of similar types of diseases in which the tissues of a body organ start growing abnormally. When we inhale, air goes through the windpipe (trachea) and is then divided through smaller pipes known as bronchi, which further divide into finer tubes called bronchioles. Alveoli are the tiny sacs present at the end of the bronchioles. Lung cancer starts mainly from the bronchioles, alveoli, and bronchi. Lung cancer
is a deadly disease with a 5-year survival rate (18.6%) that is lower than that of other cancers such as breast and prostate cancer [1]. It is the most common type of cancer, specifically in the United States, responsible for 154,050 deaths there alone, accounting for around a quarter of all cancer deaths in the country [2]. Lung cancer can be broadly classified into nonsmall-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC). NSCLC is the most common type of lung cancer and is further classified into three main categories: adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma. Adenocarcinoma begins from the mucus-producing cells. It is found in the outer part of the lungs and can be detected before it spreads; it is also a common type in other cancers such as breast, colorectal, and prostate cancer. The second one, squamous-cell carcinoma, begins its growth from the squamous cells. It is primarily found in people who smoke. In the lungs, it affects the central part near the bronchus. The third type, large-cell carcinoma, does not involve a particular area in the lungs. It can grow in any region and is difficult to diagnose, with a faster growth, which makes it more complex to treat. NSCLC has some other types, but they are not common compared to these three, for example, adenosquamous carcinoma and sarcomatoid carcinoma. SCLC covers only 10%–15% of all lung cancers [3].

There are cases in which cancer starts from one part of the body and, after spreading, affects another part of the body, but the cancer type will be based on the organ that was affected first. For example, breast cancer can sometimes affect the lungs, but it will not be called lung cancer, and the treatment will be to cure breast cancer. Till now, no one knows how to prevent the danger of lung cancer completely, but there are some ways through which the risk of cancer can be reduced. The first and foremost way is to avoid smoking [4]. In the early stage of lung cancer, the lung tries to repair the tissue, but it will not improve if the person smokes during that phase. Thus, by quitting smoking, a person can reduce the danger of lung cancer to a large extent. Keeping a minimum distance from people diagnosed with cancer is also a way to protect ourselves from cancer-causing agents. Taking a balanced and healthy diet also helps in reducing the risk of lung cancer. But the reduction in risk from these measures is far less than the increase due to smoking, so quitting smoking should be the first step. However, the risk of lung cancer cannot be zero for anyone, as there are instances of people having lung cancer who do not have any high-risk factors. Early detection is the most common factor among all successful lung cancer treatments. Some types of lung cancer start showing symptoms in the early stage of their development, through which they can be detected. The symptoms of lung cancer are most likely to be caused by other reasons, but it is essential to see the doctor so that, if lung cancer is causing the symptoms, it can be detected and treated in its early phases. The most common symptoms of lung cancer include a long-lasting cough, rust-colored phlegm, blood coughing, loss of appetite, shortness of breath, the feeling of being tired, chest pain, persistent infections such as pneumonia, and wheezing. Widespread lung cancer can also affect other parts of the body, which can cause some symptoms such as bone pain, headache, balance problems or seizures, and yellowing of skin and eyes [5]. These all result from lung cancer spreading to other parts such as the liver and nervous system.

There are several tests a doctor can use for testing lung cancer. Tests include imaging tests, sputum cytology, and biopsy [6]. An X-ray image is used in an imaging test to detect the abnormal mass present in the lung tissues. A CT scan can be used to see the more minor details which an X-ray might not show. In sputum cytology, the sputum is taken and observed under the microscope. The sputum of a person with lung cancer contains lung cancer cells that can be detected under the
microscope. A sample of abnormal cells is taken from the potentially infected areas for diagnosis in a tissue sample or biopsy.

The treatment of lung cancer also has some possible side effects. Different persons react in different ways to the treatment process, and the person should be fully prepared for these effects before the treatment. Some of the side effects of the treatment process are anemia (as chemotherapy lowers blood counts), nausea, vomiting, diarrhea, constipation, hair loss, changes in sexual functioning, and effects on fertility [7]. The most severe side effects are lowered white blood cell counts, increased risk of infection, and heart dysfunction. Patients who have a current heart problem or a history of heart problems should consult the doctor about it to avoid further complications. Palliative or supportive care is a way to ease the path of treatment for the patient and family. The care is not for treating specifically lung cancer; instead, it can be given to anyone of any age and at any stage of cancer. It is provided along with the curative treatment. This care improves the quality of life of the patient and provides an extra shield of protection.

The process of biopsy is often time-consuming and requires very skilled practitioners or pathologists for the detection of the type of lung cancer. Human detection of the kind of lung cancer sometimes leads to a wrong prediction, which can cause a massive cost to the patient in the form of side effects and financial burden, and sometimes even life. People who have suffered from it have a possibility of the cancer coming back. Doctors advise regular check-ups and therapies such as radiotherapy and chemotherapy to control and minimize the likelihood of the cancer attacking back. The side effects of the treatment can last very long, and it is not easy for the patient to learn to live with them. Any symptoms or difficulty should be discussed with the doctor as soon as possible to avoid further complications. Sometimes the patient can feel very depressed facing all the complex situations. The family members and friends of the patient must help the patient emotionally. Deep learning (DL) is a subfield of artificial intelligence (AI) that enables computers to make decisions based on data. DL can speed up the process of lung cancer detection through images to a vast extent. Transfer learning is a technique through which the prediction accuracy can be improved to a greater extent. Pretrained DL models, trained on a huge dataset for weeks, can be directly used through transfer learning without wasting time on their training.

In this chapter, we have used different DL techniques and combined transfer learning with classical machine learning models such as support vector machine (SVM), Random Forest, k-nearest neighbors (kNN), XGBoost, AdaBoost, and others to build models that give accurate results. Different metrics are used so that the actual performance of the model can be known. The images are passed through the pretrained models, the essential features are extracted, and the classical machine learning models are trained on these features.

2.2 Literature review

Atsushi et al. [8] used a deep convolutional neural network model for automating the process of lung cancer detection using patch-based segmentation of malignant regions in images. The images were taken from a camera attached to a microscope with a 40 × objective lens. Original images were of 224 × 224-pixel resolution. A total of 306 benign and 315 malignant images were used to generate 60,000 patch images through data augmentation using techniques such as flipping, rotation, filtering, and color adjustment. The model used for the prediction was a deep convolutional neural network based on a fine-tuned VGG16 model, with three layers having 1024, 256, and 2 units replacing the fully connected layers of the original VGG16 model. This method achieved an area
under the ROC curve (AUC) score of 0.872 for the patch-based classification using augmented images.

Ausawalaithong et al. [9] used DL and transfer learning to classify lung cancer using chest X-ray images. The proposed model provides a heat map for identifying the location of the lung nodule. The dataset used in the study was taken from more than one source. They used the JSRT (Japanese Society of Radiology Technology) dataset, consisting of 247 frontal chest X-ray images, 154 with lung nodules and 93 without lung nodules. All images have a resolution of 2048 × 2048 pixels. Data was also taken from the ChestX-ray14 dataset containing 112,120 frontal chest X-ray images, each with a resolution of 1024 × 1024 pixels. However, this dataset does not contain any lung cancer images; it was used to compensate for the lung cancer data, which has only 100 cases, by first training the model to recognize nodules, using nodule cases as positive and all remaining cases as negative. The 121-layered densely connected convolution network (DenseNet121) was used, replacing the last fully connected layer with a single sigmoid node to get the output probability. Transfer learning was applied twice in the study, first for classification as "with nodule" or "without nodule" and second for classification as "with malignant nodule" or "without malignant nodule." The model gave accuracy, specificity, and sensitivity scores of 74.43% ± 6.01%, 74.96% ± 9.85%, and 74.68% ± 15.33%.

Hatuwal et al. [10] used the images from the LC25000 lung histopathological image dataset. Images were resized to 180 × 180-pixel resolution, and the pixels were normalized. They trained a convolutional neural network (CNN) model having a neural network with three hidden layers, an input layer, and one fully connected layer. Their model achieved training and validation accuracy scores of 96.11% and 97.2%, respectively.

Chakravarthy et al. [11] used CT lung DICOM images from the Lung Image Database Consortium (LIDC) for the study. MATLAB 2013a software was used for the preprocessing. The images were filtered by using the fast type of filter [12] for reducing the noise present in the images. First, the grayscale lung image was transformed to a binary image by replacing values greater than a threshold with 1 and the rest with 0. After that, gray level co-occurrence matrix (GLCM) feature extraction was done. Then a chaotic crow search algorithm (CCSA) was used to select the GLCM features. A probabilistic neural network was trained on the features selected by CCSA. The study achieved scores of 95%, 85%, 90%, 86.36%, and 94.44% for sensitivity, specificity, accuracy, precision, and negative predictive value, respectively.

Sasikala et al. [13] used a CNN for the classification of tumors as malignant or benign. The images used were chest CT images taken from the LIDC and Image Database Resource Initiative (IDRI) [14], consisting of 1000 scans of malignant and benign lung tumors. The lung region was first extracted from the images, and then slices were segmented to get the tumor region. In the preprocessing, the median filter was used. Backpropagation algorithms train the model with a Rectified Linear Unit (ReLU) activation function and a softmax layer as the output layer. The data normalization step was not performed on the images while giving input to the model for training. The whole study was done using MATLAB software. The model gave specificity, sensitivity, and accuracy scores of 1, 0.875, and 0.96, respectively.

Serj et al. [15] also used a CNN for lung cancer diagnosis. They proposed a new DL model for better accuracy and low variance in binary classification tasks. The dataset used was taken from the Kaggle Data Science Bowl 2017 (KDSB17) [16]. The proposed model has two max-pooling layers, a fully convolutional body, and one fully connected layer with two softmax units. The input size of the image used
was 120 × 120 pixels. The activation function used was the ReLU. The loss function used was cross-entropy, which aims to maximize the multinomial logistic regression objective. Sensitivity, specificity, and F1 metrics were used to evaluate the performance of the model. The complete process was carried out using MATLAB software, and the result obtained sensitivity, specificity, and F1 scores of 0.87, 0.991, and 0.95, respectively.

Masood et al. [17] proposed a computer-assisted diagnostic system for lung cancer detection using a novel DL-based model. They used the DFCNet [18] model, based on a deep fully convolutional neural network, and a CNN to classify detected nodules into four lung cancer stages. A comparison was made with other existing CNN techniques. They proposed an IoT-based approach in which sensors are used to collect information from the patient's body. The data from the sensors is collected through a computer and then fed to the model to classify the nodules. The images were resized to 100 × 100 for input into the DFCNet, with a Gabor filter for preprocessing and enhanced pixels for segmentation of the region of interest. CT scan images of 18 patients from the Shanghai Hospital were taken, of which 11 were used for training and the rest for testing. The LIDC-IDRI, RIDER, SPIE challenge dataset, LUNA16, and Lung CT-Diagnosis dataset were also used for evaluating the model. The images were analyzed patch-wise. The CNN architecture includes 7 convolution layers with a Parametric ReLU activation function, 7 max-pooling layers, 7 batch normalization layers, and 2 dense layers having Leaky Rectified Linear Units. All layers used 3 × 3-sized filters except the last three convolutional layers, for which a 7 × 7-sized filter was used. The main model performance was evaluated by the dice score. The achieved median dice score was 91.34%, and the accuracy scores of the CNN and DFCNet were 77.6% and 84.58%.

2.3 Artificial intelligence models

2.3.1 Artificial neural networks

An artificial neural network (ANN) is a simple deep network closely representing the arrangement of neurons inside the human brain. In the linear model, the input layer is directly mapped to the output layer through a simple affine transformation. Still, linear models have the problem of underfitting, and often their performance is not very good with complex data. So, to avoid the limitations of the linear model, extra stacks of hidden layers are introduced between the input and output layers of a linear model. This architecture is called an ANN or multilayer perceptron [41]. In an ANN, the initial layer is the input layer. Each unit or neuron is connected to every unit of the next layer, and every layer except the initial layer is also connected to each unit of the previous layer. In a fully connected layer, every neuron of one layer influences every neuron in the next layer. The cost of parameterization of an ANN is very high.

The overall structure of an ANN consists of three parts: the input layer, hidden layers, and output layer. Every layer also has a bias neuron and is connected to every neuron of the next layer. An ANN having two or more hidden layers is called a deep neural network (DNN). The first method to train an ANN is described by Rumelhart et al. [19] in the article proposing the backpropagation algorithm. The complete training data is passed to the model, the result is calculated for each instance, and then the error between the predicted and true results is calculated. This initializes the backpropagation step. The contribution of each neuron to the error is calculated, and the sum of errors over all instances of the data is minimized. For the proper working of the backpropagation algorithm, activation functions are used. Popular activation functions include the hyperbolic tangent function tanh, the logistic function, and the ReLU function.
The output range of the tanh function is from −1 to 1, which is why its output is normalized [42]. In the logistic function, the output is from 0 to 1, while in the ReLU function, the output is continuous. Practically, ReLU is faster to compute than the other two mentioned activation functions.

The ANN is often used for the classification task, where the n neurons in the output layer represent n classes and the value of each neuron represents the probability of that class. The activation used for the output layer will be softmax. The ANN model can detect the complex features present in the data; hence, it is more helpful than other models. It has been observed that an ANN with only one hidden layer with enough neurons can model the most complex functions. One of the essential features of using a DNN is that a pretrained model can be used as the initial layers of another model with the same weights to extract the input features, thus making the model's training faster and its performance better. In DL, the last part of the convolutional neural network consists of the ANN layers [43].

2.3.2 Deep learning

DL is a subset of machine learning. Nowadays, computers are fast enough to train these vast networks. A large amount of data is used to train DL models and increase the performance of the model. Generally, with the increase in the quantity of data, the performance of the model increases. DL is used in many fields such as face recognition, speech recognition, etc. In the medical field, the use of DL is increasing very rapidly [44]. Practitioners use DL models for the detection of tumors from X-ray or CT scan images. DL models can automatically extract features from the data, which can train another classical machine learning model for better results. One of the most significant advantages of DL is that the DL model can also be trained using analog data such as images with pixels and audio files. The techniques used in DL are achieving great success. Tasks that take a lot of time to complete through human effort can be done in little time using DL techniques. DL is making a significant contribution toward the betterment of the quality of human life. Many companies use DL to show personalized recommendations to the user, making the user experience better and making huge profits. DL enables the training and prediction of the model with significantly less manual effort. After training, the whole process becomes completely automatic.

2.3.3 Convolutional neural networks

CNNs have emerged from the study of the brain's visual cortex. They have been used for image recognition since the 1980s. In 1989 LeCun et al. [20] proposed a method to recognize handwritten digits using a CNN. With the increase in computational power and data, CNNs can perform very well on image recognition tasks. Even for complex tasks, CNNs can be trained to get fast and accurate results. CNNs are powering image search services, self-driving cars, and more.

The most basic block of a CNN is the convolution layer. Unlike an ANN, in a CNN each neuron of a layer is not connected to every neuron of the next layer; neurons are connected only to the neurons in their receptive fields. Through this architecture, a CNN can concentrate on low-level features in the first hidden layer, while the last layers concentrate on high-level features. This is the reason why CNNs perform the task of image recognition so well. A neuron located in the ith row and jth column of a layer
is connected to the neurons of the previous layer present in rows i to i + fh − 1 and columns j to j + fw − 1, where fh and fw are the height and width of the receptive field. For a layer to have the same height and width as the previous layer, zeros are added around the layer, known as zero padding. To connect a large convolution layer to a small convolution layer, the distance between the receptive fields is increased, which is called the stride. The weights in a CNN are represented using matrices known as filters. During the training of CNNs, the weights of these filters are changed so that the prediction error is reduced. The filters detect features from the image and transmit them to the subsequent layers. A good CNN model can detect even the tiny features present in the image. So basically, the training of a CNN aims to make the weights of the filters such that they can detect all the features present in an image. A CNN which can detect most features gives accurate results [45]. The general framework for a CNN model is given in Fig. 2.1.

Many convolution layers are stacked to make the CNN able to detect the features present in the image. The final convolutional layer of the CNN is flattened (unlike in a fully connected neural network) so that an ANN can be attached to the model, and in the classification task the last layer is the softmax layer. However, a limitation of using the convolutional neural network is that it requires a huge amount of RAM during training. During the backpropagation step, all the intermediate values calculated during the forward pass are required. To shrink the output of a convolutional layer, a pooling layer is used. The pooling is done to reduce the computational load, the number of parameters, and memory usage. This reduction also makes the neural network tolerant to image shift. In a pooling layer, a neuron is connected to a fixed, limited number of neurons of the previous layer, and it does not have any parameters. To define a CNN, its size, stride, padding type, and pooling layers are defined manually. A typical CNN architecture has a few convolution layers with an activation function such as ReLU, followed by a pooling layer, with this pattern repeated. As a CNN gets deeper and deeper, the size of the CNN layers decreases and the number of channels of the convolutional layers increases [46]. Some pretrained CNN architectures are trained on high-speed processors for weeks and are very good at extracting features from images. They can be directly used without training for any image classification task; this process is called transfer learning [47]. AlexNet (2012), GoogLeNet (2014), and ResNet (2015) are some examples of pretrained CNN models and winners of the ILSVRC ImageNet challenge. A simple Python code for a CNN model is given below.

FIGURE 2.1 The general framework for a CNN model: a histopathological lung tissue image is fed to the CNN model, which predicts benign tissue, adenocarcinoma, or squamous cell carcinoma. CNN, Convolutional neural network.


# Build a small CNN with the Keras functional API.
from keras.layers import (Dense, Flatten, Conv2D, MaxPooling2D, Dropout,
                          Input, BatchNormalization)
from keras.models import Model

inp = Input(shape=(128, 128, 3))
model = BatchNormalization()(inp)
model = Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu')(model)
model = BatchNormalization()(model)
model = Conv2D(filters=128, kernel_size=(3, 3), padding='same', activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Conv2D(filters=128, kernel_size=(3, 3), padding='same', activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Flatten()(model)
# train_it is the training data iterator (e.g., from flow_from_directory);
# its class_indices determine the number of output classes.
output = Dense(units=len(train_it.class_indices), activation='softmax')(model)
model = Model(inputs=inp, outputs=output)
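To actually train this network, it would be compiled and fit on the data iterators; a minimal sketch follows (the optimizer, loss, and epoch count here are illustrative assumptions, not fixed choices from this study):

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# train_it / validate_it are assumed Keras data iterators over the image folders.
model.fit(train_it, validation_data=validate_it, epochs=10)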

2.4 Lung cancer detection using artificial intelligence

2.4.1 Feature extraction using deep learning

As described in the previous section, pretrained CNN models can be used to extract features on any dataset to improve the performance of our CNN model. Feature extraction through transfer learning allows even a model with few parameters to achieve a very accurate classification result; this is the technique used to train the models in our experiments. Each filter has weights that are used to extract a particular feature from the image. After extracting the features using pretrained models, the data for training reduces to a significantly smaller size, so the training of the model can be done in significantly less time. Feature extraction is thus a technique used for dimensionality reduction, and it is also used in fields other than image processing. Features can also be extracted from an image without using a pretrained model. Each image is stored in the computer's memory in the form of pixels: a three-dimensional array with each value in the range 0–255, generally with three channels. The first value shows the concentration of red, the second of green, and the last of blue. To reduce the dimension of the image, the average of the image's channels can be taken; an image can also be loaded in grayscale to reduce the dimension of the data without losing any critical information. Preweighted filters can be applied to an image to extract information about edges and sudden color changes. A simple Python code for deep feature extraction is given below.

from sklearn.model_selection import train_test_split

def get_features(base_model, train, validate):
    # Run the frozen pretrained model over the data iterators to get deep features.
    X_train = base_model.predict(train)
    y_train = train.classes

    X_val = base_model.predict(validate)
    y_val = validate.classes

    # Split the validation features into validation and test halves.
    X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size=0.5,
                                                    shuffle=True)
    return (X_train, X_val, X_test, y_train, y_val, y_test)
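The section above also mentions extracting information without a pretrained model, for example with grayscale loading and preweighted edge filters. A minimal sketch of that idea follows (the file name and kernel are illustrative assumptions, not part of this study's pipeline):

import numpy as np
from scipy.signal import convolve2d
from skimage import color, io

# Load one image and convert it to grayscale.
img = color.rgb2gray(io.imread('lung_tissue_sample.jpeg'))

# A preweighted 3 x 3 kernel that responds to edges and sudden intensity changes.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

edges = convolve2d(img, edge_kernel, mode='same', boundary='symm')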


2.4.2 Dimension reduction

Sometimes the data used for training machine learning models has a very large number of features. It is cumbersome to deal with data that has millions of features: not everyone has enough resources, such as RAM, to train a model on such a large dataset, and the training time also increases drastically. To deal with this problem, various methods are used for dimensionality reduction, including principal component analysis (PCA) [21], locally linear embedding (LLE) [22], and generalized discriminant analysis [23]. The most commonly used method is PCA. The basic principle of PCA is that the hyperplane which lies closest to the data points is selected, and the data is then projected onto that hyperplane. After projecting the data onto the selected hyperplane, the resulting data should have as large a variance as possible, since data with large variance is less likely to lose information [48].
Another way to choose the best hyperplane is to calculate the mean squared error for every data point and select the plane with the lowest mean squared error. Scikit-Learn provides a PCA class [24,25] to reduce the data dimension quickly; the user has to specify the number of dimensions to which the data should be reduced. After reduction, the data takes significantly less memory. After applying PCA, the data can be restored approximately to its previous form, but not identically, since some information is lost during the reduction. There are also some disadvantages to dimension reduction methods like PCA: they find a linear correlation between variables, which may not hold in every case, and PCA is not an excellent choice when the mean and covariance cannot describe the dataset. To decide the number of features needed for accurate prediction, the process has to be repeated multiple times. Another technique used for dimensionality reduction is LLE. It does not use the projection of data onto a hyperplane like the previous algorithm; instead, it computes how each training instance is linearly related to its closest neighbors and reduces the dimensions so that these local relations are preserved. This technique can be implemented using Scikit-Learn's LocallyLinearEmbedding class [26]. A simple Python code for PCA is given below.

from sklearn.decomposition import PCA

# Keep the first 7000 principal components of the deep features.
pca = PCA(n_components=7000)
X_train = pca.fit_transform(X_train)
X_val = pca.transform(X_val)
model.fit(X_train, y_train)
val_pred = model.predict(X_val)
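Along the same lines, the LLE alternative mentioned above can be sketched as follows (the n_neighbors and n_components settings are illustrative assumptions, not values used in this study):

from sklearn.manifold import LocallyLinearEmbedding

# Reduce dimensions while preserving each sample's relation to its nearest neighbors.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=100)
X_train_lle = lle.fit_transform(X_train)
X_val_lle = lle.transform(X_val)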


2.4.3 Prediction and classification

Machine learning classification is the task of predicting the class to which an instance of data belongs. Classification can involve two (binary) or multiple classes. The classification task is a type of supervised learning in which the original class or label is given, and various models are used to find the best mapping from the data to the classes. For a classification task using ANNs, a softmax layer is added as the output layer: the number of units is equal to the number of classes in the input data, the value of each unit shows the probability of the data instance belonging to that class, and the sum of all the unit values equals one. There are two kinds of classification, multiclass classification and multilabel classification; in this case, the problem is multiclass classification. In multiclass classification, a particular instance can belong to only one class, while in multilabel classification, one instance can belong to more than one class. Examples of multiclass classification include plant species classification and tumor detection. Algorithms such as KNN [27], decision tree [28], Naïve Bayes [29], and gradient boosting [30] can be used for multiclass classification. The general framework is given in Fig. 2.2.
In multilabel classification, the algorithms used for multiclass classification cannot be used directly; modified versions of the algorithms, such as multilabel decision trees, multilabel random forests, and multilabel gradient boosting, are used for this task. Another approach to solving the multilabel classification problem is to use a different model for each class to predict whether an instance belongs to that class (a sketch of this idea follows the code below). In this way, the multilabel classification becomes like a set of binary classifications, and any model can be used for the task. A simple Python code for the classification is given below.

from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

def get_models():
    # A small fully connected ANN plus five classical classifiers.
    ANN = Sequential()
    ANN.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
    ANN.add(BatchNormalization())
    ANN.add(Dropout(0.2))
    ANN.add(Dense(64, activation='relu'))
    ANN.add(Dense(32, activation='relu'))
    ANN.add(Dense(16, activation='relu'))
    ANN.add(Dense(8, activation='relu'))
    ANN.add(Dense(len(train_it.class_indices), activation='softmax'))
    ANN.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    KNN = KNeighborsClassifier()
    SVM = SVC(kernel='linear')
    RF = RandomForestClassifier(n_estimators=50)
    ADB = AdaBoostClassifier()
    XGB = XGBClassifier(n_estimators=50, use_label_encoder=False)
    return (ANN, KNN, SVM, RF, ADB, XGB)

def fit_model(model, X_train, y_train):
    model.fit(X_train, y_train)
    return model
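For the multilabel variant discussed above (not needed for this multiclass dataset), a minimal sketch of the one-model-per-class idea using scikit-learn follows; the base estimator is an illustrative choice:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier

# y_multilabel is an (n_samples, n_labels) binary indicator matrix; one
# independent binary classifier is fit per label column.
multi = OneVsRestClassifier(RandomForestClassifier(n_estimators=50))
multi.fit(X_train, y_multilabel)
label_pred = multi.predict(X_val)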


FIGURE 2.2 The general framework for histopathological image classification using deep feature extraction: histopathological lung tissue images go through deep feature extraction and dimension reduction, followed by classification with ANN, K-NN, SVM, RF, AdaBoost, and XGBoost.

2.4.4 Experimental data

The experimental data used in this study was generated from an original sample of HIPAA-compliant and validated sources, consisting of 750 total images of lung tissue (250 benign lung tissue, 250 lung adenocarcinomas, and 250 lung squamous-cell carcinomas) augmented to 15,000 images using an image augmentation package. There are three classes in the dataset, each with 5000 images: (1) lung benign tissue, (2) lung adenocarcinoma, and (3) lung squamous-cell carcinoma. The dataset is available on Kaggle as the lung and colon cancer histopathological images dataset [31,32]. Each image is in JPEG file format and has a size of 768 × 768 pixels. To reduce the size of the data, the images are downscaled to 128 × 128 pixels.

2.4.5 Performance evaluation measures

Various evaluation metrics are available to track how well a model fits the data and gives accurate results. Accuracy is not always the best metric to evaluate the performance of a machine learning model: a model can predict every sample as one class, and if the number of samples of that class is high, the accuracy score will also be high. To avoid such a situation, evaluation metrics such as precision, recall, and F1 score are used, where precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 is their harmonic mean, 2 × precision × recall/(precision + recall). In this study, the evaluation metrics calculated for each method are training accuracy, validation accuracy, test accuracy, F1 score, Cohen kappa score, recall, precision, and AUC score. The F1 score and AUC score are generally better evaluation metrics than the accuracy score.


import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (f1_score, roc_auc_score, cohen_kappa_score,
                             precision_score, recall_score, accuracy_score,
                             confusion_matrix)

def get_accuracy_metrics(model, X_train, y_train, X_val, y_val, X_test, y_test):
    print("Train accuracy Score------------>")
    print("{0:.3f}".format(accuracy_score(y_train, model.predict(X_train))*100), "%")

    print("Val accuracy Score--------->")
    val_pred = model.predict(X_val)
    print("{0:.3f}".format(accuracy_score(y_val, val_pred)*100), "%")

    predicted = model.predict(X_test)
    print("Test accuracy Score--------->")
    print("{0:.3f}".format(accuracy_score(y_test, predicted)*100), "%")

    print("F1 Score--------------->")
    print("{0:.3f}".format(f1_score(y_test, predicted, average='weighted')*100), "%")

    print("Cohen Kappa Score------------->")
    print("{0:.3f}".format(cohen_kappa_score(y_test, predicted)*100), "%")

    print("Recall-------------->")
    print("{0:.3f}".format(recall_score(y_test, predicted, average='weighted')*100), "%")

    print("Precision-------------->")
    print("{0:.3f}".format(precision_score(y_test, predicted, average='weighted')*100), "%")

    # Confusion matrices for the validation and test splits.
    cf_matrix_test = confusion_matrix(y_test, predicted)
    cf_matrix_val = confusion_matrix(y_val, val_pred)

    plt.figure(figsize=(12, 6))
    plt.subplot(121)
    sns.heatmap(cf_matrix_val, annot=True, cmap='Blues')
    plt.title("Val Confusion matrix")

    plt.subplot(122)
    sns.heatmap(cf_matrix_test, annot=True, cmap='Blues')
    plt.title("Test Confusion matrix")

    plt.show()
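The function above covers the threshold-based metrics; the AUC scores reported later require class probabilities. One way this could be computed for the three-class problem is sketched below (assuming the classifier exposes predict_proba; this helper is illustrative and not part of the original listing):

def get_auc(model, X_test, y_test):
    # Weighted one-vs-rest AUC over the class probability estimates.
    proba = model.predict_proba(X_test)
    return roc_auc_score(y_test, proba, multi_class='ovr', average='weighted')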

2.4.6 Experimental results

Different DL models are trained on the dataset, including the pretrained models. Self-defined CNNs are also used for training, and the pretrained models are additionally used for feature extraction, with those features used to train classical machine learning models. The pretrained models used in the study are ResNet50 [33], VGG16 [34], VGG19 [34], InceptionV3 [35], MobileNet [36], DenseNet169 [37], DenseNet121 [37], InceptionResNetV2 [38], MobileNetV2 [39], and ResNet101 [33]. The classical machine learning models used are ANN, KNN, SVM [40], Random Forest, AdaBoost, and XGBoost.
From Table 2.1, it can be seen that the pretrained models perform very well and give very accurate results compared to the CNN models: the accuracy of all the self-defined CNN models stays approximately around 33%, which could be due to the model always predicting every image as the same class.

Among the pretrained models, VGG16, VGG19, and ResNet50 give the best results. The input layer has the shape 128 × 128 × 3 in all of the CNN models, and at the end, the output of the model is flattened and then connected to a 3-unit softmax output layer, one unit for each class. The best performing model in Table 2.1 is ResNet50, with a test accuracy of 98.711%, F1 score of 98.709%, Cohen kappa score of 98.067%, and AUC score of 99.821%.
The pretrained models are then used to extract features, and classical machine learning models are trained on those features. The results are shown in Tables 2.2–2.11.

TABLE 2.1 Performance of different convolutional neural network (CNN) models.


Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA AUC

CNN 2 Layer 0.30171 0.29467 0.28978 0.21283 -0.06434 0.4

CNN 3 Layer 0.35962 0.35156 0.36222 0.27079 0.03874 0.46776

CNN 4 Layer 0.33333 0.33511 0.33156 0.16511 0 0.5

CNN 5 Layer 0.42429 0.41867 0.43689 0.39193 0.1478 0.72973

CNN 6 Layer 0.372 0.37733 0.37333 0.3033 0.05282 0.70512

CNN 7 Layer 0.33 0.35 0.31956 0.15477 0 0.50139

CNN 8 Layer 0.31771 0.31689 0.32178 0.1692 -0.01475 0.5071

VGG16 0.99676 0.97289 0.97467 0.97467 0.962 0.99439

ResNet50 0.99914 0.98578 0.98711 0.98709 0.98067 0.99821

VGG19 0.99486 0.972 0.97556 0.97553 0.96332 0.99133

Inception_v3 0.84695 0.82311 0.82844 0.83234 0.7432 0.90396

MobileNet 0.85105 0.83378 0.84489 0.83623 0.76673 0.98124

DenseNet169 0.94543 0.93556 0.93644 0.93629 0.90466 0.9674

DenseNet121 0.91971 0.90533 0.90978 0.90953 0.86504 0.9679

InceptionResNetV2 0.42733 0.43778 0.43778 0.40157 0.16098 0.59807

MobileNetV2 0.9281 0.88889 0.89822 0.8974 0.84733 0.97831

ResNet101 0.98943 0.97022 0.97333 0.9733 0.96 0.99661

TABLE 2.2 Performance of VGG16 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.9981 0.98311 0.97822 0.97821 0.96733 0.97822 0.97833

KNN 0.915 0.88578 0.86889 0.86517 0.80361 0.86889 0.88749

SVM 1 0.97867 0.97422 0.97422 0.96133 0.97422 0.97425

Random Forest 1 0.95822 0.95378 0.9537 0.93065 0.95378 0.95421

AdaBoost 0.90476 0.90667 0.91156 0.91137 0.86739 0.91156 0.91451

XGBoost 1 0.97111 0.964 0.96404 0.94599 0.964 0.96425

Applications of Artificial Intelligence in Medical Imaging


64 2. Lung cancer detection from histopathological lung tissue images using deep learning

TABLE 2.3 Performance of VGG19 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.99848 0.97778 0.97378 0.9738 0.96066 0.97378 0.97396


KNN 0.913 0.83378 0.83467 0.83424 0.7522 0.83467 0.84989
SVM 1 0.97733 0.96667 0.96667 0.95 0.96667 0.96668
Random Forest 1 0.95111 0.94 0.93979 0.90998 0.94 0.9403
AdaBoost 0.87305 0.86844 0.86578 0.86452 0.79883 0.86578 0.87284
XGBoost 1 0.97689 0.96356 0.96348 0.94533 0.96356 0.96344

TABLE 2.4 Performance of ResNet50 deep feature extraction with different machine learning models.

Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.99714 0.98489 0.98622 0.98623 0.97933 0.98622 0.98624

KNN 0.936 0.86667 0.87956 0.88036 0.81938 0.87956 0.8895


SVM 1 0.988 0.988 0.988 0.982 0.988 0.988
Random Forest 1 0.95778 0.95956 0.95953 0.93933 0.95956 0.95952
AdaBoost 0.88181 0.87022 0.88844 0.8875 0.83282 0.88844 0.89912
XGBoost 1 0.97733 0.97644 0.97643 0.96466 0.97644 0.97642

TABLE 2.5 Performance of ResNet101 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.99714 0.97733 0.97156 0.97151 0.95734 0.97156 0.97194


KNN 0.92 0.85467 0.84978 0.84846 0.775 0.84978 0.87629

SVM 1 0.984 0.98578 0.98578 0.97866 0.98578 0.98578


Random Forest 1 0.95289 0.95511 0.95503 0.93266 0.95511 0.95497
AdaBoost 0.85124 0.83511 0.84356 0.83991 0.76568 0.84356 0.86475
XGBoost 1 0.97333 0.96622 0.96619 0.94933 0.96622 0.96618

Applications of Artificial Intelligence in Medical Imaging


2.4 Lung cancer detection using artificial intelligence 65
TABLE 2.6 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.84019 0.81422 0.82533 0.81459 0.73737 0.82533 0.87165


KNN 0.888 0.85156 0.84889 0.84842 0.77344 0.84889 0.8514
SVM 1 0.88444 0.87333 0.87337 0.80998 0.87333 0.87342
Random Forest 0.9999 0.88578 0.89067 0.89013 0.83593 0.89067 0.89121
AdaBoost 0.76648 0.752 0.75156 0.72932 0.62836 0.75156 0.79502
XGBoost 1 0.91111 0.90089 0.90067 0.85131 0.90089 0.90053

TABLE 2.7 Performance of MobileNet deep feature extraction with different machine learning models.

Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.94619 0.91689 0.912 0.91149 0.86788 0.912 0.92001

KNN 0.913 0.87467 0.85378 0.8516 0.78092 0.85378 0.86667


SVM 1 0.91022 0.90933 0.90938 0.864 0.90933 0.90947
Random Forest 1 0.90356 0.89244 0.89312 0.8386 0.89244 0.89495
AdaBoost 0.80324 0.79822 0.80133 0.78819 0.70266 0.80133 0.83618
XGBoost 1 0.92622 0.90978 0.91008 0.86464 0.90978 0.91059

TABLE 2.8 Performance of InceptionV3 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.87114 0.85022 0.85511 0.85808 0.78237 0.85511 0.87945


KNN 0.899 0.82756 0.83289 0.83133 0.74954 0.83289 0.84141
SVM 1 0.92089 0.91733 0.91741 0.87598 0.91733 0.91752
Random Forest 1 0.89511 0.88533 0.88512 0.82797 0.88533 0.8854
AdaBoost 0.81848 0.80667 0.804 0.79994 0.70617 0.804 0.80063

XGBoost 1 0.92089 0.92 0.91982 0.88 0.92 0.91971

Applications of Artificial Intelligence in Medical Imaging


66 2. Lung cancer detection from histopathological lung tissue images using deep learning

TABLE 2.9 Performance of InceptionResNetV2 deep feature extraction with different machine learning models.
Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.45714 0.46044 0.45822 0.39109 0.18746 0.45822 0.54051


KNN 0.82 0.70578 0.69422 0.69687 0.54147 0.69422 0.70166
SVM 0.7979 0.72711 0.72356 0.72421 0.58538 0.72356 0.7253
Random Forest 0.9999 0.73422 0.72844 0.72997 0.59249 0.72844 0.73276
AdaBoost 0.69552 0.66444 0.684 0.68581 0.52603 0.684 0.68812
XGBoost 0.984 0.74756 0.73333 0.73544 0.60001 0.73333 0.73812

TABLE 2.10 Performance of DenseNet169 deep feature extraction with different machine learning models.

Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.96629 0.95778 0.95156 0.95152 0.92736 0.95156 0.9533

KNN 0.97 0.94756 0.94489 0.94477 0.91737 0.94489 0.94856


SVM 1 0.97644 0.97378 0.97379 0.96067 0.97378 0.97381
Random Forest 0.9999 0.94578 0.952 0.95208 0.92799 0.952 0.95233
AdaBoost 0.80629 0.80533 0.81067 0.80332 0.71653 0.81067 0.83895
XGBoost 1 0.97244 0.968 0.96801 0.952 0.968 0.96803

TABLE 2.11 Performance of DenseNet121 deep feature extraction with different machine learning models.

Model Train accuracy Val accuracy Test accuracy F1_score Kappa Recall Precision

ANN 0.9641 0.94711 0.94444 0.94394 0.9167 0.94444 0.94627


KNN 0.977 0.94844 0.94356 0.9435 0.91535 0.94356 0.94428
SVM 1 0.97156 0.96356 0.96356 0.94533 0.96356 0.96356
Random Forest 1 0.94844 0.93822 0.93826 0.90732 0.93822 0.93847
AdaBoost 0.8039 0.79689 0.804 0.79248 0.70248 0.804 0.84563
XGBoost 1 0.96311 0.96222 0.96223 0.94332 0.96222 0.96248

Applications of Artificial Intelligence in Medical Imaging


2.4 Lung cancer detection using artificial intelligence 67
VGG16 is used as a feature extractor for the lung tissue images, and classical machine learning (ML) models are trained on the extracted features. The results (Table 2.2) improve to a large extent compared to the CNN models in Table 2.1. The highest accuracy on the test data is achieved using the ANN, with a value of 97.82%, while VGG16 performs most poorly with KNN, with a test accuracy of 86.88%.
from keras.applications import VGG16
from keras.callbacks import EarlyStopping
from keras.layers import Dense, Dropout, BatchNormalization
from keras.models import Model, Sequential
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

def get_models():
    ANN = Sequential()
    ANN.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
    ANN.add(BatchNormalization())
    ANN.add(Dropout(0.2))
    ANN.add(Dense(64, activation='relu'))
    ANN.add(Dense(32, activation='relu'))
    ANN.add(Dense(16, activation='relu'))
    ANN.add(Dense(8, activation='relu'))
    ANN.add(Dense(len(train_it.class_indices), activation='softmax'))
    ANN.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    KNN = KNeighborsClassifier()
    SVM = SVC(kernel='linear')
    RF = RandomForestClassifier(n_estimators=50)
    ADB = AdaBoostClassifier()
    XGB = XGBClassifier(n_estimators=50, use_label_encoder=False)
    return (ANN, KNN, SVM, RF, ADB, XGB)

def reshape_data(X_train, X_val, X_test):
    # Flatten the convolutional feature maps into vectors for the classical models.
    X_train = X_train.reshape(X_train.shape[0], -1)
    X_val = X_val.reshape(X_val.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)
    return (X_train, X_val, X_test)

def fit_ANN(model, X_train, y_train, X_val, y_val):
    # Stop training when the validation loss stops improving.
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, verbose=1,
              callbacks=[es])
    return model

def fit_model(model, X_train, y_train):
    model.fit(X_train, y_train)
    return model

# Frozen VGG16 backbone (SIZE_X = SIZE_Y = 128 here) used as the feature extractor.
base_model = VGG16(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)
model.summary()

X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ANN = fit_ANN(ANN, X_train, y_train, X_val, y_val)
KNN = fit_model(KNN, X_train, y_train)


VGG19 and VGG16 are similar models; VGG19 has 19 layers, whereas VGG16 has 16. The results are therefore also similar to those obtained using VGG16. Here, the best results are obtained using the ANN, with a test accuracy of 97.38%, and the worst with KNN, with a value of 83.42%.

base_model = VGG19(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)
model.summary()

X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ANN = fit_ANN(ANN, X_train, y_train, X_val, y_val)
KNN = fit_model(KNN, X_train, y_train)

ResNet50 is trained on over a million images from the ImageNet database and has 50 layers, so its weights are very good at extracting many features of an image, and combining it with classical ML models gives very good results. The SVM model gives the highest accuracy score of 98.8%, and the worst score is with KNN, with a test accuracy of 87.95%.

base_model = ResNet50(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)

X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ANN = fit_ANN(ANN, X_train, y_train, X_val, y_val)
SVM = fit_model(SVM, X_train, y_train)

ResNet101 is the upgraded version of ResNet50 and, like ResNet50, it also gives good results. Here, the maximum test accuracy achieved is 98.57%, with SVM, and the worst test accuracy is given by KNN, with a value of 84.9%.

base_model = ResNet101(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)

X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

KNN = fit_model(KNN, X_train, y_train)
SVM = fit_model(SVM, X_train, y_train)

MobileNetV2 is 53 layers deep and is also trained on the ImageNet database. The scores obtained using MobileNetV2 are lower than those of the VGG and ResNet models. Here, the best score is given by XGBoost, with a value of 90.08%, and the worst score is given by AdaBoost, with a value of 75.15%.
MobileNet is the previous version of the MobileNetV2 model, and the two are quite similar to one another; the results are also similar to those of MobileNetV2. Here, the best result is obtained by training the ANN model on the features extracted by MobileNet, with a test accuracy of 90.97%, and the worst score is obtained using AdaBoost, with a value of 80.13%.

base_model = MobileNetV2(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ADB = fit_model(ADB, X_train, y_train)


XGB = fit_model(XGB, X_train, y_train)


base_model = MobileNet(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ANN = fit_ANN(ANN, X_train, y_train, X_val, y_val)


ADB = fit_model(ADB, X_train, y_train)

InceptionV3 is a model from the family of Inception models and consists of 48 layers. The best test accuracy score obtained is 92%, with the XGBoost model, and the worst score is 80.4%, with the AdaBoost model.

base_model = InceptionV3(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()
ADB = fit_model(ADB, X_train, y_train)
XGB = fit_model(XGB, X_train, y_train)

InceptionResNetV2 is a network that is 164 layers deep, but its weights are not good enough to extract the important features required for the correct classification of this dataset. The best accuracy is given by XGBoost, with a value of 73.33%, and the worst accuracy is given by the ANN, with a value of 45.82%.

base_model = InceptionResNetV2(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3),


weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

ANN = fit_ANN(ANN, X_train, y_train, X_val, y_val)


XGB = fit_model(XGB, X_train, y_train)

DenseNet169 is also trained on the ImageNet database and is a member of the DenseNet family. The performance of this model on the dataset is good: the best score achieved is with SVM, with a value of 97.37%, and the worst accuracy score is given by AdaBoost, with a value of 81.06%.

base_model = DenseNet169(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()

SVM = fit_model(SVM, X_train, y_train)


ADB = fit_model(ADB, X_train, y_train)

DenseNet121 is another member of the DenseNet family, and the results obtained using it are similar to those of DenseNet169. The best test accuracy score is given by SVM, with a value of 96.35%, and the worst score is given by AdaBoost, with a value of 80.4%.

base_model = DenseNet121(include_top=False, input_shape=(SIZE_X, SIZE_Y, 3), weights='imagenet')

for layer in base_model.layers:


layer.trainable = False

model = Model(inputs=base_model.input, outputs=base_model.layers[-1].output)


X_train, X_val, X_test, y_train, y_val, y_test = get_features(model, train_it, validate_it)
X_train, X_val, X_test = reshape_data(X_train, X_val, X_test)
ANN, KNN, SVM, RF, ADB, XGB = get_models()
SVM = fit_model(SVM, X_train, y_train)
ADB = fit_model(ADB, X_train, y_train)

The best model across Tables 2.2–2.11 is ResNet50 with SVM, giving test accuracy, F1 score, Cohen kappa score, recall, and precision of 98.8%, 98.8%, 98.2%, 98.8%, and 98.8%, respectively. For Tables 2.2–2.11, pretrained networks are used, since training these huge networks from scratch requires very large amounts of data, computation power, and time. These models were developed by teams of researchers and are able to extract every small feature of an image. So these models are used to extract the important features, and then the classical ML models (ANN, KNN, SVM, Random Forest, AdaBoost, and XGBoost) are trained on the extracted features to give the output; the output given by the pretrained models is flattened to serve as input to the ML models. The best test accuracy score achieved using this method is 98.8%, given by ResNet50 and SVM.
From the above tables, it can also be seen that VGG16, VGG19, ResNet50, and ResNet101 extract the essential features from the images, and the ANN gives the best result among the classical models. The ANN used here has the shape 128 × 64 × 32 × 16 × 8 and is trained for 10 epochs. For training the KNN model, PCA is used to reduce the data features, as the training time grows rapidly with the number of features. For the XGBoost and Random Forest models the number of estimators is taken as 50; for all other models, default parameters are used.
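A hedged sketch of that PCA-plus-KNN setup follows (the component count below is an assumption, not the exact value used in the experiments):

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# PCA shrinks the deep features before the distance-based KNN fit.
knn_pipe = make_pipeline(PCA(n_components=500), KNeighborsClassifier())
knn_pipe.fit(X_train, y_train)
test_pred = knn_pipe.predict(X_test)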


2.5 Discussion

The abovementioned results show that all transfer learning models achieved good performance, and all the pretrained models are better than the self-designed CNN. Pretrained models also reduce the time needed to train a model on such a vast dataset. Some models, such as VGG16, VGG19, ResNet50, and ResNet101, perform better than the others; the best result is an F1 score of 98.8%, given by ResNet50 with the SVM classifier. Models such as Random Forest and XGBoost can fit very well on the training data but do not give equally good results on the test and validation data. SVM gives more accurate results than the other classical machine learning models after training on the features extracted through a pretrained model, while KNN, AdaBoost, and Random Forest are not able to give accurate results in most cases. Pretrained models extract important features with their weights, which enables simple classical ML models to give very good results on this complex dataset. Making a self-designed CNN model able to make accurate predictions requires a lot of computation power and time, so it is usually easier and better to simply load a pretrained model.

2.6 Conclusion

Lung cancer is a very deadly disease, and everyone must take precautions to avoid every possibility of having lung cancer: smoking should be avoided, and a healthy and balanced diet should be maintained. AI is helping the medical field to improve and to give faster and better results. As can be seen in the results, the accuracy of the machine learning models is almost equal to 1. Today, we have the data and resources to use machine learning to automate lung cancer detection. Through AI, it will be possible for everyone to detect cancer in almost no time. Early diagnosis can be lifesaving, as lung cancer is more curable in its initial stages.

References
[1] U.S. National Institute of Health, National Cancer Institute, SEER Cancer Statistics Review, 1975–2015.
[2] Centres for Disease Control and Prevention, National Centre for Health Statistics, CDC WONDER On-Line Database, Compiled from Compressed Mortality File 1999–2016 Series 20 No. 2V, 2017.
[3] What is lung cancer? | Types of lung cancer, https://www.cancer.org/cancer/lung-cancer/about/what-is.html (accessed 14.05.21).
[4] 10 Tips for Preventing Lung Cancer, Verywell Health, https://www.verywellhealth.com/tips-for-lung-cancer-prevention-2249286 (accessed 14.05.21).
[5] Healthline, Effects of lung cancer on the body, https://www.healthline.com/health/lung-cancer/effects-on-body, May 09, 2017 (accessed 14.05.21).
[6] How to detect lung cancer | Lung cancer tests, https://www.cancer.org/cancer/lung-cancer/detection-diagnosis-staging/how-diagnosed.html (accessed 14.05.21).
[7] Cancer Support Community, Coping with side effects of lung cancer treatment, https://www.cancersupportcommunity.org/article/coping-side-effects-lung-cancer-treatment (accessed 14.05.21).
[8] T. Atsushi, T. Tetsuya, K. Yuka, F. Hiroshi, Automated classification of lung cancer types from cytological images using deep convolutional neural networks, BioMed. Res. Int. 2017 (2017) 1–6. https://doi.org/10.1155/2017/4067832.
[9] W. Ausawalaithong, A. Thirach, S. Marukatat, T. Wilaiprasitporn, Automatic lung cancer prediction from chest X-ray images using the deep learning approach, in: 2018 11th Biomedical Engineering International Conference (BMEiCON), Nov. 2018, pp. 1–5. https://doi.org/10.1109/BMEiCON.2018.8609997.
[10] B. Hatuwal, H. Thapa, Lung cancer detection using convolutional neural network on histopathological images, Int. J. Comput. Trends Technol. 68 (2020) 21–24. https://doi.org/10.14445/22312803/IJCTT-V68I10P104.
[11] S.C. S R, H. Rajaguru, Lung cancer detection using probabilistic neural network with modified crow-search algorithm, Asian Pac. J. Cancer Prev. 20 (7) (2019) 2159–2166. https://doi.org/10.31557/APJCP.2019.20.7.2159.
[12] S.S. Chakravarthy, S.A. Subhasakthe, Adaptive median filtering with modified BDND algorithm for the removal of high-density impulse and random noise, Int. J. Comput. Sci. Mob. Comput. 4 (2015).
[13] S. Sasikala, M. Bharathi, B.R. Sowmiya, Lung cancer detection and classification using deep CNN, Int. J. Eng. Innov. Technol. Explo. Eng. 8 (25) (2018) 259–262.
[14] Lung Image Database Consortium, The Cancer Imaging Archive (TCIA) Public Access, Cancer Imaging Archive Wiki, https://wiki.cancerimagingarchive.net/display/Public/Lung+Image+Database+Consortium (accessed 14.05.21).
[15] M.F. Serj, B. Lavi, G. Hoff, D.P. Valls, A deep convolutional neural network for lung cancer diagnostic, arXiv preprint arXiv:1804.08170, 2018.
[16] Kaggle, Data Science Bowl 2017, https://kaggle.com/c/data-science-bowl-2017, 2017 (accessed 14.05.21).
[17] A. Masood, et al., Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images, J. Biomed. Inform. 79 (2018). https://doi.org/10.1016/j.jbi.2018.01.005.
[18] Q. Qiu, X. Cheng, R. Calderbank, G. Sapiro, DCFNet: deep neural network with decomposed convolutional filters, arXiv:1802.04145, http://arxiv.org/abs/1802.04145, Jul. 2018 (accessed 14.05.21).
[19] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986). https://doi.org/10.1038/323533a0.
[20] Y. LeCun, et al., Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (4) (1989) 541–551.
[21] S. Mishra, et al., Principal component analysis, Int. J. Livest. Res. 1 (2017). https://doi.org/10.5455/ijlr.20170415115235.
[22] W. Chojnacki, M. Brooks, A note on the locally linear embedding algorithm, IJPRAI 23 (2009) 1739–1752. https://doi.org/10.1142/S0218001409007752.
[23] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (2000) 2385–2404. https://doi.org/10.1162/089976600300014980.
[24] P.-G. Martinsson, V. Rokhlin, M. Tygert, A randomized algorithm for the decomposition of matrices, Appl. Comput. Harmon. Anal. 30 (1) (2011) 47–68. https://doi.org/10.1016/j.acha.2010.02.003.
[25] N. Halko, P.G. Martinsson, J.A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev. 53 (2) (2011) 217–288. https://doi.org/10.1137/090771806.
[26] sklearn.manifold.LocallyLinearEmbedding, Scikit-Learn 0.24.2 documentation, https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html (accessed 14.05.21).
[27] G. Guo, H. Wang, D. Bell, Y. Bi, KNN model-based approach in classification, Aug. 2004.
[28] H. Patel, P. Prajapati, Study and analysis of decision tree based classification algorithms, Int. J. Comput. Sci. Eng. 6 (2018) 74–78. https://doi.org/10.26438/ijcse/v6i10.7478.
[29] I. Rish, An empirical study of the naïve Bayes classifier, in: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 3, Jan. 2001.
[30] A. Natekin, A. Knoll, Gradient boosting machines, a tutorial, Front. Neurorobot. 7 (2013) 21. https://doi.org/10.3389/fnbot.2013.00021.
[31] A.A. Borkowski, M.M. Bui, L.B. Thomas, C.P. Wilson, L.A. DeLand, S.M. Mastorides, Lung and colon cancer histopathological image dataset (LC25000), arXiv preprint arXiv:1912.12142, 2019.
[32] Kaggle, Lung and colon cancer histopathological images, https://kaggle.com/andrewmvd/lung-and-colon-cancer-histopathological-images (accessed 14.05.21).
[33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, arXiv:1512.03385, http://arxiv.org/abs/1512.03385, Dec. 2015 (accessed 14.05.21).
[34] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
[35] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[36] A.G. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv:1704.04861, http://arxiv.org/abs/1704.04861, Apr. 2017 (accessed 14.05.21).
[37] G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, arXiv:1608.06993, http://arxiv.org/abs/1608.06993, Jan. 2018 (accessed 14.05.21).
[38] Papers with Code, Inception-ResNet-v2 explained, https://paperswithcode.com/method/inception-resnet-v2 (accessed 14.05.21).
[39] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, arXiv:1801.04381, http://arxiv.org/abs/1801.04381, Mar. 2019 (accessed 14.04.21).
[40] S. Ghosh, A. Dasgupta, A. Swetapadma, A study on support vector machine based linear and non-linear pattern classification, in: 2019 International Conference on Intelligent Sustainable Systems (ICISS), Feb. 2019, pp. 24–28. https://doi.org/10.1109/ISS1.2019.8908018.
[41] M.-C. Popescu, V. Balas, L. Perescu-Popescu, N. Mastorakis, Multilayer perceptron and neural networks, WSEAS Trans. Circuits Syst. 8 (2009).
[42] S. Sharma, S. Sharma, A. Athaiya, Activation functions in neural networks, Int. J. Eng. Appl. Sci. Technol. 04 (2020) 310–316. https://doi.org/10.33564/IJEAST.2020.v04i12.054.
[43] E. Grossi, M. Buscema, Introduction to artificial neural networks, Eur. J. Gastroenterol. Hepatol. 19 (Jan. 2008). https://doi.org/10.1097/MEG.0b013e3282f198a0.
[44] S. Nagpal, M. Kumar, M.R. Maruthi Ayyagari, Kumar, A survey of deep learning and its applications: a new paradigm to machine learning, Arch. Comput. Methods Eng. (2019).
[45] R. Khandelwal, Convolutional neural network: feature map and filter visualization, Medium, May 18, 2020. https://towardsdatascience.com/convolutional-neural-network-feature-map-and-filter-visualization-f75012a5a49c.
[46] L. Alzubaidi, et al., Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, J. Big Data 8 (1) (2021) 53. https://doi.org/10.1186/s40537-021-00444-8.
[47] K. Weiss, T.M. Khoshgoftaar, D. Wang, A survey of transfer learning, J. Big Data 3 (1) (2016) 9. https://doi.org/10.1186/s40537-016-0043-6.
[48] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 374 (2065) (2016) 20150202.


C H A P T E R

3

Magnetic resonance imaging-based automated brain tumor detection using deep learning techniques

Abhranta Panigrahi¹ and Abdulhamit Subasi²,³
¹Department of Biotechnology and Medical Engineering, National Institute of Technology Rourkela, Rourkela, India; ²Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; ³Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

3.1 Introduction
3.2 Literature survey
3.3 Deep learning for disease detection
  3.3.1 Artificial neural networks
  3.3.2 Deep learning
  3.3.3 Convolutional neural networks
3.4 Disease detection using artificial intelligence
  3.4.1 Feature extraction
  3.4.2 Transfer learning
  3.4.3 Prediction and classification
  3.4.4 Experimental data
  3.4.5 Experimental setup
  3.4.6 Performance evaluation metrics
  3.4.7 Experimental results
3.5 Discussion
3.6 Conclusion
References

3.1 Introduction

A central nervous system tumor, or CNS tumor, refers to excessive growth or accumulation of cells in the CNS, which consists of the brain and the spinal cord. An estimated 24,530 adults (13,840 men and 10,690 women) in the United States will be diagnosed with CNS cancer, and an estimated 3460 children under the age of 15 will also be


diagnosed with brain or CNS tumors in 2021. Brain and nervous system cancer is the 10th leading cause of death for men and women, and an estimated 18,600 adults will die from primary CNS cancer [1]. Due to the severity of the problem, the early diagnosis of a brain tumor is extremely important for its treatment. Since the invention of X-rays in 1895 by Wilhelm Roentgen, various medical imaging techniques have been used to diagnose brain tumors; the German neurosurgeon Fedor Krause used X-rays extensively to detect them. Over time, many sophisticated and advanced diagnosis techniques were invented. These included imaging techniques such as the computed tomography (CT) scan [2] and the positron emission tomography scan [3], as well as invasive methods such as biopsy. One of the most significant breakthroughs in brain tumor detection came with the development of magnetic resonance imaging (MRI). It is a noninvasive medical imaging technique that uses strong radio-magnetic waves to generate an accurate image of the required organ. MRI generates a more detailed picture of the brain and does not involve radiation. Intravenous gadolinium-enhanced MRI is typically used to help create a clearer picture of the brain tumor [4]. With the advent of MRI technology, different variations of MRI imaging started to play a vital role in the image-based diagnosis of brain tumors. Some of the MRI imaging techniques that are prevalent in modern medicine are fluid-attenuated inversion recovery, T1-weighted precontrast, T1-weighted postcontrast (Gd), and T2-weighted.
With the advent of technology, various computational techniques have proved to be useful in medical diagnosis. The advent of deep learning [5] has revolutionized the fields of computer vision and natural language processing. Models such as AlexNet [6], ResNet [7], and GoogleNet [8] have given near human-like accuracy in the ImageNet image classification task [6]. The success of convolutional neural networks (CNNs) led the way to their extensive use in diagnosis via medical imaging [9–11]. Since training a CNN from scratch to achieve state-of-the-art performance requires high computational resources, the usage of transfer learning has increased exponentially. This has resulted in the adoption of transfer learning techniques to detect and segment brain tumors [12] and classify them [13] for faster and more efficient diagnosis. In this chapter, we try to provide a detailed assessment of various deep learning methods for the detection of brain tumors and evaluate the pros and cons of each method, which will enable faster diagnosis of brain tumors. An overview of MRI-based brain tumor detection is given in Fig. 3.1.

3.2 Literature survey

Extensive work has been done in the field of deep learning and big data analysis and their applications in the diagnosis of brain tumors. Most of the approaches involve the segmentation of the brain tumor via CNNs. Özyurt et al. [14] conducted a study which proposed a hybrid method using neutrosophy and a convolutional neural network. It aimed to classify brain tumors as benign or malignant: CNNs were used to extract features from the segmented brain tumor images, and then the tumors were classified as malignant or benign using support vector machine (SVM) and K-nearest neighbor (KNN) classifiers. Jalali and Kaur [15] gave a detailed comparative summary of various methods and works for automatic brain tumor detection using medical imaging. Their work involved the description of various methods such as deep belief networks, recurrent neural networks, KNNs, SVMs, and many others. Deb and Roy [16]


FIGURE 3.1 Overview of the pipelines.

proposed a novel segmentation and classification algorithm for brain tumor detection. They proposed a system that uses an adaptive fuzzy deep neural network with frog leap optimization to detect abnormalities in an image; the abnormal image was then segmented using an adaptive flying squirrel algorithm. Ambily and Suresh [17] proposed an integrated model of CNN and transfer learning for the automated binary classification of MRI images; they used a 16-layer pretrained network to distinguish normal and abnormal images.
Amin et al. [18] proposed a deep learning model to predict input MR slices as unhealthy/healthy based on the presence of tumors. Multiple image processing techniques were used to make the tumors more prominent, and the MR slices were segmented by applying optimal thresholds to cluster similar pixels together. These segmented slices were then sent to a two-layer stacked sparse autoencoder model. They trained the model on the BRATS dataset and performed a binary classification on the MRI scans, predicting which scans had tumors and which did not.
Vallabhaneni and Rajesh [19] worked on automatic brain tumor detection in noise-corrupted images. Noise in medical imaging can seriously hamper the ability of automated algorithms to correctly detect abnormalities. To tackle this problem, they performed denoising using the edge-adaptive total variation denoising technique, which preserves the edges of an image while denoising. The denoised images were then segmented using mean shift clustering, and feature extraction was performed using the gray-level cooccurrence matrix. A multiclass SVM classifier was then used to detect the tumor present in the images.
Rai et al. [20] proposed a hybrid deep CNN model for the automated prediction and segmentation of brain tumors from MRI images. They used a dataset with 3929 images, including 1373

images with tumors and 2556 images without tumors. The CNN model was evaluated using the Jaccard index, DICE score, F1 score, accuracy, precision, and recall, and it was benchmarked against UnetResnet-50, vanilla U-Net, and other state-of-the-art techniques. They were able to show excellent results, with 99.7% accuracy.
As can be seen from the discussion above, most of the papers focused on the segmentation of tumors use algorithms such as clustering or autoencoders. Some of the authors have used CT scans, while the most commonly used imaging technique is MRI. In this chapter, we aim to provide an exhaustive study of several techniques, ranging from transfer learning to deep feature extraction, for the detection of brain tumors. The research goal here was to use a range of techniques to classify MRI scans into two categories: scans with tumors and scans without tumors.

3.3 Deep learning for disease detection

3.3.1 Artificial neural networks

Artificial neural networks (ANNs) are function approximators that were designed to resemble a primitive idea of the human brain. They are very useful for predicting extremely nonlinear relations. They consist of layers of artificial neurons; Fig. 3.2 shows a simple artificial neuron.

FIGURE 3.2 Artificial neuron.

The layers of neurons that accept the input data are called the input layers (Fig. 3.3A). The layers of neurons that give the final prediction are called the output layers (Fig. 3.3C), and all the layers in between are called the hidden layers (Fig. 3.3B). The network consists of neurons and synapses: each neuron has some data (x) and some bias (B), each synapse has a weight (W), and each layer of neurons has an activation function (f). The output (o) of each layer is governed by:

o = f(x · W + B)

FIGURE 3.3 Overview of a four-layered ANN. ANN, Artificial neural network.
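As a minimal numerical sketch of this formula (the sizes and values below are arbitrary illustrations, not from the original chapter):

import numpy as np

def layer_output(x, W, B, f):
    # o = f(x . W + B) for one fully connected layer.
    return f(x @ W + B)

relu = lambda z: np.maximum(z, 0)

x = np.array([[0.5, -1.2, 3.0]])   # one sample with three input features
W = np.random.randn(3, 4)          # synapse weights: 3 inputs to 4 neurons
B = np.zeros(4)                    # one bias per neuron
o = layer_output(x, W, B, relu)    # output of the layer, shape (1, 4)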

Leshno et al. [21] showed that multilayer feedforward networks with a nonpolynomial activation function can approximate any function.
The nonlinear activation function of the neurons is very crucial for an ANN; these differentiable functions are the main reason why neural networks can fit extremely nonlinear data. There are many activation function choices, such as sigmoid, tanh, ReLU, and many more.

3.3.2 Deep learning

Deep learning refers to a type of machine learning algorithm which uses ANNs to learn. Deep learning consists of steps such as forward propagation, calculation of loss, and backpropagation; this method was formalized by LeCun et al. [5]. ANNs learn by constantly updating the weights and biases of the synapses of the network, and the weights and biases are updated to decrease the loss. The steps involved in deep learning (Fig. 3.4) are:
1. Forward propagation of data:
   • The data vector is fed through the network: it is multiplied with the weight matrix and a bias term is added to it.
   • After passing through the activation functions, the network gives a prediction.
2. Calculation of loss:
   • The value of the prediction is compared to the actual label of the data and a loss term is calculated.
   • There are various types of loss, such as mean squared error (MSE) loss, cross-entropy loss, etc.
   • Each type of loss is used in different types of problems. For example, MSE loss is mainly used in regression problems, whereas cross-entropy loss is mainly used in classification problems.
3. Backpropagation of loss:
   • The information of the loss propagates through all the layers, starting from the output layer and moving back all the way up to the input layer; hence the name "backpropagation."
   • The gradient of the loss is calculated with respect to every layer in the network through the chain rule of differentiation.
FIGURE 3.4 Deep learning training process.

4. Gradient descent:
   • The motive of the training step is to minimize the loss. For this purpose, the "gradient descent" algorithm is used.
   • The parameters are updated in order to reach the global minimum of the loss curve; hence the name "gradient descent."

from sklearn.neural_network import MLPClassifier


# Create the model
mlp = MLPClassifier(hidden_layer_sizes=(100,), learning_rate_init=0.001,
                    alpha=1, momentum=0.9, max_iter=1000)
# Train the model on the training dataset
mlp.fit(Xtrain, ytrain)
# Test the model on the testing dataset
ypred = mlp.predict(Xtest)
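The scikit-learn model above hides the four training steps inside fit(). The sketch below (ours, using plain NumPy, a single artificial neuron, and MSE loss for simplicity) makes them explicit: it performs forward propagation, computes the loss, backpropagates its gradient through the chain rule, and applies one gradient-descent update per iteration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)     # one training example
y = 1.0                    # its true label
w = rng.normal(size=3)     # weights to be learned
b = 0.0                    # bias to be learned
lr = 0.1                   # learning rate

for step in range(100):
    # 1. Forward propagation of data
    y_hat = sigmoid(x @ w + b)
    # 2. Calculation of loss (MSE for a single example)
    loss = (y_hat - y) ** 2
    # 3. Backpropagation of loss through the chain rule
    dL_dz = 2 * (y_hat - y) * y_hat * (1 - y_hat)
    grad_w, grad_b = dL_dz * x, dL_dz
    # 4. Gradient descent update to decrease the loss
    w -= lr * grad_w
    b -= lr * grad_b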

3.3.3 Convolutional neural networks

Vision is a unique property of life. We rely heavily on our ability of vision to interact with the world and perform daily tasks like walking and driving. In order to give the power of vision to computers, LeCun et al. [22] provided the description of a CNN (Fig. 3.5) for handwritten digit recognition. The popularity of CNNs started after Krizhevsky et al. [6] created AlexNet, which was a state-of-the-art image classification network in the ImageNet [23] challenge. Since then, networks such as GoogleNet [8] and ResNet [7] have performed exceedingly well in image classification tasks.

FIGURE 3.5 Overview of a convolutional neural network.


FIGURE 3.6 Convolution operation in matrices.

CNNs are a type of ANN that implements the convolution operation on images in each layer. The networks consist of convolution (Fig. 3.6) layers and pooling (Fig. 3.7) layers, and each layer progressively extracts the features of an image. The values of the convolution filters are the parameters that get updated with training.

FIGURE 3.7 Pooling operation in matrices.

The extracted feature maps are then flattened and fed to a series of dense layers that perform the final prediction. CNNs take into account the information about a pixel and its neighboring pixels as well, whereas dense networks or fully connected networks flatten the image. This results in a loss of the spatial information of an image in the case of a dense neural network. For this reason, CNNs are better suited for image tasks when compared to densely connected networks.

ANNs and deep learning are still in their infancy, but the rapid rate of research in these fields has ensured their adoption in various domains. In health care, deep learning techniques have done wonders. In recent years, researchers at DeepMind have been able to solve the age-old problem of protein folding using their model called AlphaFold [24]. Similarly, many other state-of-the-art algorithms such as U-Net [25] have aided the medical community by providing segmented images for image-based diagnoses.
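The two operations of Figs. 3.6 and 3.7 are easy to reproduce on small matrices. The following NumPy sketch (ours; in a real CNN the kernel values are learned rather than fixed) slides a 3 × 3 filter over a 6 × 6 image and then applies 2 × 2 max pooling to the resulting feature map:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image (no padding), as in Fig. 3.6
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    # Keep the maximum of each non-overlapping window, as in Fig. 3.7
    out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.]] * 3)   # a simple vertical-edge filter
fmap = conv2d(image, kernel)             # 4 x 4 feature map
pooled = max_pool2d(fmap)                # 2 x 2 pooled map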


# Creating a basic 2-layered CNN with Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Flatten, Dense)

model_2 = Sequential()

# Adding a convolution layer, batch-normalization layer, ReLU
# activation and max-pooling layer
model_2.add(Conv2D(16, (3, 3), padding='same',
                   input_shape=X_train_prep[0].shape))
model_2.add(BatchNormalization())
model_2.add(Activation('relu'))
model_2.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same'))

# Adding a second convolution layer, batch-normalization layer, ReLU
# activation and max-pooling layer
model_2.add(Conv2D(32, (3, 3), padding='same'))
model_2.add(BatchNormalization())
model_2.add(Activation('relu'))
model_2.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same'))

# Flattening and adding a fully connected layer with sigmoid activation
model_2.add(Flatten())
model_2.add(Dense(nClasses, activation='sigmoid'))

3.4 Disease detection using artificial intelligence

3.4.1 Feature extraction

Feature extraction is an inherent property of neural networks. In convolutional neural networks (CNNs), the feature maps of an image are extracted in each layer. After each convolutional layer, features of an image such as edge information, gradient information, etc., are retrieved. These features are then learnt by the network for the required classification task. This is how a computer "sees" (Fig. 3.8).

Machine learning algorithms such as KNN and XGBoost are excellent at predicting something from a given set of features. They perform well on tabular data where the feature vectors are well defined. In the case of images, where there are a huge number of not well-defined features, implementing these algorithms can be difficult. To solve that problem, deep feature extraction is used, where CNNs extract the features from the input image, which are then flattened and used as the training data for machine learning algorithms.

In this chapter, a total of 11 pretrained CNN models have been used for feature extraction (Fig. 3.9), namely, VGG16 [26], VGG19 [26], ResNet50 [7], ResNet101 [7], MobileNet-V2 [27], MobileNet [28], InceptionNet-V3 [29], InceptionResNet-V2 [30], DenseNet169 [31], DenseNet121 [31], and XceptionNet [32]. All the models were pretrained on the ImageNet dataset. The MRI scans were passed through the pretrained networks to get a feature map, and a dropout layer was added with a dropout probability of 0.5. The feature map was then flattened and converted into a 64 × 1 embedding/latent vector, which was passed through a batch normalization layer. The resulting feature embedding was used as the training features for six machine learning algorithms: the multilayer perceptron (MLP) classifier/ANN, KNN, support vector machines (SVM), the Random Forest classifier, the AdaBoost classifier, and the XGBoost classifier. The results were then recorded.


FIGURE 3.8 How a computer "sees": visualizing the feature maps of an image retrieved from a ResNet50 model. Fig. 3.8A shows the input image. Fig. 3.8B shows some of the features extracted in the very first layer of the network. Fig. 3.8C shows the features extracted in the 12th layer of the network. Fig. 3.8D shows the features extracted in the 25th layer of the network. Fig. 3.8E shows the features extracted in the last convolutional layer of the network. This visualization clearly shows that edge information is the most commonly extracted feature in CNNs. As the number of layers increases, the features get more abstract.

FIGURE 3.9 Overview of deep feature extraction for classification of MRI scans. MRI, Magnetic resonance imaging.


# Defining a pretrained ResNet50 model as the feature extractor.
# The ResNet model was changed to other models for further experiments.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import (Activation, BatchNormalization, Dense,
                                     Dropout, Flatten)
from tensorflow.keras.models import Model
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

base_model = ResNet50(input_shape=(224, 224, 3), weights='imagenet',
                      include_top=False)

# Applying dropout, flatten and batch-normalization to the extracted features
x = base_model.output
x = Dropout(0.5)(x)
x = Flatten()(x)
x = BatchNormalization()(x)
x = Dense(64, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
predictions = Activation('relu')(x)

model_feat = Model(inputs=base_model.input, outputs=predictions)

# Extracting the train, validation and test features from the respective data
train_features = model_feat.predict(X_train_prep)
val_features = model_feat.predict(X_val_prep)
test_features = model_feat.predict(X_test_prep)

# Creating a pipeline to train various classifiers on the features extracted
# by the pretrained networks
names = ["K Nearest Neighbour Classifier", "SVM", "Random Forest Classifier",
         "AdaBoost Classifier", "XGB Classifier", "ANN Classifier"]
classifiers = [
    KNeighborsClassifier(),
    SVC(probability=True),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    XGBClassifier(),
    MLPClassifier()
]
zipped_clf = zip(names, classifiers)

def classifier_summary(pipeline, X_train, y_train, X_val, y_val,
                       X_test, y_test):
    sentiment_fit = pipeline.fit(X_train, y_train)
    y_pred_train = sentiment_fit.predict(X_train)
    y_pred_val = sentiment_fit.predict(X_val)
    y_pred_test = sentiment_fit.predict(X_test)
    y_pred_train = [1 if x > 0.5 else 0 for x in y_pred_train]
    y_pred_val = [1 if x > 0.5 else 0 for x in y_pred_val]
    y_pred_test = [1 if x > 0.5 else 0 for x in y_pred_test]

    # Calculating the performance metrics on the training data
    train_accuracy = np.round(accuracy_score(y_train, y_pred_train), 4) * 100
    train_precision = np.round(precision_score(y_train, y_pred_train,
                                               average='weighted'), 4)
    train_recall = np.round(recall_score(y_train, y_pred_train,
                                         average='weighted'), 4)
    train_F1 = np.round(f1_score(y_train, y_pred_train,
                                 average='weighted'), 4)
    train_kappa = np.round(cohen_kappa_score(y_train, y_pred_train), 4)

    # Calculating the performance metrics on the validation data
    val_accuracy = np.round(accuracy_score(y_val, y_pred_val), 4) * 100
    val_precision = np.round(precision_score(y_val, y_pred_val,
                                             average='weighted'), 4)
    val_recall = np.round(recall_score(y_val, y_pred_val,
                                       average='weighted'), 4)
    val_F1 = np.round(f1_score(y_val, y_pred_val, average='weighted'), 4)
    val_kappa = np.round(cohen_kappa_score(y_val, y_pred_val), 4)

    # Calculating the performance metrics on the test data
    test_accuracy = np.round(accuracy_score(y_test, y_pred_test), 4) * 100
    test_precision = np.round(precision_score(y_test, y_pred_test,
                                              average='weighted'), 2)
    test_recall = np.round(recall_score(y_test, y_pred_test,
                                        average='weighted'), 2)
    test_F1 = np.round(f1_score(y_test, y_pred_test,
                                average='weighted'), 2)
    test_kappa = np.round(cohen_kappa_score(y_test, y_pred_test), 2)
    test_roc_auc = metrics.roc_auc_score(y_test, y_pred_test,
                                         multi_class='ovo',
                                         average='weighted')
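The listing above defines the classifier list and the summary function but does not show the loop that drives them; a minimal sketch of how they can be tied together (assuming y_train, y_val, and y_test are the label arrays of the data split described in Section 3.4.4) is:

# Train and summarize every classifier on the extracted deep features;
# the metrics are computed inside classifier_summary as shown above
for name, classifier in zipped_clf:
    print(name)
    classifier_summary(classifier, train_features, y_train,
                       val_features, y_val, test_features, y_test)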

3.4.2 Transfer learning

Transfer learning is a machine learning technique in which models trained on a huge dataset for a particular task can be repurposed for a second but related task. It is a technique in which the learning capabilities of deep neural nets are exploited to improve generalization in another setting. Since training a deep neural network from scratch to achieve state-of-the-art performance requires huge computational resources, transfer learning has been gaining popularity, and it has enabled faster paced research and, hence, faster adoption of deep learning.

Transfer learning is a type of feature extraction where a pretrained deep network is used to extract features from an image (Fig. 3.10). In a CNN, the filters at the beginning learn low-level features such as edges and contours, whereas the filters at the end learn higher level features. Hence, the top filters of a deep ConvNet trained on a large dataset can generalize very well. In transfer learning, the weights of the top layers of a pretrained ConvNet are frozen, and then a smaller neural network is trained from scratch on the features extracted by the frozen layers to give the final result (Fig. 3.11).


FIGURE 3.10 Visualization of the features extracted in the first layer of a ResNet50 model, here for an MRI scan of a brain. As can be clearly observed, the first layers of the ConvNet mainly extract basic features such as edge and contour information. Hence, the parameters of the first few layers of a deep neural network can be frozen, and the features extracted from them can be used to train a smaller, more task-specific neural network to achieve near state-of-the-art performance in various tasks. MRI, Magnetic resonance imaging.

FIGURE 3.11 Overview of transfer learning.

In this chapter, 11 different pretrained CNNs were used for transfer learning. These are the same networks that were mentioned in Section 3.4.1. The parameters of the layers of the networks were frozen, and a dense neural net was created that accepted the flattened output of the pretrained CNN. The dense network consisted of a dense layer of dimension 256 × 1. The output of the dense network was then passed through a batch normalization layer, which was followed by a ReLU activation function. After a dropout layer, the final layer consisted of 1 neuron with a sigmoid activation function to perform binary classification of MRI scans based on the presence or absence of tumors. The pipeline is shown in Fig. 3.12.

FIGURE 3.12 Pipeline for the experimentation with transfer learning.

# Considering a ResNet50 model as the base model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential

base_Neural_Net = ResNet50(input_shape=(224, 224, 3),
                           weights='imagenet', include_top=False)
model = Sequential()

# Adding a trainable head to the base network
model.add(base_Neural_Net)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# Freezing the parameters of the base network
for layer in base_Neural_Net.layers:
    layer.trainable = False

# Compiling the neural network
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy', 'AUC'])

# Defining the early stopping parameters
EPOCHS = 30
es = EarlyStopping(monitor='val_acc', mode='max', patience=6)

# Training the model
history = model.fit_generator(train_generator, steps_per_epoch=50,
                              epochs=EPOCHS,
                              validation_data=validation_generator,
                              validation_steps=25, callbacks=[es])
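After training, the same frozen-base model can be scored on the held-out images. The original listing does not show this step; the following is a minimal sketch under the assumption that a test_generator was built with flow_from_directory in the same way as the training and validation generators:

# Evaluate the trained model on the held-out test generator (sketch;
# test_generator is assumed to be defined like train_generator above)
test_loss, test_acc, test_auc = model.evaluate(test_generator, steps=25)
print('Test accuracy:', test_acc, ' Test AUC:', test_auc)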

3.4.3 Prediction and classification

In this chapter, we have performed binary classification on MRI scans of the human brain on the basis of the presence or absence of a tumor. Since this was a binary classification problem, a sigmoid activation function was used on the last neuron of the ANN, which returned either 0 or 1 as the output.

Three different experiments were carried out. In the first experiment, deep feature extraction was used to extract features from the images. As mentioned earlier in Section 3.4.1, 11 state-of-the-art CNN models were used to extract features from the MRI scans. These features were then flattened and used as training features for various machine learning algorithms, such as the MLP classifier, KNN classifier, XGBoost classifier, etc. After the classification, the performance of the models was monitored using several performance evaluation metrics, namely, training accuracy, validation accuracy, testing accuracy, F1 score, Cohen's Kappa score, recall, precision, and area under the ROC curve (AUC) score. These performance metrics are discussed more elaborately in Section 3.4.6.

The second technique used for the experiments was transfer learning. Here, the CNN models described in Section 3.4.1 were used to extract features from the image. The parameters of these networks were frozen, and a shallow ANN was attached to the output of the last layer of the pretrained CNN. The parameters of the shallow network were then updated with the help of backpropagation and gradient descent. The loss function used to train this network was the binary cross-entropy loss, as the problem at hand was a binary classification problem. After multiple experimentations, the Adam optimizer was found to give the best results in terms of the loss and the monitored performance evaluation metrics, which were accuracy and AUC score. More on this method has been discussed in Section 3.4.2.

The third experiment was carried out to tackle the problem of training a CNN from scratch. CNNs of various depths, ranging from two-layered CNNs to eight-layered CNNs, were used for classification. ReLU activation was used for all the hidden layers, and a sigmoid neuron was used as the final layer for classification. Appropriate batch normalization layers and pooling layers were added to increase the accuracy of the models.

3.4.4 Experimental data

The dataset chosen for the experiment was a publicly available dataset [33]. It consisted of 3000 MRI scans, with 1500 of them having a brain tumor and 1500 of them without a brain tumor. The data consisted of compressed JPEG versions of the MRI scans. The data was divided into three main folders, Train, Test, and Validation, with each of them containing two subfolders, Yes and No. The Yes folder contained the images with a brain tumor, and the No folder contained the images without a brain tumor. The dataset was divided such that the Test set contained 300 images of each class, the Train set contained 960 images of each class, and the Validation set contained 240 images of each class. Due to the low number of training images, various image augmentation techniques were used, which are described in Section 3.4.5. Since the images were of different sizes, resizing them was necessary to train the neural networks. To avoid distortions due to resizing, a few preprocessing steps were taken, which are also discussed in Section 3.4.5. Sample MRI images are shown in Fig. 3.13.

3.4.5 Experimental setup

The pretrained models used in the experiments accepted 224 × 224 RGB images, that is, 224 × 224 × 3 tensors. Since the images in the original dataset were of varying dimensions (Fig. 3.14), they needed to be resized. To avoid distortions, some preprocessing steps were required before resizing them.


FIGURE 3.13 (A) shows images without a brain tumor and (B) shows images with a brain tumor.

FIGURE 3.14 (A) shows a batch of images without tumors and (B) shows a batch of images with tumors. This figure shows the varying dimensions of the images in the dataset and highlights the need to resize the images.


Before resizing the images, the images were cropped to avoid distortions. As Fig. 3.15 clearly shows, the brain occupied varying areas in different images, that is, the black areas were different across all images. For this reason, the images were cropped to make sure that the models performed well. The first step for cropping the images was to find the contours of the brain in the MRI scan. To do this, first, a Gaussian blur was applied on the image to reduce the noise. Then appropriate thresholding and dilation were used to remove small regions of noise from the scans. Then contours were found from these images. After finding the contours, the extreme points in the contours were calculated to get the edge of the brain in the image. Then a bounding box was constructed with the middle of each side being the calculated extreme point of the contour. The image was cropped according to the edges of the bounding box. These steps are clearly shown in Fig. 3.15.

This step was important to make sure that the brain occupies the maximum area in the scan. After this step, the images were resized to be of dimension 224 × 224.

import cv2
import imutils
import numpy as np

def crop_imgs(set_name, add_pixels_value=0):
    """
    Finds the extreme points on the image and crops the rectangle
    defined by them.
    """
    set_new = []
    for img in set_name:
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)
        # Threshold the image, then perform a series of erosions and
        # dilations to remove any small regions of noise
        thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)[1]
        thresh = cv2.erode(thresh, None, iterations=2)
        thresh = cv2.dilate(thresh, None, iterations=2)
        # Find contours in the thresholded image, then grab the largest one
        cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        c = max(cnts, key=cv2.contourArea)
        # Find the extreme points of the largest contour
        extLeft = tuple(c[c[:, :, 0].argmin()][0])
        extRight = tuple(c[c[:, :, 0].argmax()][0])
        extTop = tuple(c[c[:, :, 1].argmin()][0])
        extBot = tuple(c[c[:, :, 1].argmax()][0])
        ADD_PIXELS = add_pixels_value
        new_img = img[extTop[1]-ADD_PIXELS:extBot[1]+ADD_PIXELS,
                      extLeft[0]-ADD_PIXELS:extRight[0]+ADD_PIXELS].copy()
        set_new.append(new_img)

    return np.array(set_new)
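After cropping, every image is resized to 224 × 224 as described above. That step is not shown in the original listing; a minimal sketch with OpenCV (the helper name preprocess_imgs is ours, and X_train is assumed to hold the raw training images) is:

def preprocess_imgs(set_name, img_size=(224, 224)):
    # Resize each cropped image to the input size of the pretrained models
    return np.array([cv2.resize(img, dsize=img_size,
                                interpolation=cv2.INTER_CUBIC)
                     for img in set_name])

# Example usage on a set of cropped scans
X_train_prep = preprocess_imgs(crop_imgs(X_train))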


FIGURE 3.15 Overview of the cropping step.

FIGURE 3.16 Sample of an image before augmentation (A) and the resulting images after a few image augmentations are applied (B). This figure shows how image augmentation helps to generate more data, which leads to more robust models.

Since there are only 960 training samples available per class, various image augmentations were used to increase the number of training samples. The augmentations performed on the images were: random rotation within a 15-degree range, shifting the image along the width and the height, rescaling the image, shearing, varying the brightness of the image, and flipping the image vertically and horizontally (Fig. 3.16). All of these augmentations were performed to make the models more robust. A series of experiments was also performed to compare the performance of the models on the augmented and nonaugmented images, to study the effect of the number of training samples on the machine learning algorithms. All the observations are shown in Section 3.4.7.


train_datagen = ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                                   height_shift_range=0.1, shear_range=0.1,
                                   brightness_range=[0.5, 1.5],
                                   horizontal_flip=True, vertical_flip=True,
                                   preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
                                                    color_mode='rgb',
                                                    target_size=IMG_SIZE,
                                                    batch_size=32,
                                                    class_mode='binary',
                                                    seed=RANDOM_SEED)

3.4.6 Performance evaluation metrics

Performance evaluation metrics are very important when dealing with problems related to health care. Different performance metrics mean different things, and choosing the wrong performance metric to evaluate a model can result in false claims, which can ultimately lead to a health disaster. Hence, in this chapter, the models were evaluated on the most common performance metrics to get an idea about their robustness and performance.

First, a confusion matrix was plotted for all the models to know the numbers of correct and wrong predictions. When positive labels are predicted correctly, it is called a true positive or TP. Similarly, when negative labels are predicted correctly, it is called a true negative or TN. When positive labels are misclassified as negative labels, or vice versa, it is called a false negative (FN) or a false positive (FP), respectively. These values give us a very clear idea about the performance of the model.

After getting the confusion matrix, the accuracy of the prediction was calculated. This is defined as the ratio of correct predictions to the total number of predictions. Since the data was balanced, accuracy is a good indicator of the performance of the model. To get an even greater insight into the performance, the precision, recall, F1 score, Cohen's Kappa score, and the AUC score were calculated and reported. All these metrics, combined together, gave a clear indication of the performance of the models.
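The four counts described above can be read directly off a scikit-learn confusion matrix, and the accuracy can be recomputed from its definition. The snippet below is our illustrative sketch (y_test and predictions are assumed to be the binary label and prediction arrays used in the code that follows):

from sklearn.metrics import confusion_matrix

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

# Accuracy is the ratio of correct predictions to all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)
print('TP:', tp, 'TN:', tn, 'FP:', fp, 'FN:', fn, 'Accuracy:', accuracy)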

from sklearn import metrics
import numpy as np

print('Accuracy score is :', np.round(metrics.accuracy_score(y_test,
      predictions), 4))
print('Precision score is :', np.round(metrics.precision_score(y_test,
      predictions, average='weighted'), 4))
print('Recall score is :', np.round(metrics.recall_score(y_test,
      predictions, average='weighted'), 4))
print('F1 Score is :', np.round(metrics.f1_score(y_test, predictions,
      average='weighted'), 4))
print('ROC AUC Score is :', np.round(metrics.roc_auc_score(y_test,
      prob_pred, multi_class='ovo', average='weighted'), 4))
print('Cohen Kappa Score:', np.round(metrics.cohen_kappa_score(y_test,
      predictions), 4))
print('\t\tClassification Report:\n', metrics.classification_report(y_test,
      predictions))


3.4.7 Experimental results

While experimenting, many different pretrained architectures were used to perform the classification, but to check the performance of CNNs trained from scratch, various CNNs with different numbers of layers were also trained, as mentioned in Section 3.4.3. As Table 3.1 clearly shows, the overall performance of the models was higher when image augmentations were used to increase the amount of data. It was also seen that larger models fit better to larger datasets, as the CNN with seven layers performed the best when image augmentations were applied, whereas the CNN with just three layers performed the best when no image augmentations were applied. The overall accuracy, F1 score, and Kappa score are higher when the number of training samples is higher.

The next series of experiments was with transfer learning. Since the shallow network used to fine-tune the pretrained networks was the same for all the models, the performance difference was solely based on the pretrained models' abilities to extract features from the MRI scans (Table 3.2).

As observed, the ResNet101 [7] model performed the best with and without the image augmentations. The training accuracy of 100% indicated possible overfitting, but as the test accuracy was also above 99.5%, it was concluded that the models were generalizing well.

For classification using deep feature extraction, as mentioned in Section 3.4.1, 11 different neural networks and 6 different classifiers were used (Table 3.3).

When the features were extracted using the VGG16 [26] model, the ANN classifier performed the best, but the SVM classifier had the highest test accuracy. However, with the nonaugmented data, XGBoost and Random Forest performed equally well, with similar test accuracies and F1 scores (Table 3.4).

As observed from Table 3.4, the XGBoost classifier had the highest test accuracy, F1 score, Cohen's Kappa score, and AUC score when the features were extracted by the VGG19 [26] model. The training accuracy of XGBoost, Random Forest, and ANN was 100%, while the test accuracy was around 85%. This showed that the classifiers were not generalizing well on the unseen data when the features were being extracted by the VGG19 [26] model (Table 3.5).

As Table 3.5 shows, the ANN classifier performed the best when a ResNet50 [7] model was used for extracting the features. The overall test accuracy, F1 score, Kappa score, and AUC score were higher when the amount of training data was increased with the help of augmentations. In spite of performing the best, these models failed to generalize well and ended up overfitting on the training data (Table 3.6).

As observed from Table 3.6, the ANN/MLP classifier performed the best when the features were extracted using a ResNet101 [7] model. When the number of training samples was increased by augmentations, there was a clear increase in the performance of the classifiers, with the test accuracy being over 1% higher. When augmentations were not used, the SVM classifier performed similar to the ANN/MLP classifier (Table 3.7).

When MobileNet-v2 [27] was used to extract the features for classification, it was observed that the classifiers gave a better result on the nonaugmented data. Although the difference was very small, the ANN/MLP classifier performed the best on the nonaugmented data with the MobileNet-v2 feature extractor (Table 3.8).

As observed from Table 3.8, the ANN/MLP classifier again emerged as the winner in terms of test accuracy when the features were extracted from the augmented data using a pretrained MobileNet [28]. When the features were extracted from the smaller nonaugmented dataset, however, Random Forest had the highest test accuracy. It was also observed that all classifiers ended up overfitting on the training data (Table 3.9).


TABLE 3.1 Details of performance of the convolutional neural networks (CNNs) that were trained from scratch on this data.

(A)
Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA AUC
CNN 2 Layer 0.74 0.72 0.76 0.7479 0.51 0.8671
CNN 3 Layer 0.81 0.79 0.78 0.7803 0.5666 0.8905
CNN 4 Layer 0.5 0.5 0.5 0.33 0 0.5
CNN 5 Layer 0.5 0.5 0.5 0.33 0 0.5
CNN 6 Layer 0.5 0.5 0.5 0.33 0 0.5
CNN 7 Layer 0.97 0.96 0.96 0.9599 0.92 0.99438
CNN 8 Layer 0.94 0.94 0.94 0.9308 0.8766 0.98753

(B)
CNN 2 Layer 1 0.93 0.9288 0.9282 0.8566 0.9776
CNN 3 Layer 1 0.94 0.94 0.9399 0.88 0.984
CNN 4 Layer 0.5 0.5 0.5 0.33 0 0.5
CNN 5 Layer 0.74 0.71 0.695 0.6673 0.39 0.9236
CNN 6 Layer 0.93 0.86 0.8783 0.8767 0.7566 0.9877
CNN 7 Layer 0.96 0.90 0.93166 0.9314 0.8633 0.99301
CNN 8 Layer 0.92 0.88 0.87166 0.8701 0.7433 0.97942

Table 3.1A shows the performance results of the CNNs on the data after image augmentations were applied, and Table 3.1B shows the performance on the smaller dataset without image augmentations.

When the features of the data are extracted using an Inception-V3 [29] network, the Random Forest classifier performed the best on almost all the metrics. The test accuracy was barely higher when augmentations were not performed. It was also observed that all the models tended to overfit on the training data, and hence this led to poor generalization (Table 3.10).

The Random Forest classifier performed the best on all the parameters when the features were extracted using Inception-ResNet-v2 [30] from the augmented data. With a test accuracy of 85.83%, it was higher by 2%. While ANN/MLP classifiers performed the best for ResNets, the introduction of Inception modules in the network tilted the balance toward tree-based models, as seen in Table 3.9 and Table 3.10. The introduction of residual connections increased the overall performance of the models (Table 3.11).

The XGBoost classifier emerged as the clear winner in terms of all the metrics when the features were extracted from the augmented data using a DenseNet169 [31] model. When nonaugmented data was used, the overall test accuracy of all the classifiers decreased by over 1%. With a training accuracy of 100% and a test accuracy of 82.33%, it was very clear that the model was overfitting on the training features (Table 3.12).

When DenseNet-121 [31] was used to extract the features from the augmented data, the XGBoost model performed the best, with 82.17% test accuracy and an F1 score of 0.82. When image augmentations were not used, the performance of all the models saw a significant drop, and the ANN/MLP classifier performed the best among all with an 80% test accuracy (Table 3.13).

TABLE 3.2 Results of the experiments that were conducted with transfer learning.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 measure KAPPA AUC
ResNet50 1 0.99 0.99 0.9933 0.9867 0.9986
VGG16 1 0.99 0.99 0.9933 0.9867 0.9998
VGG19 1 0.98 0.99 0.9883 0.9766 0.9992
Inception_v3 0.84 0.84 0.8033 0.8032 0.6066 0.883
MobileNet 0.98 0.97 0.97 0.9716 0.9433 0.9972
DenseNet169 0.94 0.93 0.94 0.9328 0.8766 0.9808
DenseNet121 0.96 0.95 0.95 0.9516 0.9033 0.9925
InceptionResNet-V2 0.76 0.76 0.73 0.7253 0.4533 0.8237
MobileNet-V2 0.97 0.95 0.95 0.9516 0.9033 0.9932
ResNet101 1 0.98 0.995 0.9949 0.99 0.9998

(B)
ResNet50 1 0.97 0.9817 0.9817 0.9633 0.9982
VGG16 1 0.97 0.9567 0.9567 0.9133 0.9907
VGG19 0.99 0.95 0.9733 0.9733 0.9466 0.9869
Inception_v3 0.99 0.88 0.885 0.885 0.77 0.9485
MobileNet 1 0.96 0.9733 0.9733 0.9467 0.9957
DenseNet169 0.94 0.88 0.8967 0.8964 0.7933 0.9657
DenseNet121 0.95 0.9 0.9117 0.9111 0.8233 0.9765
InceptionResNet-V2 0.88 0.8 0.8117 0.8117 0.6233 0.8924
MobileNet-V2 0.97 0.95 0.95 0.9516 0.9033 0.9932
ResNet101 1 0.98 0.995 0.9949 0.99 0.9998

Table 3.2A contains the performance of the networks on the dataset after image augmentations were applied, and Table 3.2B contains the performance of the models on the dataset without any image augmentations.

When a network with depthwise separable convolutions was used, the ANN/MLP classifier performed the best with and without any data augmentations. The test accuracy of the models somewhat increased when image augmentations were not used.


TABLE 3.3 The performance of various classifiers when the features were extracted using VGG16 [26].

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 84.33% 84.67% 0.85 0.69 0.85 0.85 0.8466
KNN 89.94% 78.83% 80.50% 0.8 0.61 0.8 0.83 0.805
SVM 90.06% 81.00% 82.83% 0.83 0.66 0.83 0.83 0.8283
Random Forest 100% 81.67% 82.33% 0.82 0.65 0.82 0.82 0.8233
AdaBoost 81.11% 75.15% 73.50% 0.73 0.47 0.74 0.74 0.735
XGBoost 100% 82.17% 83.67% 0.84 0.67 0.84 0.84 0.83666

(B)
ANN 100% 86.50% 85.67% 0.86 0.71 0.86 0.86 0.8566
KNN 89.94% 82.00% 82.33% 0.82 0.65 0.82 0.85 0.8233
SVM - 84.50% 85.33% 0.85 0.71 0.85 0.85 0.8533
Random Forest 100% 84.83% 86.67% 0.87 0.73 0.87 0.87 0.8666
AdaBoost 83.61% 78.67% 75.83% 0.76 0.52 0.76 0.76 0.7583
XGBoost 100% 86.17% 86.67% 0.87 0.73 0.87 0.87 0.8666

Table 3.3A shows the performance when data augmentation was used, and Table 3.3B shows the performance when data augmentation was not used.

TABLE 3.4 Performance metrics of classifiers on the features extracted by a pretrained VGG19 [26] network.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 87.17% 85.33% 0.85 0.71 0.85 0.85 0.8533
KNN 87.33% 78.33% 79.17% 0.79 0.58 0.79 0.81 0.7916
SVM 91.72% 86.50% 84.33% 0.84 0.69 0.84 0.84 0.8433
Random Forest 100% 85.00% 85.33% 0.85 0.71 0.85 0.85 0.8533
AdaBoost 84.06% 76.00% 77.33% 0.77 0.55 0.77 0.77 0.7733
XGBoost 100% 85.67% 86.00% 0.86 0.72 0.86 0.86 0.86

(B)
ANN 100% 83.67% 82.33% 0.82 0.65 0.82 0.82 0.8233
KNN 85.28% 78.50% 80.00% 0.8 0.6 0.8 0.83 0.8
SVM 91.78% 81.83% 82.33% 0.82 0.65 0.82 0.82 0.8233
Random Forest 100% 82.17% 81.33% 0.81 0.63 0.81 0.82 0.8133
AdaBoost 81.67% 71.50% 75.00% 0.75 0.5 0.75 0.75 0.7499
XGBoost 100% 84.33% 83.33% 0.83 0.67 0.83 0.83 0.8333

Table 3.4A shows the performance of the classifiers on the dataset with augmentations, and Table 3.4B shows the performance on the dataset without augmentations.

TABLE 3.5 Performance of the classifiers based on the features extracted by a ResNet50 [7] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 88.33% 89.00% 0.89 0.78 0.89 0.89 0.89
KNN 92.11% 85.17% 85.33% 0.85 0.71 0.85 0.86 0.8533
SVM 92.78% 87.33% 85.00% 0.85 0.7 0.85 0.85 0.85
Random Forest 100% 86.67% 84.50% 0.84 0.69 0.84 0.85 0.845
AdaBoost 85.50% 81.33% 78.33% 0.79 0.58 0.79 0.79 0.7833
XGBoost 100% 87.33% 85.50% 0.85 0.71 0.86 0.86 0.855

(B)
ANN 100% 90.50% 87.83% 0.88 0.76 0.88 0.88 0.8783
KNN 92.89% 86.33% 87.00% 0.87 0.74 0.87 0.87 0.87
SVM 93.89% 87.83% 85.33% 0.85 0.71 0.85 0.85 0.8533
Random Forest 100% 85.50% 82.83% 0.83 0.66 0.83 0.83 0.8283
AdaBoost 85.67% 82.17% 77.17% 0.77 0.54 0.77 0.77 0.7716
XGBoost 100% 87.33% 85.50% 0.85 0.71 0.86 0.86 0.855

Table 3.5A shows the performance of the classifiers when the number of training samples was increased using image augmentations. Table 3.5B shows the performance of the classifiers when image augmentations were not used.

TABLE 3.6 Performance of various classifiers trained on the features extracted by a ResNet101 [7] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 89.83% 88.83% 0.89 0.78 0.89 0.89 0.8883
KNN 92.89% 87.50% 88.50% 0.88 0.77 0.88 0.89 0.8849
SVM 93.33% 86.17% 87.00% 0.87 0.74 0.87 0.87 0.87001
Random Forest 100% 84.83% 84.33% 0.84 0.69 0.84 0.84 0.8433
AdaBoost 85.06% 77.33% 77.50% 0.77 0.55 0.78 0.78 0.775
XGBoost 100% 88.33% 85.00% 0.85 0.7 0.85 0.85 0.8499

(B)
ANN 100% 88.50% 87.33% 0.87 0.75 0.87 0.87 0.8733
KNN 92.17% 85.83% 84.67% 0.85 0.69 0.85 0.86 0.8466
SVM 93.11% 87.50% 87.33% 0.87 0.75 0.87 0.88 0.8733
Random Forest 100% 86.83% 86.50% 0.86 0.73 0.86 0.87 0.865
AdaBoost 86.57% 77.50% 75.00% 0.75 0.5 0.75 0.75 0.75
XGBoost 100% 85.50% 85.50% 0.85 0.71 0.86 0.86 0.855

Table 3.6A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.6B shows the performance on the data without any image augmentations.


TABLE 3.7 Performance of various classifiers trained on the features extracted by a MobileNet-v2 [27] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 81.67% 77.50% 0.77 0.55 0.78 0.78 0.775
KNN 84.72% 75.33% 74.67% 0.74 0.49 0.75 0.76 0.7466
SVM 89.61% 79.50% 76.33% 0.76 0.53 0.76 0.76 0.7633
Random Forest 100% 75.83% 76.67% 0.77 0.53 0.77 0.77 0.766
AdaBoost 79.39% 74.00% 69.00% 0.69 0.38 0.69 0.69 0.69
XGBoost 100% 78.00% 79.00% 0.79 0.58 0.79 0.79 0.7899

(B)
ANN 100% 79.83% 79.83% 0.8 0.6 0.8 0.8 0.7983
KNN 84.67% 76.00% 77.70% 0.77 0.55 0.78 0.79 0.7766
SVM 91.39% 83.50% 78.17% 0.78 0.56 0.78 0.78 0.7816
Random Forest 100% 80.83% 76.50% 0.76 0.53 0.76 0.77 0.765
AdaBoost 81.89% 73.50% 71.17% 0.71 0.42 0.71 0.714 0.71166
XGBoost 100% 79.33% 76.33% 0.76 0.53 0.76 0.76 0.7633

Table 3.7A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.7B shows the performance on the data without any image augmentations.

TABLE 3.8 Performance of various classifiers trained on the features extracted by a MobileNet [28] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 78.83% 79.83% 0.8 0.6 0.8 0.8 0.7983
KNN 83.44% 73.00% 72.67% 0.72 0.45 0.73 0.74 0.7266
SVM 88.89% 77.50% 78.50% 0.78 0.57 0.78 0.79 0.7849
Random Forest 100% 76.17% 77.00% 0.77 0.54 0.77 0.77 0.77
AdaBoost 77.06% 68.83% 67.67% 0.68 0.35 0.68 0.68 0.6766
XGBoost 100% 77.50% 76.67% 0.77 0.53 0.77 0.77 0.7666

(B)
ANN 100% 82.67% 77.17% 0.77 0.54 0.77 0.77 0.7716
KNN 83.83% 76.50% 75.67% 0.75 0.51 0.76 0.79 0.7566
SVM 87.61% 83.33% 76.50% 0.76 0.53 0.76 0.77 0.765
Random Forest 100% 80.67% 77.50% 0.77 0.55 0.78 0.78 0.775
AdaBoost 79.56% 73.17% 70.17% 0.7 0.4 0.7 0.7 0.7016
XGBoost 100% 77.83% 77.00% 0.77 0.54 0.77 0.77 0.77

Table 3.8A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.8B shows the performance on the data without any image augmentations.

TABLE 3.9 Performance of various classifiers trained on the features extracted by an Inception-V3 [29] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 74.17% 76.83% 0.77 0.54 0.77 0.77 0.7683
KNN 85.33% 71.83% 76.17% 0.76 0.52 0.76 0.76 0.7616
SVM 84.78% 73.83% 74.83% 0.75 0.5 0.75 0.75 0.74833
Random Forest 100% 73.83% 77.83% 0.78 0.56 0.78 0.78 0.7783
AdaBoost 75.61% 60.67% 63.83% 0.64 0.28 0.64 0.64 0.6383
XGBoost 100% 73.67% 75.33% 0.75 0.51 0.75 0.75 0.7533

(B)
ANN 100% 77.00% 77.33% 0.77 0.55 0.77 0.77 0.7733
KNN 83.83% 74.83% 74.50% 0.74 0.49 0.74 0.75 0.745
SVM 84.78% 74.33% 72.00% 0.72 0.44 0.72 0.72 0.72
Random Forest 100% 76.67% 78.00% 0.78 0.56 0.78 0.78 0.78
AdaBoost 76.72% 64.67% 60.83% 0.61 0.22 0.61 0.61 0.6083
XGBoost 100% 75.83% 73.83% 0.74 0.48 0.74 0.74 0.7383

Table 3.9A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.9B shows the performance on the data without any image augmentations.

TABLE 3.10 Performance of various classifiers trained on the features extracted by an Inception-ResNet-v2 [30] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 80.61% 74.67% 73.00% 0.73 0.46 0.73 0.73 0.73
KNN 83.44% 75.33% 74.33% 0.74 0.49 0.74 0.78 0.7433
SVM 70.11% 68.50% 65.67% 0.65 0.31 0.66 0.67 0.6555
Random Forest 100% 82.33% 85.83% 0.86 0.72 0.86 0.86 0.8583
AdaBoost 79.39% 70.67% 69.50% 0.69 0.39 0.7 0.7 0.695
XGBoost 100% 81.33% 83.83% 0.84 0.68 0.84 0.84 0.8383

(B)
ANN 79.78% 72.17% 73.00% 0.73 0.46 0.73 0.73 0.73
KNN 83.22% 72.00% 72.17% 0.72 0.44 0.72 0.74 0.7216
SVM 67.89% 65.67% 64.00% 0.64 0.28 0.64 0.64 0.64
Random Forest 100% 80.50% 81.17% 0.81 0.62 0.81 0.81 0.8116
AdaBoost 76.78% 70.17% 68.67% 0.69 0.37 0.69 0.69 0.6866
XGBoost 100% 80.67% 83.00% 0.83 0.66 0.83 0.83 0.8299

Table 3.10A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.10B shows the performance on the data without any image augmentations.


TABLE 3.11 Performance of various classifiers trained on the features extracted by a DenseNet169 [31] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 99.39% 81.67% 81.17% 0.81 0.62 0.81 0.82 0.8116
KNN 86.39% 77.17% 81.00% 0.81 0.62 0.81 0.83 0.81
SVM 80.00% 76.17% 76.00% 0.76 0.52 0.76 0.76 0.76
Random Forest 100% 81.00% 82.17% 0.82 0.64 0.82 0.82 0.8216
AdaBoost 79.27% 69.83% 72.50% 0.72 0.45 0.72 0.73 0.7249
XGBoost 100% 82.17% 82.83% 0.83 0.66 0.83 0.83 0.8283

(B)
ANN 99.61% 80.67% 79.83% 0.81 0.6 0.8 0.81 0.7983
KNN 86.89% 78.50% 78.33% 0.78 0.57 0.78 0.79 0.7833
SVM 79.78% 74.67% 71.83% 0.72 0.44 0.72 0.72 0.7183
Random Forest 100% 79.50% 80.17% 0.8 0.6 0.8 0.8 0.8016
AdaBoost 79.06% 68.33% 69.00% 0.69 0.38 0.69 0.69 0.69
XGBoost 100% 79.17% 81.17% 0.81 0.62 0.81 0.81 0.8116

Table 3.11A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.11B shows the performance on the data without any image augmentations.

TABLE 3.12 Performance of various classifiers trained on the features extracted by a DenseNet121 [31] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 99.94% 83.00% 81.17% 0.81 0.62 0.81 0.82 0.8116
KNN 87.50% 81.50% 81.17% 0.81 0.62 0.81 0.82 0.8116
SVM 79.83% 74.83% 74.17% 0.74 0.48 0.74 0.75 0.7416
Random Forest 100% 81.33% 81.50% 0.81 0.63 0.82 0.82 0.815
AdaBoost 78.06% 67.67% 69.50% 0.69 0.39 0.7 0.7 0.695
XGBoost 100% 81.50% 82.17% 0.82 0.64 0.82 0.82 0.8216

(B)
ANN 100% 82.50% 80.50% 0.8 0.61 0.8 0.81 0.805
KNN 87.33% 74.00% 78.67% 0.78 0.57 0.79 0.8 0.7866
SVM 82.44% 78.50% 72.67% 0.73 0.45 0.73 0.723 0.7266
Random Forest 100% 81.33% 79.67% 0.8 0.59 0.8 0.8 0.7966
AdaBoost 80.50% 69.83% 69.50% 0.69 0.39 0.7 0.7 0.695
XGBoost 100% 83.17% 80.00% 0.8 0.6 0.8 0.8 0.8

Table 3.12A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.12B shows the performance on the data without any image augmentations.

TABLE 3.13 Performance of various classifiers trained on the features extracted by an XceptionNet [32] model.

(A)
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision AUC
ANN 100% 81.33% 81.33% 0.81 0.63 0.81 0.81 0.8133
KNN 86.39% 76.17% 75.50% 0.75 0.51 0.76 0.77 0.7549
SVM 86.11% 79.00% 76.50% 0.76 0.53 0.76 0.77 0.765
Random Forest 100% 79.50% 77.50% 0.77 0.55 0.78 0.78 0.7749
AdaBoost 80.56% 73.50% 71.00% 0.71 0.42 0.71 0.71 0.71
XGBoost 100% 80.83% 81.00% 0.81 0.62 0.81 0.81 0.8099

(B)
ANN 100% 79.83% 83.50% 0.83 0.67 0.84 0.84 0.835
KNN 84.56% 75.50% 77.50% 0.77 0.55 0.78 0.79 0.775
SVM 86.56% 77.83% 78.50% 0.78 0.57 0.78 0.79 0.785
Random Forest 100% 79.50% 83.17% 0.83 0.66 0.86 0.83 0.8316
AdaBoost 78.50% 69.83% 76.00% 0.76 0.52 0.76 0.76 0.76
XGBoost 100% 78.67% 81.00% 0.81 0.62 0.81 0.81 0.8099

Table 3.13A shows the performance of the classifiers when the features are extracted from the dataset after image augmentations are applied, and Table 3.13B shows the performance on the data without any image augmentations.

3.5 Discussion

The series of experiments revealed the performance of various machine learning algorithms and deep learning techniques for classifying normal (without tumor) and abnormal (with tumor) brain MRI images. From the observations, it is very clear that, on average, the algorithms perform better on augmented data, because a higher number of data points gives the model more knowledge about the data. Machine learning algorithms are very data centric and require a huge amount of information to perform their tasks.

As evident from Fig. 3.17, all the machine learning classifiers had, on average, 1% higher test accuracy when the features were collected from the bigger dataset that was created after applying image augmentations. Since the dataset used was balanced, the "test accuracy" metric holds the most importance. If the dataset were unbalanced, then the F1 and AUC scores would be of more importance, as the accuracy would give an illusion of performance.

When transfer learning was used to classify the MRI scans, it was observed that the test accuracy was higher when image augmentations were used to increase the number of training samples. This goes to show how important data is for deep learning models. Without sufficient data, the models fail to extract meaningful patterns and hence the performance decreases. This trend is clearly shown in Fig. 3.18.

The test accuracy of InceptionNet-v3 and InceptionResNet-v2 is higher when data augmentation is not performed, but their overall performance is lower when compared to other models. These can be outlier cases, as in machine learning it is a general rule that a higher amount of data leads to better generalization of models (Fig. 3.19).

When inception modules are introduced in the network, the test accuracy of the model drops to 70%–80%. A similar drop was also observed for all the other performance metrics. This shows that InceptionNets are not well suited for this problem. It could also be concluded that further tuning and a larger dataset could have improved the performance of InceptionNets.


FIGURE 3.17 Comparison of the performance of the models on the features from augmented and nonaugmented data.

FIGURE 3.18 Comparison of various models for transfer learning on augmented and nonaugmented data.

FIGURE 3.19 Comparison of various models trained by transfer learning. (A) shows the test accuracies of the models and (B) shows the F1 scores of the models.


ResNets and VGG Nets performed the best, with ResNet50 having a test accuracy of 99% and an F1 score of 0.9933 and VGG16 having a test accuracy of 99% and an F1 score of 0.9933. Similarly, the bigger ResNet101 and VGG19 have test accuracies of 99.5% and 99% and F1 scores of 0.9949 and 0.9883, respectively. So, networks with residual connections ended up performing the best in transfer learning on this dataset.

Deep feature extraction helped in determining which models preserve the maximum information about the original image in their latent representation. This meant that we could figure out which network best maps the original 224 × 224 × 3 dimensional space to a smaller 64 × 1 dimensional space. Since the same classifiers with the exact same hyperparameters were used to classify the images from the extracted features, the difference in accuracies was completely dependent on the features themselves. The features were a mapping of a vector in a high-dimensional space (the image) to a vector in a lower dimensional space (the latent representation/features). It was observed that the classifiers had the highest average test accuracy on the features extracted by ResNets. The average test accuracy of the classifiers trained on the features extracted by ResNet101 from the augmented dataset is 85.193%, whereas that of the classifiers trained on the features extracted by ResNet50 from the augmented dataset is 84.61%. Fig. 3.20 also shows the results for the features extracted from the smaller nonaugmented dataset; the classifiers still perform the best on the features extracted by ResNets, with ResNet50 having an average of 84.27% and ResNet101 having an average of 84.38% (Fig. 3.21).

When CNNs of varying depths were trained on the data from scratch, it was observed that the bigger networks (with more convolutional layers) performed better on the larger dataset (with augmentations).

FIGURE 3.20 Average test accuracies of all the classifiers on the features extracted by various deep learning models. This plot shows which network represents the original data the best.

The models of four, five, and six convolutional layers performed the worst and showed no agreement between the test data and labels, which was evident from the extremely low Cohen's Kappa score. The maximum test accuracy of 94% was shown by the eight-layered CNN on the augmented data.

FIGURE 3.21 Test accuracies of various CNNs that were trained from scratch on the dataset. CNNs, Convolutional neural networks.

FIGURE 3.22 Average test accuracies of the machine learning classifiers on the features extracted by various convolutional neural networks.


The two-layered CNN showed the maximum accuracy of 92.8% on the smaller, nonaugmented data. It was evident that training CNNs from scratch would require much more data; hence, transfer learning and deep feature extraction were superior methods for handling the problem of brain tumor detection. The ANN/MLP classifier and the tree-based classifiers performed the best in terms of test accuracy on the features extracted by pretrained convolutional neural networks. The ANN classifier performed the best with 81.69% test accuracy on the augmented data, followed by the XGBoost classifier with 81.9% test accuracy. As shown in Fig. 3.22, the AdaBoost classifier had the lowest average accuracy on the test set. The training accuracy of ANN, XGBoost, and Random Forest was 100% for most of the features, indicating that these models tend to overfit on the training data and require a larger dataset to generalize well.

3.6 Conclusion

This chapter dealt with the problem of automated brain tumor detection using various deep learning techniques: transfer learning, deep feature extraction, and training CNNs from scratch. Multiple experiments were conducted to evaluate the performance of these techniques, and the chapter aimed to show the importance and efficiency of machine learning in the detection of brain tumors. Detailed comparative studies were carried out to determine the best possible models and techniques, and multiple performance metrics were used to avoid falling into accuracy traps and other metric-related biases. The findings revealed that CNNs are able to detect brain tumors with 99% accuracy. We also remain focused on creating novel ways to solve the problem of MRI-based automated brain tumor detection.

CHAPTER 4

Breast cancer detection from mammograms using artificial intelligence
Abdulhamit Subasi(1,2), Aayush Dinesh Kandpal(3), Kolla Anant Raj(4) and Ulas Bagci(5)

(1) Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; (2) Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia; (3) Department of Metallurgical and Materials Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India; (4) Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; (5) Northwestern University, Chicago, IL, United States

OUTLINE

4.1 Introduction
4.2 Background and literature review
4.3 Artificial intelligence techniques
  4.3.1 Artificial neural networks
  4.3.2 Deep learning
  4.3.3 Convolutional neural networks
4.4 Breast cancer detection using artificial intelligence
  4.4.1 Feature extraction using deep learning
  4.4.2 Prediction and classification
  4.4.3 Experimental data
  4.4.4 Performance evaluation measures
  4.4.5 Experimental results
4.5 Discussion
4.6 Conclusion
References

4.1 Introduction

Breast cancer occurs when cells in the breast region begin to grow out of control. These cells most commonly form a tumor that can often be seen on an X-ray film or felt as a lump. These tumors are categorized as benign or malignant: a tumor is malignant if its cells can grow into (invade) surrounding tissues or spread (metastasize) to distant areas of the body; otherwise it is called benign. As of 2017, 1 out of every 800 women was reported to have breast cancer in the United States. As of 2020, female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases (11.7% of all cancer cases). It is the fifth leading cause of cancer mortality worldwide, with 685,000 deaths. Among women, breast cancer accounts for 1 in 4 cancer cases and 1 in 6 cancer deaths, ranking first for incidence in the vast majority of countries [1].

Because of the medical importance of breast cancer screening, CAD (computer-aided detection) methods have been created for detecting anomalies such as calcifications, masses, architectural distortion, and bilateral asymmetry [2]. One reason detection of breast cancer is difficult is that mammography results depend heavily on the age of the patient, the breast density, and the type of lesion present. Variations in density can change the contrast of malignant regions and lead to wrong conclusions. This field of research is highly sensitive to the details in each individual's mammograms and hence requires accurate and fast methods for processing large amounts of data.

Deep learning techniques are revolutionizing the field of medical image analysis, and hence in this study we utilized convolutional neural networks (CNNs) for early detection of breast cancer to minimize the overheads of manual analysis [3]. Deep learning has been instrumental in dealing with large amounts of data with relative ease. Preprocessing methods, such as image augmentation, make it possible to train models on a selective ROI (region of interest) and extract region-specific information for further model building. In general, complications may arise due to the presence of calcium deposits, known as calcifications and microcalcifications. These granular deposits present themselves as irregular shapes and lines, appearing as magnified, highlighted white spots in the mammograms that can be mistaken for abnormal breast lesions and lead to false conclusions. Deep learning models make it possible to work with highly specific regions in mammograms and avoid these errors. However, even deep learning models are not error free and can misclassify the benign and malignant classes for one another.

The use of deep learning models is governed by the amount of good-quality data available for training, and due to the high specificity of the application, the data required for model training must be of high quality. Very often, the datasets compiled by institutions and organizations are highly skewed and imbalanced, which can propagate unwanted class-specific bias through the model during training. This can be addressed with data augmentation, which allows researchers to increase the amount of data in each class using image manipulation techniques such as flipping, rotating images through certain angles, applying geometric transformations, and manipulating color. One approach to increasing the number of images per class in imbalanced datasets is the use of GANs (generative adversarial networks), in which similar images are generated by introducing random or specific noise vectors into the images. Newer techniques such as GANs, neural style transfer, and meta-learning make it possible to artificially improve the class distributions and have helped remove class-specific bias to some extent.

Mammography is the most widely used breast cancer screening technique. It is a type of imaging that uses a low-dose X-ray system to examine the breast and is considered the most reliable method for screening breast abnormalities before they become clinically palpable [2,4,5]. Transfer learning has been used extensively for this purpose, as shown in work proposed in 2018 [6]. In this chapter, we aim to evaluate and assess the impact of artificial intelligence (AI)-based techniques that can help health professionals screen and diagnose breast cancer early and help save lives [7] (Fig. 4.1).
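As a minimal sketch of the augmentation techniques described above (flips, rotations, and geometric transformations), the Keras ImageDataGenerator can be configured as follows; the specific parameter values here are illustrative assumptions, not the exact settings used in this chapter.

# Illustrative augmentation sketch (assumed settings) using Keras
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # rotate by up to 20 degrees
    horizontal_flip=True,     # mirror left-right
    vertical_flip=True,       # mirror top-bottom
    width_shift_range=0.1,    # horizontal shift by up to 10%
    height_shift_range=0.1,   # vertical shift by up to 10%
    zoom_range=0.1            # zoom in/out by up to 10%
)

# A dummy batch standing in for mammogram patches: (batch, height, width, channels)
images = np.random.rand(8, 100, 100, 1)
augmented_batch = next(augmenter.flow(images, batch_size=8, shuffle=False))
print(augmented_batch.shape)  # (8, 100, 100, 1): randomly transformed copies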


FIGURE 4.1 Overview of the proposed CAD system. CAD, Computer-aided detection.

4.2 Background and literature review

Extensive work has been done in this field over the last few decades, and in recent years deep learning has played a pivotal role in early breast cancer detection through classification and segmentation as object detection models have evolved. Segmentation of abnormal masses and classification of these masses into different classes has been the most crucial objective. Deep learning has made it possible to use large pretrained architectures and retrain them on mammography data through transfer learning, and the use of an encoder-decoder architecture to classify mammograms into different classes has recently been proposed. Some researchers have employed deep learning to detect suspicious breast lesions and improve classification accuracy. Traditional CAD-based systems implemented over the past few decades still required radiologists to manually outline the cancerous regions, a practice that in many cases has a high chance of propagating human errors and ultimately leading to false conclusions. With the help of deep learning models, outlines and regions can now be highlighted automatically. The primary concern with manual outlining of cancerous regions is that radiologists may not always be able to completely analyze the density of different regions in the mammograms, which could lead to false labels and outlines of the cancerous regions. There are many techniques for generating 3D scans of breasts to analyze the images and detect cancerous developments, but mammography data has been the most widely used compared to other methods. One advantage of mammography over other scanning methods is that it exposes patients to relatively low amounts of radiation.

Recent studies in this field focus on three crucial steps, namely, segmentation/feature extraction, data augmentation, and classification. The segmentation/feature extraction step extracts relevant information (specific features) from the mammograms and includes identifying and labeling the ROIs for further processing. In the next step, various manipulation techniques can be used to clean the data and increase it artificially using techniques such as GANs [8] and neural style transfer [9]. In the last step, classification, a CNN is used to identify the mammograms as either benign or malignant. The segmentation step has seen the rise of CNN architectures designed especially for medical image segmentation, such as U-Net and U^2-Net [10,11]. The following examples illustrate how deep learning has recently helped improve breast cancer detection through segmentation and classification. In 2019 a patch-based CNN method was proposed to detect breast lesions in full-field digital mammograms [12]. In one study, transfer learning was used to pretrain a CNN model on an extensive public database of digitized mammograms (the CBIS-DDSM dataset), and the trained model was then tested on the INbreast dataset [13]. The study evaluated breast lesion detection using VGG16, ResNet50, and InceptionV3 as deep feature extractors and concluded that the model based on InceptionV3 achieved promising results, with a true positive rate (TPR) of 98% [12]. In 2018 a CAD system based on the Faster R-CNN detected and classified breast lesions as benign or malignant; the study evaluated and tested the model using a breast dataset, achieving an overall classification performance of 95% in terms of the area under the ROC curve (AUC) [14].

4.3 Artificial intelligence techniques

4.3.1 Artificial neural networks

Artificial neural networks (ANNs) have been used in many fields for classification tasks, pattern recognition, and predictive modeling, and have proved to deliver excellent performance on such data. ANNs have been growing fast thanks to their ability to adapt to different kinds of data, achieved through the combination of external networks, deep networks, and hyperparameter optimization [15]. Our research involves training a classification model, increasing its depth, and then tuning the hyperparameters to achieve optimal results. ANNs, in particular, have been extensively used in disease classification, including the classification and segmentation of diseases such as Alzheimer's disease, breast cancer, lung cancer, and brain tumors [16-19]. A comparative study presented in 2020 focused on the strengths and weaknesses of ANN models such as ML-NN, SNN, SADE, DBN, and PCA-Net; it reveals that ANNs have been very widely used and that the performance achieved by ANN-based models is promising and comparable with the results of state-of-the-art CNN architectures [20]. The results achieved by ANNs in disease classification look promising, and their use in this field is rapidly on the rise.

4.3.2 Deep learning

The swift advancement of deep learning continues to aid the medical imaging community in applying advanced techniques to improve the accuracy of cancer screening. Many CAD-based screening techniques have been extensively researched since their inception in the 1990s, and deep learning has vastly helped improve the accuracy of early breast cancer detection through mammograms. In recent years deep learning has revolutionized fields concerned with object detection and classification, pattern recognition, and many other domains [21].

A breakthrough in the field of image processing was achieved in 2012, when a deep learning model (a convolutional neural network) was able to outperform all other models in the ImageNet Large-Scale Visual Recognition Challenge [22]. Deep learning models have two primary advantages over classical machine learning algorithms that do not use distributed representations. First, learning distributed representations enables generalization to new combinations of the values of learned features beyond those seen during training. Second, deep learning models can learn highly specific information as their depth increases [23]. Health-care applications of deep learning include biosignal analysis (e.g., ECG), the prediction of cardiac arrests and sudden seizures, drug discovery, and the analysis of electronic health records [24]. Recently, a 50-year-old protein folding problem was solved using deep learning models, which a group of researchers at DeepMind used to accurately predict protein structures [25,26]. Deep learning has also been very useful to the medical imaging community, not only in recognition and detection tasks but also in segmentation tasks; the U-Net and U^2-Net architectures have been used extensively for segmentation and allow for efficient use of the available data, specifically in biomedical imaging applications [10,11]. Overall, the use of deep learning is still in its infancy, and the adoption of such advanced techniques is at different stages in different parts of the world. However, deep learning models have shown promising results in the research works and studies done so far; these techniques have helped the health industry in multiple ways and continue to evolve at a rapid rate [27]. A simple deep learning Python code is given below.

# Defining our deep neural network (DNN) model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

# Initiating DNN model
DNN_model = Sequential()

# Input block: dense layer followed by batch normalization and dropout
DNN_model.add(Dense(16, input_dim=128, kernel_initializer='uniform', activation='relu'))
DNN_model.add(BatchNormalization())
DNN_model.add(Dropout(0.2))

# Hidden block: dense layer, batch normalization, and dropout
DNN_model.add(Dense(32, kernel_initializer='uniform', activation='relu'))
DNN_model.add(BatchNormalization())
DNN_model.add(Dropout(0.2))

# Hidden block: dense layer, batch normalization, and dropout
DNN_model.add(Dense(64, kernel_initializer='uniform', activation='relu'))
DNN_model.add(BatchNormalization())
DNN_model.add(Dropout(0.2))
DNN_model.add(Dense(64, kernel_initializer='uniform', activation='relu'))

# Output layer for binary classification
DNN_model.add(Dense(1, activation='sigmoid'))

# Model summary
DNN_model.summary()
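The listing above only defines the network; as a minimal sketch (not part of the original listing), it could be compiled and trained as follows, assuming X_train and y_train hold 128-dimensional feature vectors and binary labels to match input_dim=128 above.

# Minimal training sketch for the DNN above (placeholder data is assumed)
import numpy as np

X_train = np.random.rand(256, 128)           # placeholder 128-dimensional features
y_train = np.random.randint(0, 2, size=256)  # placeholder binary labels

DNN_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = DNN_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)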


4.3.3 Convolutional neural networks

CNNs are specifically designed to perform well on image data. In the past few years, deep learning has achieved excellent performance in various fields, such as visual recognition, speech recognition, and natural language processing, and among the different types of deep neural networks, CNNs have been the most extensively studied. Leveraging the rapid growth in the amount of annotated data and the significant improvements in the power of graphics processing units, research on CNNs has expanded swiftly and achieved state-of-the-art results on various tasks. CNNs accomplish this by analyzing images with a grid-like structure (Fig. 4.2). Multiple convolutional layers create successive feature maps: various filters are applied as the image passes through the network to retrieve relevant information such as edges, pixel correlations, and other pixel-specific features. These feature maps contain information specific to each image that helps classify images into different categories, such as malignant or benign in our case. Among the techniques used to examine the breast, the mammogram is a widely utilized and dependable screening technology, and in our work we used various CNN architectures to detect cancerous tissues in the Mammographic Image Analysis Society (MIAS) [28,29] and CBIS-DDSM [30,31] datasets. During a convolution, the original image size is reduced; to maintain the image size, various padding techniques are used. A CNN consists primarily of three steps: feature extraction, dimensionality reduction, and classification. The convolutional layers perform feature extraction, the pooling layers reduce the feature size, and finally the SoftMax layer classifies the image. CNNs tend to outperform dense neural networks (feed-forward networks) because they can extract image-specific information through the use of multiple feature maps.

FIGURE 4.2 Simple CNN architecture with image augmentation. CNN, Convolutional neural network.

In contrast, densely connected networks simply flatten the image and feed the flattened information to successive layers without any image-specific context. A sample Python code is given below.

# Simple CNN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Activation, Flatten, Dense

# Initiating CNN model
model = Sequential()

# Convolution block 1: convolution, max pooling, and dropout
model.add(Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(100, 100, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Convolution block 2: convolution, activation, and max pooling
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same'))

# Convolution block 3: convolution, activation, and max pooling
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same'))

# Flattening and dense layer
model.add(Flatten())
model.add(Dense(16, activation='relu'))

# Final output layer for binary classification
model.add(Dense(1, activation='sigmoid'))
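The effect of the padding choice described in Section 4.3.3 can be verified with a small sketch: 'valid' padding shrinks the feature map, while 'same' padding preserves the spatial size.

# Padding sketch: 'valid' shrinks the output, 'same' keeps the input size
import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.random.rand(1, 100, 100, 3).astype('float32')
valid_out = Conv2D(16, (3, 3), padding='valid')(x)
same_out = Conv2D(16, (3, 3), padding='same')(x)
print(valid_out.shape)  # (1, 98, 98, 16): a 3 x 3 kernel trims a 1-pixel border
print(same_out.shape)   # (1, 100, 100, 16): zero-padding preserves 100 x 100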

4.4 Breast cancer detection using artificial intelligence

4.4.1 Feature extraction using deep learning

A total of 11 models have been used for feature extraction. The models were pretrained on the ImageNet dataset, and the corresponding weights were used (Fig. 4.3). Although machine learning is helpful in effectively extracting features for certain tasks, the remaining challenge is deciding which specific features should be extracted and fed into the algorithms for accurate diagnosis. This is made possible with the help of deep learning; in this regard, it provides a basis for developing new and improved algorithms that are better equipped to generalize, enabling the computer to build complex insights out of simple ideas [23].

The training images were passed through the pretrained models to generate a feature embedding/feature vector of 16, 32, 64, or 128 dimensions. Feature extraction techniques have previously been used to extract specific features from image scans using deep CNNs such as the ResNet and VGG architectures [32].


After experimenting with outputs of several dimensions, the best feature embedding/feature vector was used for further processing and model building. To obtain a good feature embedding, dropout layers with appropriate dropout rates were used along with batch normalization. Transfer learning was used to retrain the pretrained models on the image data and obtain feature embeddings that contain class- and image-specific information. A sample Python code is given below.

# Example of extracting deep features using ResNet101 as the base model
from tensorflow.keras.applications import ResNet101
from tensorflow.keras.layers import Flatten, BatchNormalization, Dense
from tensorflow.keras.models import Model

base_model = ResNet101(input_shape=(100, 100, 3), weights='imagenet', include_top=False)
x = base_model.output
x = Flatten()(x)
x = BatchNormalization()(x)
predictions = Dense(128, activation='softmax')(x)  # 128-dimensional feature embedding

model_feat = Model(inputs=base_model.input, outputs=predictions)

train_features = model_feat.predict(x_train)
val_features = model_feat.predict(x_val)
test_features = model_feat.predict(x_test)

# ML classification algorithms are then used on the extracted features;
# a custom pipeline is used to train and predict with the ML classifiers
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

names = [
    "K Nearest Neighbour Classifier",
    "SVM",
    "Random Forest Classifier",
    "AdaBoost Classifier",
    "XGB Classifier",
]
classifiers = [
    KNeighborsClassifier(),
    SVC(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    XGBClassifier(),
]
zipped_clf = zip(names, classifiers)

4.4.2 Prediction and classification

In this work, two datasets are utilized, namely, MIAS and DDSM. The objective of the experiments was to classify images into two classes, benign and malignant. As this is a two-class classification, a sigmoid layer was used to predict the class of each input image. The classes in the two datasets were not balanced, and techniques such as callbacks, propagating custom class weights through the network, and reducing the learning rate over the epochs were used to tackle this problem.
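A minimal sketch of these imbalance-handling techniques is shown below; the names model, x_train, and y_train are assumed to be defined as in the surrounding listings, and y_train is assumed to hold integer class labels (for one-hot labels, apply np.argmax first).

# Sketch (assumed setup): class weights plus callbacks for imbalanced training
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(weights))

callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2),            # reduce learning rate
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)  # stop when stalled
]

model.fit(x_train, y_train, epochs=30, batch_size=32, validation_split=0.2,
          class_weight=class_weights, callbacks=callbacks)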

# Fit a classifier pipeline and report metrics on the train, validation, and test sets
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, confusion_matrix)

def classifier_summary(pipeline, X_train, y_train, X_val, y_val, X_test, y_test):
    sentiment_fit = pipeline.fit(X_train, y_train)

    y_pred_train = sentiment_fit.predict(X_train)
    y_pred_val = sentiment_fit.predict(X_val)
    y_pred_test = sentiment_fit.predict(X_test)

    # Train set metrics
    train_accuracy = np.round(accuracy_score(y_train, y_pred_train), 4) * 100
    train_precision = np.round(precision_score(y_train, y_pred_train, average='weighted'), 4)
    train_recall = np.round(recall_score(y_train, y_pred_train, average='weighted'), 4)
    train_F1 = np.round(f1_score(y_train, y_pred_train, average='weighted'), 4)
    train_kappa = np.round(cohen_kappa_score(y_train, y_pred_train), 4)

    # Validation set metrics
    val_accuracy = np.round(accuracy_score(y_val, y_pred_val), 4) * 100
    val_precision = np.round(precision_score(y_val, y_pred_val, average='weighted'), 4)
    val_recall = np.round(recall_score(y_val, y_pred_val, average='weighted'), 4)
    val_F1 = np.round(f1_score(y_val, y_pred_val, average='weighted'), 4)
    val_kappa = np.round(cohen_kappa_score(y_val, y_pred_val), 4)

    # Test set metrics
    test_accuracy = np.round(accuracy_score(y_test, y_pred_test), 4) * 100
    test_precision = np.round(precision_score(y_test, y_pred_test, average='weighted'), 2)
    test_recall = np.round(recall_score(y_test, y_pred_test, average='weighted'), 2)
    test_F1 = np.round(f1_score(y_test, y_pred_test, average='weighted'), 2)
    test_kappa = np.round(cohen_kappa_score(y_test, y_pred_test), 2)

    # plot_confusion_matrix is defined in the plotting listing below
    print('Train Set Metrics')
    print("Accuracy score : {}%".format(train_accuracy))
    cm = confusion_matrix(y_train, y_pred_train)
    cm_plot = plot_confusion_matrix(cm, classes=['Class 1', 'Class 2'])  # e.g., benign vs malignant

    print('Validation Set Metrics')
    print("Accuracy score : {}%".format(val_accuracy))
    cm = confusion_matrix(y_val, y_pred_val)
    cm_plot = plot_confusion_matrix(cm, classes=['Class 1', 'Class 2'])

    print('Test Set Metrics')
    print("Accuracy score : {}%".format(test_accuracy))
    print("F1_score : {}".format(test_F1))
    print("Kappa Score : {}".format(test_kappa))
    print("Recall score : {}".format(test_recall))
    print("Precision score : {}".format(test_precision))
    cm = confusion_matrix(y_test, y_pred_test)
    cm_plot = plot_confusion_matrix(cm, classes=['Class 1', 'Class 2'])

Various optimizers, such as Adam, SGD, RMSprop, and Adamax, were experimented with. Adam showed a strong convergence pattern and converged faster than the other optimizers, so it was chosen for the final model training; hence, the Adam optimizer, yielding the best results, was used for the final prediction and classification when working with convolutional neural networks and transfer learning techniques. Finally, we put a fully connected layer at the end of the network architecture to classify the images. We fed raw images and labels into the CNNs with no information on the underlying data, and this gave us nearly state-of-the-art results [33].
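A minimal sketch of this optimizer comparison is given below; build_model is a hypothetical helper that returns a freshly initialized, uncompiled Keras model, and x_train, y_train, x_val, and y_val are assumed to come from the data splits shown later in this section.

# Sketch (assumed setup): recompile the same architecture with each optimizer
for opt_name in ['adam', 'sgd', 'rmsprop', 'adamax']:
    model = build_model()  # hypothetical helper returning an uncompiled model
    model.compile(optimizer=opt_name, loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                        epochs=10, batch_size=32, verbose=0)
    print(opt_name, max(history.history['val_accuracy']))  # best validation accuracy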


# Custom plotting function for confusion matrices
import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.YlOrRd):
    plt.figure(figsize=(6, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.yticks(tick_marks, classes)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    thresh = cm.max() / 2.
    cm = np.round(cm, 2)
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

# Run every classifier in zipped_clf through the summary helper above
def classifier_comparator(X_train, y_train, X_val, y_val, X_test, y_test, classifier=zipped_clf):
    result = []
    for n, c in classifier:
        checker_pipeline = Pipeline([('Classifier', c)])
        print("Fitting {} on input data".format(n))
        classifier_summary(checker_pipeline, X_train, y_train, X_val, y_val, X_test, y_test)

FIGURE 4.3 Deep feature extraction with pretrained model.


Similarly, in the case of classifiers trained on features extracted from the state-of-the-art models, dense layers of appropriate dimensions were used together with a sigmoid layer. A sigmoid layer maps its input to values between zero and one; even if the input values fall outside this range, they are still mapped into it, so classification of the input data remains possible even when the input values grow exponentially. After the features have been extracted and selected, they are fed into a classifier to categorize the images into the malignant and benign classes. Commonly used classifiers include linear models, ANNs, Bayesian neural networks, tree-based classifiers, support vector machines (SVMs), XGBoost, and AdaBoost [34]. Finally, we place a fully connected layer at the end of the network architecture to classify the images into their respective classes; putting raw images and labels into the CNNs with no information on the underlying data gives us nearly state-of-the-art results.
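A small numeric illustration of this sigmoid mapping (a sketch, not part of the original listings) is given below: arbitrarily large or small inputs are squashed into the (0, 1) range, so the output can be read as a class probability and thresholded at 0.5.

# Sigmoid sketch: large-magnitude inputs are squashed into (0, 1)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-100.0, -2.0, 0.0, 2.0, 100.0])
print(sigmoid(logits))  # [~0.0, 0.1192, 0.5, 0.8808, ~1.0]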
4.4.3 Experimental data

The CBIS-DDSM dataset is the most popular in breast cancer detection using AI techniques. The dataset that we have used is publicly available and consists of images from the DDSM [1] and CBIS-DDSM [3] datasets. The images have been preprocessed by extracting the ROIs and converted to 299 × 299 images. The data is stored as tfrecords files for TensorFlow and contains 55,890 training examples, of which 14% are positive and the remaining 86% negative, divided into five tfrecords files [30,31].

Early detection of breast cancer on screening mammography is a challenging classification task because the tumors themselves occupy only a tiny portion of the image of the entire breast scan [6]. We merged the training and testing sets of CBIS-DDSM, and the data was then split into training, validation, and testing sets.

The MIAS is an organization of UK research groups interested in understanding mammograms and has generated a digital mammogram database. The MIAS [28] dataset consists of 322 images of size 1024 × 1024.

Our study primarily focuses on implementing and tuning state-of-the-art models to attain higher accuracy. Transfer learning has played a significant role in this field: pretrained weights have led to better results, and techniques such as feature extraction make it possible to extract representation vectors/feature embeddings from large amounts of data while maintaining high accuracy when combined with machine learning models. Some sample images are shown below (Figs. 4.4-4.6).

FIGURE 4.4 Sample of benign mammograms from MIAS [28,29] dataset. MIAS, Mammographic Image Analysis Society.


# Extracting data from the MIAS dataset with rotation-based image augmentation
import cv2
import json
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

no_angles = 360
url = 'MIAS_Dataset_Kaggle'

def save_dictionary(path, data):
    with open(path, 'w') as outfile:
        json.dump(str(data), fp=outfile)

def read_image():
    # Read every image and create rotated copies in 8-degree steps
    info = {}
    for i in range(322):
        if i < 9:
            image_name = 'mdb00' + str(i + 1)
        elif i < 99:
            image_name = 'mdb0' + str(i + 1)
        else:
            image_name = 'mdb' + str(i + 1)
        image_address = url + image_name + '.pgm'
        img = cv2.imread(image_address, 1)
        img = cv2.resize(img, (224, 224))
        rows, cols, channel = img.shape
        info[image_name] = {}
        for angle in range(0, no_angles, 8):
            M = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1)
            img_rotated = cv2.warpAffine(img, M, (cols, rows))
            info[image_name][angle] = img_rotated
    return info

def read_lable():
    # Parse Info.txt and assign label 0 (benign, 'B') or 1 (malignant, 'M')
    # to every rotated copy of each image
    filename = url + 'Info.txt'
    text_all = open(filename).read()
    lines = text_all.split('\n')
    info = {}
    for line in lines:
        words = line.split(' ')
        if len(words) > 3:
            if words[3] == 'B':
                info[words[0]] = {}
                for angle in range(0, no_angles, 8):
                    info[words[0]][angle] = 0
            if words[3] == 'M':
                info[words[0]] = {}
                for angle in range(0, no_angles, 8):
                    info[words[0]][angle] = 1
    return info

lable_info = read_lable()
image_info = read_image()
del lable_info['Truth-Data:']  # drop the trailer line picked up from Info.txt
ids = lable_info.keys()
X = []
Y = []
for id in ids:
    for angle in range(0, no_angles, 8):
        X.append(image_info[id][angle])
        Y.append(lable_info[id][angle])
X = np.array(X)
Y = np.array(Y)
Y = to_categorical(Y, 2)
x_train, x_test1, y_train, y_test1 = train_test_split(X, Y, test_size=0.3, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(x_test1, y_test1, test_size=0.3, random_state=42)

# Extracting .tfrecords files (CBIS-DDSM dataset)
import tensorflow as tf

images = []
labels = []

feature_dictionary = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'label_normal': tf.io.FixedLenFeature([], tf.int64),
    'image': tf.io.FixedLenFeature([], tf.string)
}

def _parse_function(example, feature_dictionary=feature_dictionary):
    parsed_example = tf.io.parse_example(example, feature_dictionary)
    return parsed_example

def read_data(filename):
    full_dataset = tf.data.TFRecordDataset(filename,
                                           num_parallel_reads=tf.data.experimental.AUTOTUNE)
    full_dataset = full_dataset.cache()
    print("Size of Training Dataset: ", len(list(full_dataset)))

    full_dataset = full_dataset.map(_parse_function,
                                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
    for image_features in full_dataset:
        image = tf.io.decode_raw(image_features['image'], tf.uint8)
        image = tf.reshape(image, [299, 299])
        image = image.numpy()
        image = cv2.resize(image, (100, 100))
        image = cv2.merge([image, image, image])  # replicate the grayscale channel
        images.append(image)
        labels.append(image_features['label_normal'].numpy())

filenames = ['CBIS-DDSM_Dataset(kaggle).tfrecords']
for file in filenames:
    read_data(file)

X = np.array(images)
y = np.array(labels)

# Splitting data into train, validation, and test sets (stratified on the labels)
x_train, x_test1, y_train, y_test1 = train_test_split(X, y, test_size=0.3, random_state=42,
                                                      shuffle=True, stratify=y)
x_val, x_test, y_val, y_test = train_test_split(x_test1, y_test1, test_size=0.3, random_state=42,
                                                shuffle=True, stratify=y_test1)

FIGURE 4.5 Sample of malignant mammograms from MIAS [28,29] dataset. MIAS, Mammographic Image Analysis Society.


FIGURE 4.6 Samples from DDSM [30,31] dataset.

4.4.4 Performance evaluation measures

The MIAS dataset was randomly split for training, validation, and testing (train size = 70%, validation size = 21%, and test size = 9%). Similarly, the CBIS-DDSM dataset was split as follows: train size = 70%, validation size = 20% of the train size, and test size = 30%. The splits were stratified on the labels to maintain the class ratios throughout the experimentation process. Evaluation metrics must be chosen carefully when dealing with medical data: even the slightest errors could lead to major problems when applying and adapting deep learning models in real time, since medical data is highly sensitive to the individual classes. The F1 score, Kappa score, area under the ROC curve (AUC), recall, and precision are some of the most widely used metrics to interpret the performance of classification models built with deep learning and machine learning techniques.

To evaluate the results of our classification model, we require the values TP (true positive), FN (false negative), FP (false positive), and TN (true negative).

True positives (TP): correctly predicted positive values, meaning the actual class is yes and the predicted class is also yes.

True negatives (TN): correctly predicted negative values, meaning the actual class is no and the predicted class is also no.

False positive and false negative values occur when the actual class contradicts the predicted class.

False positives (FP): the actual class is no, but the predicted class is yes.

False negatives (FN): the actual class is yes, but the predicted class is no.

Accuracy: the ratio of correctly predicted observations to the total number of observations.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision: the ratio of correctly predicted positive observations to the total predicted positive observations.

Precision = TP / (TP + FP)

Recall (sensitivity): the ratio of correctly predicted positive observations to all observations in the actual positive class.

Recall = TP / (TP + FN)

F1: the F1 score is the weighted average of precision and recall, and therefore takes both false positives and false negatives into account.

F1 Score = 2 × (Recall × Precision) / (Recall + Precision)

Cohen's Kappa coefficient: this statistic measures the inter-rater reliability (and also the intra-rater reliability) for qualitative (categorical) items. It is based on the idea of measuring the correspondence between the predicted and the actual labels, which are considered to be two random categorical variables [35]. The kappa statistic is the most widely used metric for categorical data where there is no objective way of determining the likelihood of a coincidental agreement between two or more observers. Values closer to 1 are considered good, and values closer to 0 are uncertain. Cohen [36] defined the Kappa statistic as an agreement index:

k = (Po − Pe) / (1 − Pe)

where Po is the observed agreement and Pe measures the agreement expected by chance [37]. For example, if the observed agreement is Po = 0.85 and the chance agreement is Pe = 0.5, then k = (0.85 − 0.5) / (1 − 0.5) = 0.7.

AUC score: the ROC (receiver operating characteristic) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the false positive rate (FPR) at different thresholds. The area under the curve (AUC) is a measure of a classifier's ability to distinguish between classes and is used to summarize the ROC curve [38]. The higher the value of this metric, the better the performance of the classification model [39].

A sample Python code for performance metrics is given below.
# Performance evaluation metrics (y_test and y_pred are the true and
# predicted labels; 'target' holds the class names for the report)
import numpy as np
from sklearn import metrics

print('Accuracy score is :', np.round(metrics.accuracy_score(y_test, y_pred), 4))
print('Precision score is :', np.round(metrics.precision_score(y_test, y_pred, average='weighted'), 4))
print('Recall score is :', np.round(metrics.recall_score(y_test, y_pred, average='weighted'), 4))
print('F1 Score is :', np.round(metrics.f1_score(y_test, y_pred, average='weighted'), 4))
print('ROC AUC Score is :', np.round(metrics.roc_auc_score(y_test, y_pred, multi_class='ovo',
                                                           average='weighted'), 4))
print('\t\tClassification Report:\n', metrics.classification_report(y_test, y_pred,
                                                                    target_names=target))
print('Cohen Kappa Score:', np.round(metrics.cohen_kappa_score(y_test, y_pred), 4))

4.4.5 Experimental results

This section presents the experimental results on the two datasets, MIAS and DDSM.

4.4.5.1 Mammographic Image Analysis Society dataset

It can be seen from Table 4.1 that the pretrained models were able to show good results. VGG16 outperformed the other models on all our performance evaluation metrics. Despite the imbalanced class data, the pretrained models were able to classify the test data accurately.

It can be seen from Table 4.2 that the multilayered CNN architectures outperformed the pretrained models. The models appear to overfit on the training data, but they perform equally well on the validation and test data.
TABLE 4.1 Pretrained networks (transfer learning).
Model Training accuracy Validation accuracy Test accuracy F1 measure KAPPA ROC area

ResNet50 97.8 94.11 93.99 0.9399 0.8777 0.9394


VGG16 98.3 96.04 95.71 0.957 0.9122 0.9545
VGG19 98.3 95.22 94.64 0.9464 0.8907 0.9456
Inception_v3 88 78.01 77.04 0.7702 0.5311 0.7651
MobileNet 97 90.71 90.99 0.9099 0.8165 0.9087
DenseNet169 89.5 77.92 78.11 0.7818 0.557 0.7805

DenseNet121 89.4 82.24 82.19 0.8226 0.6406 0.8236


InceptionResNetV2 69.6 66.42 67.81 0.6674 0.321 0.6551
MobileNetV2 96 86.38 86.48 0.8638 0.7212 0.8565
ResNet101 97.6 93.74 94.21 0.9421 0.882 0.9412

TABLE 4.2 Convolutional neural networks (CNNs) with different number of layers.

Classifier Training accuracy Validation accuracy Test accuracy F1 measure KAPPA ROC area

CNN 2 Layer 0.9699 0.9731 0.9678 0.9677 0.9344 1.0

CNN 3 Layer 0.9731 0.9719 0.971 0.9709 0.9411 0.9967


CNN 4 Layer 0.9712 0.9849 0.9836 0.9836 0.9667 0.9978
CNN 5 Layer 0.9851 0.9844 0.9932 0.9932 0.9863 0.999
CNN 6 Layer 0.9719 0.9918 0.9945 0.9945 0.9889 0.9998
CNN 7 Layer 0.99 1.00 0.999 0.999 0.998 1
CNN 8 Layer 0.993 0.9989 0.9974 0.9974 0.9948 1

As the number of layers increases, the accuracy of the model increases.

It can be seen from Table 4.3 that, except for the SVM classifier, all the classifiers were overfitting on the training data. The accuracy scores were consistently in the range of 51-59, and similar scores were seen for recall and precision.

It can be seen from Table 4.4 that although XGBoost and Random Forest were overfitting on the training data, they showed the best overall performance. Accuracy scores were consistently between 51 and 59, and a similar trend was seen in the recall and precision scores.

It can be seen from Table 4.5 that the Random Forest classifier achieved the highest test accuracy. Besides SVM, the other classifiers were overfitting on the training data. The test accuracy, recall, and precision scores were found to be in the range of 51-56.

It can be seen from Table 4.6 that the XGBoost classifier has the best overall performance; while it overfits the training data, it outperformed the other models on all performance measures. Random Forest, although it also overfits, performed well on the test data. Unlike the other classifiers, SVM did not overfit on the training data.


TABLE 4.3 Performance of VGG16 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 59.69 53.26 52.57 0.5154 0.0234 0.5258 0.5166


KNN 70.65 52.34 54.51 0.54 0.07 0.55 0.54
SVM 56.74 55.84 56.22 0.45 0.05 0.56 0.59
Random Forest 78.99 55.37 56.87 0.52 0.08 0.57 0.56
AdaBoost 60.3 54.37 53 0.5 0.01 0.53 0.51
XGBoost 99.09 55.75 51.5 0.51 0 0.52 0.51

TABLE 4.4 Performance of VGG19 deep feature extraction with different machine learning models.

Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 58.3 51.51 51.5 0.4922 -0.01 0.515 0.4983

KNN 70.54 51.51 53 0.52 0.04 0.53 0.53


SVM 55.55 55.01 54.51 0.39 0 0.55 0.48
Random Forest 77.72 54.74 58.58 0.54 0.12 0.59 0.59
AdaBoost 59.53 53.54 57.5 0.55 0.11 0.58 0.57
XGBoost 98.87 56.67 57.94 0.58 0.14 0.58 0.58

TABLE 4.5 Performance of ResNet50 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 56.59 51.33 53.21 0.5229 0.0379 0.5322 0.524


KNN 70.24 52.07 54.72 0.55 0.08 0.55 0.55

SVM 55.83 55.11 54.08 0.31 -0.01 0.54 0.49


Random Forest 72.89 54.92 55.79 0.48 0.05 0.56 0.55
AdaBoost 60.35 52.16 51.07 0.48 -0.02 0.51 0.49
XGBoost 99.03 52.81 51.72 0.51 0.02 0.52 0.51

It can be seen from Table 4.7 that the Random Forest classifier, despite overfitting on the training data, showed the best performance on all our performance evaluation measures. The accuracy, recall, and precision scores were consistently found to be in the range of 51-61.

It can be seen from Table 4.8 that the Random Forest classifier, despite overfitting on the training data, again showed the best performance on all our performance evaluation measures. The accuracy, recall, and precision scores were consistently found to be in the range of 49-58.

TABLE 4.6 Performance of ResNet101 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 59.82 52.8 53.64 0.5254 0.0443 0.5365 0.5276


KNN 71.07 54.28 52.86 0.54 0.07 0.54 0.54
SVM 55.83 54.83 54.08 0.42 0 0.54 0.49
Random Forest 78.6 56.76 59.87 0.55 0.15 0.6 0.61
AdaBoost 59.61 54.19 53.43 0.52 0.03 0.53 0.52
XGBoost 99.33 55.01 61.37 0.61 0.22 0.61 0.61

TABLE 4.7 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 59.22 53.9 56.65 0.5324 0.0864 0.5665 0.5594


KNN 70.84 51.7 52.79 0.52 0.04 0.53 0.52
SVM 58.09 55.11 56.44 0.51 0.07 0.56 0.56
Random Forest 76.89 55.47 59.87 0.56 0.15 0.6 0.61

AdaBoost 60.71 53.82 56.44 0.53 0.08 0.56 0.56


XGBoost 99.12 52.53 54.08 0.54 0.07 0.54 0.54

TABLE 4.8 Performance of MobileNet deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision
ANN 57.28 53.26 51.5 0.4845 -0.01 0.515 0.4943
KNN 69.52 53.16 54.29 0.54 0.07 0.54 0.54
SVM 55.63 55.11 53.86 0.4 -0.01 0.54 0.47
Random Forest 76.92 53.36 58.15 0.54 0.11 0.58 0.58
AdaBoost 60.41 54.83 56.22 0.53 0.08 0.56 0.55
XGBoost 98.95 52.71 52.58 0.52 0.03 0.53 0.52

It can be seen from Table 4.9 that all the models performed equally well, with scores consistently in a specific range. The XGBoost, Random Forest, and KNN classifiers were overfitting on the training data.

It can be seen from Table 4.10 that the SVM classifier showed the best overall performance. The XGBoost and KNN classifiers were overfitting on the training data.

It can be seen from Table 4.11 that the KNN and XGBoost classifiers showed the best performance. Except for the SVM classifier, all classifiers were overfitting on the training data.


TABLE 4.9 Performance of InceptionV3 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 55.13 53.17 56.22 0.4618 0.0481 0.5622 0.5777


KNN 68.61 51.98 51.29 0.51 0.02 0.51 0.51
SVM 54.91 54.45 55.15 0.4 0.01 0.55 0.64
Random Forest 62.51 53.73 54.94 0.45 0.02 0.55 0.53
AdaBoost 57.21 53.16 54.94 0.47 0.03 0.55 0.53
XGBoost 98.26 52.99 56.44 0.56 0.12 0.56 0.56

TABLE 4.10 Performance of InceptionResNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 55.57 54.53 54.93 0.3993 0.007 0.5494 0.572


KNN 64.85 52.53 50.21 0.47 0 0.5 0.48
SVM 55.33 54.65 55.15 0.4 0.01 0.55 0.64
Random Forest 57.59 54.92 54.94 0.41 0.01 0.55 0.55

AdaBoost 55.76 54.65 54.94 0.4 0.01 0.55 0.57


XGBoost 75.65 55.57 50.43 0.48 -0.04 0.5 0.48

TABLE 4.11 Performance of DenseNet169 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 57.26 54.51 54.5 0.5153 0.0452 0.5451 0.5318


KNN 69.49 52.44 54.29 0.54 0.07 0.54 0.54

SVM 55.66 55.2 53.43 0.4 -0.02 0.53 0.43


Random Forest 80.78 55.38 55.36 0.52 0.06 0.55 0.54
AdaBoost 60.33 53.08 53 0.52 0.03 0.53 0.52
XGBoost 98.32 52.07 54.08 0.54 0.07 0.54 0.54

It can be seen from Table 4.12 that the ANN, KNN, Random Forest, XGBoost, and AdaBoost classifiers were overfitting on the training data. The KNN classifier showed the best performance of all.

It can be seen from Table 4.13 that the XGBoost classifier showed the best overall performance. The XGBoost, KNN, and Random Forest classifiers were overfitting on the training data.

4.4.5.2 CBIS-DDSM dataset

TABLE 4.12 Performance of DenseNet121 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 58.25 51.12 49.57 0.4946 -0.02 0.4957 0.4938
KNN 69.71 50.87 52.15 0.52 0.03 0.52 0.52
SVM 56.07 54.92 53.22 0.41 -0.02 0.53 0.46
Random Forest 78.13 54.55 52.15 0.46 -0.02 0.52 0.49
AdaBoost 60.35 54.459 50.21 0.47 -0.04 0.5 0.48
XGBoost 99 51.52 48.71 0.48 -0.04 0.49 0.48

TABLE 4.13 Performance of Xception deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

ANN 56.57 56.02 55.36 0.5544 0.1029 0.5536 0.5556


KNN 69.46 52.25 51.29 0.51 0.01 0.51 0.51
SVM 55.6 56.03 56.22 0.44 0.04 0.56 0.62
Random Forest 67.34 56.85 57.94 0.52 0.1 0.58 0.59

AdaBoost 59.94 57.41 56.87 0.52 0.08 0.57 0.56


XGBoost 98.48 58.6 60.73 0.6 0.2 0.61 0.6

TABLE 4.14 Convolutional neural network (CNN) with different number of layers.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area

CNN 2 Layer 89.29 88.52 88.53 0.8521 0.2474 0.8159


CNN 3 Layer 91.55 87.76 87.99 0.8329 0.129 0.911

CNN 4 Layer 87.3 86.5 87.01 0.8096 0 0.5


CNN 5 Layer 88.08 91.53 91.55 0.9079 0.5635 0.9221
CNN 6 Layer 93.32 91.81 91.46 0.899 0.4989 0.9286
CNN 7 Layer 92.51 92.42 92.15 0.9118 0.5739 0.9378
CNN 8 Layer 90.68 91.59 91.84 0.9086 0.5593 0.9228

performance overall. However, CNN 4 layer It can be observed from Table 4.15 that pre-
was not able to give good performance and trained networks were able to perform very
was overfitting on specific class data, which well. MobileNet of 95.73% accuracy achieved
resulted in a significant drop in ROC and the highest test accuracy, and it also has the
Kappa values. CNN 7 Layer has the best per- highest Kappa score, which indicates the reli-
formance overall in other networks. ability of the model. Overall, it was seen that


TABLE 4.15 Pretrained networks (transfer learning).


Model Training accuracy Validation accuracy Test accuracy F1 measure KAPPA ROC area

ResNet50 96.35 91.63 91.98 0.91071 0.57214 0.73626


VGG16 94.03 92.83 93.06 0.92384 0.76656 0.63551
VGG19 93.36 93.32 93.57 0.932 0.68526 0.80854
Inception_v3 97.77 91.85 92.22 0.92735 0.70546 0.91058
MobileNet 99.07 95.73 95.86 0.95665 0.8009 0.86699
DenseNet169 95.59 79.7 80.56 0.83352 0.45 0.85053

DenseNet121 98.39 90.2 90.53 0.88391 0.41763 0.64961


InceptionResNetV2 94.92 93.44 93.86 0.94036 0.74515 0.89553
MobileNetV2 99.47 85.11 85.61 0.87472 0.56102 0.90523
ResNet101 97.57 91.77 92.127 0.91023 0.56419 0.72465

TABLE 4.16 Performance of VGG16 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 86.11 86.24 86.18 0.8484 0.28 0.8618 0.8408


KNN 88.97 86.17 86.28 0.83 0.14 0.86 0.82
SVM 87.41 87.08 87.16 0.82 0.06 0.87 0.84

Random Forest 100 87.25 87.67 0.84 0.17 0.88 0.85


AdaBoost 87.41 87.14 87.51 0.84 0.17 0.88 0.85
XGBoost 94.13 86.96 87.32 0.84 0.21 0.87 0.84

pretrained networks outperformed all the simple CNN models.

It can be observed from Table 4.16 that the Random Forest classifier was observed to give the best performance. XGBoost and Random Forest classifiers were overfitting on training data. The ANN model achieved the highest kappa score.

It can be observed from Table 4.17 that the Random Forest classifier showed the best overall performance. The ANN model was seen to have the highest reliability index. Random Forest and XGBoost classifiers were seen to overfit on training data.

It can be observed from Table 4.18 that the XGBoost classifier showed the best overall performance. The ANN model was observed to have the highest reliability index. The XGBoost classifier was overfitting on training data.

It can be observed from Table 4.19 that all the classifiers were observed to have identical performance. ANN and XGBoost models were observed to have high-reliability scores of about 0.45. The test accuracies, F1 score, recall, and precision scores were consistently in the same range.

It can be observed from Table 4.20 that all the classifiers were observed to have identical

TABLE 4.17 Performance of VGG19 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 84.43 83.69 84.37 0.8365 0.25 0.8437 0.8307


KNN 88.01 85.69 86.28 0.82 0.1 0.86 0.81
SVM 86.98 86.95 86.92 0.81 0 0.87 0.76
Random Forest 99.99 87.58 87.71 0.83 0.13 0.88 0.86
AdaBoost 87.18 87.04 87.51 0.83 0.12 0.88 0.85
XGBoost 98.19 88.61 88.89 0.87 0.37 0.89 0.87

TABLE 4.18 Performance of ResNet50 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 89.26 88.42 88.62 0.8824 0.4654 0.8863 0.8796


KNN 89.69 85.69 86.6 0.84 0.23 0.87 0.84
SVM 87.68 87.4 87.57 0.83 0.09 0.88 0.87
Random Forest 87.98 87.17 87.3 0.82 0.05 0.87 0.87

AdaBoost 88.25 87.91 87.92 0.86 0.35 0.88 0.86


XGBoost 99.09 89.59 89.76 0.89 0.46 0.9 0.89

TABLE 4.19 Performance of ResNet101 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 89.28 88.56 88.81 0.8817 0.453 0.881 0.8782


KNN 89.66 86.78 86.88 0.84 0.2 0.87 0.84
SVM 88.66 88.09 88.23 0.84 0.19 0.88 0.88
Random Forest 100 87.99 88.17 0.84 0.17 0.88 0.88

AdaBoost 88.68 88.28 88.11 0.87 0.36 0.88 0.86


XGBoost 99.1 89.77 89.6 0.89 0.45 0.9 0.88

performance. ANN and XGBoost models were observed to have high-reliability scores. The test accuracies, F1 score, recall, and precision scores were consistently in the same range.

It can be observed from Table 4.21 that the ANN model was able to outperform all other classifiers. The XGBoost model was overfitting on training data.

It can be observed from Table 4.22 that the XGBoost classifier was observed to have the best performance overall. The remaining models showed identical results.


TABLE 4.20 Performance of MobileNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 89.17 88.77 88.6 0.8688 0.3596 0.8861 0.8679


KNN 89.32 86.87 86.22 0.83 0.15 0.86 0.82
SVM 88.03 87.82 87.57 0.83 0.12 0.88 0.86
Random Forest 100 88.39 88.41 0.85 0.23 0.88 0.87
AdaBoost 87.89 88 87.65 0.86 0.29 0.88 0.85
XGBoost 97.63 89.35 89.28 0.88 0.39 0.89 0.88

TABLE 4.21 Performance of InceptionV3 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 88.55 88.31 88.68 0.8768 0.4179 0.8869 0.8731


KNN 89.87 87.11 86.88 0.84 0.21 0.87 0.84
SVM 87.93 87.63 87.46 0.82 0.08 0.87 0.86

Random Forest 89.09 88.1 87.85 0.83 0.13 0.88 0.88


AdaBoost 88.17 88.01 87.67 0.86 0.3 0.88 0.85
XGBoost 98.08 89.47 89.72 0.88 0.43 0.9 0.88

TABLE 4.22 Performance of MobileNet deep feature extraction with different machine learning models.

Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 87.2 87.17 86.79 0.8185 0.0598 0.868 0.8174


KNN 87.69 86.4 86.26 0.82 0.08 0.86 0.81

SVM 87.23 87.28 86.96 0.82 0.05 0.87 0.82


Random Forest 88.53 87.52 87.2 0.82 0.07 0.87 0.84
AdaBoost 87.34 87.32 86.92 0.82 0.06 0.87 0.82
XGBoost 94.86 89.21 89.36 0.88 0.42 0.89 0.88

It can be observed from Table 4.23 that the XGBoost classifier was observed to have the best performance overall. The remaining models showed identical results.

It can be observed from Table 4.24 that all the models performed equally well. XGBoost and Random Forest classifiers were found to overfit on training data.

It can be observed from Table 4.25 that the XGBoost classifier outperformed the other classifiers. Random Forest and XGBoost classifiers were overfitting on training data.

TABLE 4.23 Performance of InceptionResNetV2 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 86.75 86.66 86.79 0.8066 0 0.8679 0.7533


KNN 88.07 84.58 86.2 0.83 0.16 0.86 0.82
SVM 86.77 86.75 86.79 0.81 0 0.87 0.75
Random Forest 99.99 85.04 86.4 0.83 0.15 0.86 0.82
AdaBoost 86.83 86.66 86.59 0.81 0.01 0.87 0.79
XGBoost 97.17 85.51 85.7 0.82 0.08 0.86 0.8

TABLE 4.24 Performance of DenseNet169 deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 83.33 82.84 82.8 0.8426 0.3823 0.828 0.8657


KNN 89.16 86 86.56 0.84 0.21 0.87 0.83
SVM 87.07 87.03 87.06 0.81 0.02 0.87 0.87
Random Forest 99.99 88.96 88.99 0.86 0.32 0.89 0.88

AdaBoost 87.84 87.58 87.99 0.85 0.21 0.88 0.86


XGBoost 98.06 89.77 89.98 0.89 0.48 0.9 0.89

TABLE 4.25 Performance of DenseNet121 deep feature extraction with different machine learning models.

Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 85.93 85.65 85.88 0.8455 0.2676 0.8588 0.8376

KNN 88.93 85.53 85.86 0.83 0.14 0.86 0.82


SVM 87.06 87.1 87.12 0.82 0.04 0.87 0.85
Random Forest 99.91 88.08 87.95 0.84 0.2 0.88 0.86
AdaBoost 87.12 86.79 86.96 0.82 0.09 0.87 0.83
XGBoost 98.09 89.2 89.54 0.88 0.43 0.9 0.88

It can be observed from Table 4.26 that the XGBoost classifier outperformed the other classifiers.

4.5 Discussion

After experimenting with various AI techniques to train our classification model to classify mammograms into B (Benign) and M (Malignant), transfer learning was found to have the best performance. In our experimentation with the MIAS dataset, VGG16 had the best performance (Train accuracy: 98.3, Test accuracy: 95.71, Validation accuracy: 96.04, F1 score: 0.957, Kappa score: 0.9122, ROC score: 0.9545).


TABLE 4.26 Performance of Xception deep feature extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1_score Kappa Recall Precision

ANN 84.64 84 85.06 0.8413 0.263 0.8507 0.8344


KNN 88.5 86.2 86.68 0.84 0.2 0.87 0.83
SVM 86.96 86.95 86.96 0.81 0 0.87 0.76
Random Forest 91.81 87.81 87.93 0.84 0.2 0.88 0.86
AdaBoost 87.01 86.93 87.02 0.81 0.03 0.87 0.83
XGBoost 97.25 89.2 89.07 0.88 0.41 0.89 0.88

The custom CNN n-layered models (the CNN 6-Layered model achieved a test accuracy of only 61.59) and the classifiers (ANN, KNN, XGBoost, SVM, Random Forest, and AdaBoost) trained on extracted features (the XGBoost classifier trained on features extracted from the ResNet-101 model achieved a test accuracy of only 61.37) were outperformed by the transfer learning models. And, in our experimentation with the CBIS-DDSM dataset, MobileNet had the best performance (Train accuracy: 99.07, Test accuracy: 95.73, Validation accuracy: 95.86, F1 score: 0.956, Kappa score: 0.80, ROC score: 0.89). The custom CNN n-Layered models (the CNN 5-Layered and CNN 8-Layered models achieved a test accuracy of about 92% each) and the classifiers (ANN, KNN, XGBoost, SVM, Random Forest, and AdaBoost) trained on extracted features (the XGBoost classifier trained on features extracted from the DenseNet169 model achieved a test accuracy of 89.98) were only slightly outperformed by the transfer learning models. When trained on various classifiers, the features extracted from the CBIS-DDSM dataset performed relatively well compared to the MIAS dataset. Transfer learning was able to generalize to a terrific extent the features it had learned from the training data and effectively use them to classify test mammograms. The exploration of network architectures has been a part of neural network research since their initial discovery. The recent resurgence in the popularity of neural networks has also revived this research domain. The increasing number of layers in modern networks amplifies the differences between architectures and motivates exploring different connectivity patterns and revisiting old research ideas [40].

4.6 Conclusion

In this chapter, we have highlighted the various techniques of image classification in the field of breast cancer detection. Techniques such as transfer learning, deep feature extraction, model tuning, and hyperparameter optimization were used to obtain the best results. An in-depth analysis of the different models has been presented in this work, along with the various performance evaluation metrics used. Popularly known datasets in breast cancer, such as the DDSM and MIAS datasets, were used to showcase the impact of how different AI techniques can be used in this field. This chapter consists of a comparative study of various models such as simple CNNs, transfer learning through pretrained models, deep feature extraction, and using traditional machine learning models over these pretrained models. Our findings have shown that the techniques mentioned above can detect breast cancer from mammograms with accuracies as high as 96%. These findings indicate that many improvements could be

made by tuning existing architectures or building new architectures that could lead to further improvements in this field of breast cancer detection from mammograms using deep learning and machine learning techniques.

References

[1] H. Sung, et al., Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J. Clin. 71 (3) (2021). Available from: https://doi.org/10.3322/caac.21660.
[2] J. Tang, R.M. Rangayyan, J. Xu, I.E. Naqa, Y. Yang, Computer-aided detection and diagnosis of breast cancer with mammography: recent advances, IEEE Trans. Inf. Technol. Biomed. 13 (2) (2009) 236-251. Available from: https://doi.org/10.1109/TITB.2008.2009441.
[3] S. Hadush, Y. Girmay, A. Sinamo, G. Hagos, Breast cancer detection using convolutional neural networks, ArXiv200307911 Cs, <http://arxiv.org/abs/2003.07911>, Aug. 2020 (accessed 30.04.21).
[4] P. Xi, C. Shu, R. Goubran, Abnormality detection in mammography using deep convolutional neural networks, ArXiv180301906 Cs, <http://arxiv.org/abs/1803.01906>, Mar. 2018 (accessed 30.04.21).
[5] A. Jalalian, S.B.T. Mashohor, H.R. Mahmud, M.I.B. Saripan, A.R.B. Ramli, B. Karasfi, Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review, Clin. Imaging 37 (3) (2013) 420-426. Available from: https://doi.org/10.1016/j.clinimag.2012.09.024.
[6] L. Shen, L.R. Margolies, J.H. Rothstein, E. Fluder, R.B. McBride, W. Sieh, Deep learning to improve breast cancer early detection on screening mammography, Sci. Rep. 9 (1) (2019) 12495. Available from: https://doi.org/10.1038/s41598-019-48995-4.
[7] J.R. Burt, et al., Deep learning beyond cats and dogs: recent advances in diagnosing breast cancer with deep neural networks, Br. J. Radiol. (2018) 20170545. Available from: https://doi.org/10.1259/bjr.20170545.
[8] I.J. Goodfellow, et al., Generative adversarial networks, ArXiv14062661 Cs Stat, <http://arxiv.org/abs/1406.2661>, Jun. 2014 (accessed 27.05.21).
[9] L.A. Gatys, A.S. Ecker, M. Bethge, A neural algorithm of artistic style, ArXiv150806576 Cs Q-Bio, <http://arxiv.org/abs/1508.06576>, Sep. 2015 (accessed 27.05.21).
[10] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, ArXiv150504597 Cs, <http://arxiv.org/abs/1505.04597>, May 2015 (accessed 29.04.21).
[11] X. Qin, Z. Zhang, C. Huang, M. Dehghan, O.R. Zaiane, M. Jagersand, U^2-Net: going deeper with nested U-structure for salient object detection, Pattern Recognit. 106 (2020) 107404. Available from: https://doi.org/10.1016/j.patcog.2020.107404.
[12] R. Agarwal, O. Diaz, X. Lladó, M.H. Yap, R. Martí, Automatic mass detection in mammograms using deep convolutional neural networks, J. Med. Imaging 6 (03) (2019) 1. Available from: https://doi.org/10.1117/1.JMI.6.3.031409.
[13] I.C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M.J. Cardoso, J.S. Cardoso, INbreast: toward a full-field digital mammographic database, Acad. Radiol. 19 (2) (2012) 236-248. Available from: https://doi.org/10.1016/j.acra.2011.09.014.
[14] M.A. Al-Antari, S.-M. Han, T.-S. Kim, Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms, Comput. Methods Prog. Biomed. 196 (2020) 105584. Available from: https://doi.org/10.1016/j.cmpb.2020.105584.
[15] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85-117. Available from: https://doi.org/10.1016/j.neunet.2014.09.003.
[16] A. Nawaz, S.M. Anwar, R. Liaqat, J. Iqbal, U. Bagci, M. Majid, Deep convolutional neural network based classification of Alzheimer's disease using MRI data, ArXiv210102876 Cs Eess, <http://arxiv.org/abs/2101.02876>, Jan. 2021 (accessed 03.05.21).
[17] O. Ozdemir, R.L. Russell, A.A. Berlin, A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans, ArXiv190203233 Cs, <http://arxiv.org/abs/1902.03233>, Jan. 2020 (accessed 03.05.21).
[18] J. Wen, et al., Convolutional neural networks for classification of Alzheimer's disease: overview and reproducible evaluation, Med. Image Anal. 63 (2020) 101694. Available from: https://doi.org/10.1016/j.media.2020.101694.
[19] D.-P. Fan, et al., Inf-Net: automatic COVID-19 lung infection segmentation from CT images, ArXiv200414133 Cs Eess, May 2020 (accessed 29.04.21).
[20] S. Bharati, P. Podder, M.R.H. Mondal, Artificial neural network based breast cancer screening: a comprehensive review, ArXiv200601767 Cs Eess Math, <http://arxiv.org/abs/2006.01767>, May 2020 (accessed 12.05.21).
[21] E.-K. Kim, et al., Applying data-driven imaging biomarker in mammography for breast cancer screening: preliminary study, Sci. Rep. 8 (1) (2018) 2762. Available from: https://doi.org/10.1038/s41598-018-21215-1.
[22] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84-90. Available from: https://doi.org/10.1145/3065386.
[23] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436-444. Available from: https://doi.org/10.1038/nature14539.
[24] A.S. Lundervold, A. Lundervold, An overview of deep learning in medical imaging focusing on MRI, Z. Für Med. Phys. 29 (2) (2019) 102-127. Available from: https://doi.org/10.1016/j.zemedi.2018.11.002.
[25] A.W. Senior, et al., Improved protein structure prediction using potentials from deep learning, Nature 577 (2020). Available from: https://www.nature.com/articles/s41586-019-1923-7.
[26] A. Kryshtafovych, T. Schwede, M. Topf, K. Fidelis, J. Moult, Critical assessment of methods of protein structure prediction (CASP), Round XIII, Proteins Struct. Funct. Bioinforma. 87 (12) (2019) 1011-1020. Available from: https://doi.org/10.1002/prot.25823.
[27] M. Ghassemi, T. Naumann, P. Schulam, A.L. Beam, I.Y. Chen, R. Ranganath, A review of challenges and opportunities in machine learning for health, ArXiv180600388 Cs Stat, <http://arxiv.org/abs/1806.00388>, Dec. 2019 (accessed 13.05.21).
[28] The mini-MIAS database of mammograms, <http://peipa.essex.ac.uk/info/mias.html> (accessed 12.05.21).
[29] MIAS Mammography, <https://kaggle.com/kmader/mias-mammography> (accessed 14.05.21).
[30] R. Sawyer-Lee, F. Gimenez, A. Hoogi, D. Rubin, Curated breast imaging subset of DDSM, Cancer Imaging Archive (2016). Available from: https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY.
[31] DDSM Mammography, <https://kaggle.com/skooch/ddsm-mammography> (accessed 12.05.21).
[32] A. Boyd, A. Czajka, K. Bowyer, Deep learning-based feature extraction in iris recognition: use existing models, fine-tune or train from scratch?, ArXiv200208916 Cs, <http://arxiv.org/abs/2002.08916>, Feb. 2020 (accessed 06.05.21).
[33] A. Subasi, A. Mitra, F. Özyurt, T. Tuncer, Automated COVID-19 detection from CT images using deep learning, in: Computer-aided Design and Diagnosis Methods for Biomedical Applications, Taylor & Francis, 2021, pp. 153-176. Available from: https://doi.org/10.1201/9781003121152-7.
[34] Y. Jiménez-Gaona, M.J. Rodríguez-Álvarez, V. Lakshminarayanan, Deep learning based computer-aided systems for breast cancer imaging: a critical review, ArXiv201000961 Cs Eess, <http://arxiv.org/abs/2010.00961>, Sep. 2020 (accessed 11.05.21).
[35] P. Ranganathan, C. Pramesh, R. Aggarwal, Common pitfalls in statistical analysis: measures of agreement, Perspect. Clin. Res. 8 (4) (2017) 187. Available from: https://doi.org/10.4103/picr.PICR_123_17.
[36] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20 (1) (1960) 37-46.
[37] Z. Yang, M. Zhou, Kappa statistic for clustered physician-patients polytomous data, Comput. Stat. Data Anal. 87 (2015) 1-17.
[38] K. Feng, H. Hong, K. Tang, J. Wang, Decision making with machine learning and ROC curves, ArXiv190502810 Cs Econ Q-Fin Stat, <http://arxiv.org/abs/1905.02810>, May 2019 (accessed 12.05.21).
[39] J.B. Brown, Classifiers and their metrics quantified, Mol. Inform. 37 (1-2) (2018) 1700127. Available from: https://doi.org/10.1002/minf.201700127.
[40] G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, ArXiv160806993 Cs, <http://arxiv.org/abs/1608.06993>, Jan. 2018 (accessed 29.04.21).

C H A P T E R

5

Breast tumor detection in ultrasound images using artificial intelligence

Omkar Modi1 and Abdulhamit Subasi2,3

1Indian Institute of Technology, Kharagpur, West Bengal, India; 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

5.1 Introduction
5.2 Background/literature review
5.3 Artificial intelligence techniques
    5.3.1 Artificial neural networks
    5.3.2 Deep learning
    5.3.3 Convolutional neural networks
5.4 Breast tumor detection using artificial intelligence
    5.4.1 Feature extraction using deep learning
    5.4.2 Prediction and classification
    5.4.3 Experimental data
    5.4.4 Performance evaluation measures
    5.4.5 Experimental results
5.5 Discussion
5.6 Conclusion
References

5.1 Introduction

Breast cancer begins when healthy cells in the breast grow out of control, forming a mass or sheet of cells called a tumor. Female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases [1]. Mortality can be reduced by early detection and therapy. There are various predominant tests to find or diagnose breast cancer; a biopsy is usually the only certain way for the doctor to know whether an area of the body has cancer. In a biopsy, the doctor takes a small sample of tissue for testing in a laboratory. Other diagnoses include image testing such as diagnostic mammography, MRI, and ultrasound [2].

Ultrasound imaging of the breast uses sound waves to produce images of the internal structures of the breast.


It is used to diagnose breast lumps and is complementary to mammograms or breast MRIs. Ultrasound is a safe, noninvasive, and radiation-free procedure. Breast ultrasonography can assist in detecting whether an abnormality is solid (such as a noncancerous lump of tissue), fluid-filled (such as a benign cyst), or both cystic and solid (such as a malignant tumor) [3]. In recent years, with the development of artificial intelligence (AI), especially deep learning (DL) networks and their outstanding results in image recognition tasks, we can leverage the technology for ultrasound tests of breast cancer for early detection of whether a tumor is benign or malignant. In this chapter, we will review the task of categorizing the tumor with various models and compare them. We applied DL models involving convolutional neural networks (CNNs) and transfer learning for feature extraction and classified the images with various DL architectures and traditional machine learning (ML) algorithms. We compared all the models on various metrics.

5.2 Background/literature review

Breast self-examination is a screening method that is performed by the individual themselves. It is feasible to notice any differences or changes in the breasts by palpating them at different angles and at varied pressures. Breast inspection, on the other hand, is the least reliable method of cancer detection. Mammography has evolved as a viable alternative and is now commonly utilized in medicine. However, relying only on mammograms carries a considerable risk of false positives, which frequently result in unneeded biopsies and procedures [4].

Due to the development of AI and computer vision, there has been massive research in using AI for automation in the field of medicine. Recently, AI technology has made great progress in the automated analysis of medical images for anomaly detection. The same is true for breast images for possible breast cancer detection [5,6]. Conventional methods and traditional machine learning algorithms such as K-nearest neighbors (KNN), support vector machine (SVM), and Random Forest showed moderate performances. The emergence of DL algorithms, which process images and extract features, has shown remarkable results. The CNN model is very often used for training data in medical image diagnosis, analysis, and their applications. In fact, medical imaging in CAD systems becomes successful because of the use of CNNs [7]. CNN exploits the spatial data among the image pixels. CNNs have helped researchers map important features, localizing them in the scan images of a breast and classifying them into various kinds of abnormalities. Although it is evident that DL requires large data, the pretraining method provided a solution for classification when data were incomplete or large datasets were not available [8].

In Ref. [9], the authors compared the classification results of KNN, SVM, Random Forest, and Decision Tree techniques. The Wisconsin Breast Cancer dataset was utilized, which was downloaded from the UCI repository. KNN was the top classifier in simulations, followed by SVM, Random Forest, and Decision Tree.

In Ref. [10], the authors presented a unique approach for detecting breast cancer using ML techniques such as the Naive Bayes classifier, SVM classifier, bi-clustering AdaBoost techniques, RCNN classifier, and bidirectional recurrent neural networks (HA-BiRNN). A comparison of ML techniques and the proposed methodology [deep neural network (DNN) with support value] was conducted, and the simulated results revealed that the DNN algorithm was superior in terms of performance, efficiency, and image quality, all of which are critical in today's medical systems, whereas the other techniques failed to perform as expected.

Muñoz-Meza and Gómez [11] used ultrasound images to classify breast tumors using three M-dimensional sets of characteristics and principal component analysis using shared information. The authors initially addressed image segmentation in ultrasound images of breasts using a watershed transformation mechanism and an important-features extraction method for breast cancer classification that is based on some statistical tests and shared information.

Kwok [12] used four distinct state-of-the-art pre-trained models for breast cancer classification on histological images (VGG19, InceptionV3, InceptionV4, and ResNetV2). To increase prediction accuracy, several data augmentation procedures were used. The maximum accuracy was reached by the Inception and ResNetV2 models, according to the assessment findings. For binary and multiclass classification issues, the models achieved 91% and 79% accuracy, respectively.

Nawaz et al. [13] used a fine-tuned AlexNet model for breast cancer categorization. The researchers used two ways to train the model: patch-wise and image-wise, with patches of size 512 x 512 pixels. On the dataset, the model obtained 75.73% and 81.25% accuracy, respectively.

Dhungel et al. [14] advocated the use of CNNs in mass detection, Mordang et al. [15] proposed the use of CNNs in microcalcification detection, and Ahn et al. [16] proposed the use of CNNs in breast density estimation. Huynh et al. [17] proposed using a transfer learning technique for ultrasound breast image categorization in breast ultrasound imaging.

Despite the continual advancement of ML techniques, the performance of these applications has not improved significantly. Meanwhile, DL has proved successful in visual object recognition and classification in a variety of domains, since it learns representations from data and supports the learning of successive layers of increasingly relevant representations.

5.3 Artificial intelligence techniques

5.3.1 Artificial neural networks

An artificial neural network (ANN) is a brain neural system inspired algorithm that consists of layers with connected nodes and is included in ML. It has input and output layers as well as hidden layers. The first layer contains input values, whereas the last layer has labeled values. The value of each node is learnt during training by parameterizing weights using learning algorithms such as backpropagation. Each node's weights are tuned in the direction that reduces losses and hence improves accuracy [18].

ANNs have been applied to a variety of data classification and pattern recognition tasks in medical image processing and have proven to be a promising classification tool in breast cancer [19]. The application of neural networks in image and signal processing increased dramatically in the early 1980s. The key advantage was the decrease in manipulation time owing to neural networks' parallel-distributed processing nature [20]. The network was then widely employed in popular image processing techniques such as vector quantization, eigenvector extraction, 2D pulse code modulation, and 2D filtering. The function of an ANN is similar to that of a biological neuron, and it is made up of neurons with different layers that are connected by numeric weights; these weights may be modified as the network learns to get closer to the best output. The number of neurons in image processing applications is usually proportional to the number of pixels in the input picture, and the number of layers is determined by the processing stages [21].

Image segmentation has been broadly applied for cancer diagnosis and categorization. ANNs have been used to train a variety of image segmentation algorithms, including


histogram features, edge detection, region growing, and pixel classification. ANNs have been used for the classification and segmentation of diseases such as Alzheimer's, breast cancer, lung cancer, and brain tumors, among other diseases, and have vast potential in the medical and health sector.

5.3.2 Deep learning

DL is a subfield of ML. DL learns from the data. The data may be unstructured or unlabeled. Researchers have recently advanced DL to expand ANN into DNN by stacking many hidden layers with linked nodes between the input and output layers. By combining basic decisions between layers, the multilayer network can handle increasingly complicated challenges. In prediction tasks such as classification and regression, DNN often outperforms the shallow layered network. To avoid learning converging at a local minimum or to overcome overfitting difficulties, each layer of the DNN improved its weights using the unsupervised restricted Boltzmann machine. Recently, residual neural networks used skip connections to avoid vanishing gradient problems. Furthermore, the introduction of big data and graphics processing units has the potential to solve complicated issues and reduce computing time. There are several hidden layers between the input layer and the output layer. The nodes known as neurons are found in each layer. The difference between ML and DL is that DL is closer to its goal than ML and can extract features automatically [18].

Accordingly, the DL algorithm gets a lot of attention these days to solve various problems in the medical imaging field. In this chapter, we have developed and compared various DL models on ultrasound images to detect breast cancer. DL has helped to reduce false-positive rates and decrease assessment time and unnecessary biopsies.

5.3.3 Convolutional neural networks

CNN is widely used in computer vision for both supervised and unsupervised learning. CNN is used to classify the images. It takes the images of the breast cancer dataset as input, and its neurons are associated with their corresponding weights. The weights are adjusted to minimize the error and enhance the performance. The extracted information includes facial features, edge detection, object recognition, and other relevant features present in the image. These features help distinguish one class of images from another. This is what sets CNNs apart from other DL techniques. The compositions of CNN are convolutional, pooling, and fully connected layers (Fig. 5.1). In the convolution layer, a feature map is used to extract the features of the given image and makes the original image more compact. The pooling layer is used to reduce the dimensions of the image. The rectified linear unit (ReLU) layer is used as an activation function, which checks whether the value

FIGURE 5.1 CNN for breast cancer detection.

of the activation function lies in a given range or not. The fully connected layers are joined at the end of the CNN architecture and offer a final decision for classification or regression tasks. A loss is calculated during training by comparing labeled and predicted values [18]. Python codes for different CNN architectures are given below.
import tensorflow as tf

from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Dropout
from keras.layers import Activation
from keras.layers import BatchNormalization
from keras.optimizers import SGD
from keras.optimizers import Adam

# img_shape (the model input shape) is assumed to be defined earlier,
# e.g., img_shape = (224, 224, 3).

def cnn_2():
# Create the model
model = Sequential()

# Add convolutional layers


model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
padding='valid',input_shape=img_shape))
model.add(MaxPooling2D((2,2)))

model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',


padding='valid'))
model.add(MaxPooling2D((2,2)))

# Add dropout layer


model.add(Dropout(0.5))
# Add flatten layer
model.add(Flatten())

# Add dense layers


model.add(Dense(128,activation='relu'))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_3():
# Create the model
model = Sequential()


# Add convolutional layers


model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
padding='valid', input_shape=img_shape))
model.add(MaxPooling2D((2,2)))

model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',


padding='valid'))
model.add(MaxPooling2D((2,2)))

model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform',


padding='valid'))
model.add(MaxPooling2D((2,2)))

# Add dropout layer


model.add(Dropout(0.25))
# Add flatten layer
model.add(Flatten())

# Add dense layers


model.add(Dense(1024,activation='relu'))
model.add(Dense(128,activation='relu'))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_4():
# Create the model
model = Sequential()

# Add convolutional layers


model.add(Conv2D(32, (3, 3), padding='valid', strides=(1, 1),input_shape=img_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

# Add dropout layer


model.add(Dropout(0.2))

# Add flatten layer


model.add(Flatten())
# Add dense layers
model.add(Dense(3))
model.add(Activation('softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_5():
# Create the model
model = Sequential()

# Add convolutional layers


model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1),input_shape=img_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add dropout layer
model.add(Dropout(0.2))


model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(128, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

# Add dropout layer


model.add(Dropout(0.2))
# Add flatten layer
model.add(Flatten())

# Add dense layers


model.add(Dense(256))
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_6():
# Create the model
model = Sequential()
# Add convolutional layers
model.add(Conv2D(16, (3, 3), padding='same', strides=(1, 1),input_shape=img_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

model.add(Conv2D(128, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

# Add dropout layer


model.add(Dropout(0.2))
# Add flatten layer
model.add(Flatten())

# Add dense layers


model.add(Dense(256,activation = 'relu'))
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_7():
# Create the model
model = Sequential()


# Add convolutional layers


model.add(Conv2D(16, (3, 3), padding='same',kernel_initializer='he_uniform',
strides=(1,1),input_shape=img_shape))
model.add(Activation('relu'))

model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(32, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(64, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(128, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(128, (3, 3), padding='same', strides=(1, 1)))


model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))

# Add dropout layer
model.add(Dropout(0.2))
# Add flatten layer
model.add(Flatten())

# Add dense layers


model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

def cnn_8():
# Create the model
model = Sequential()

# Add convolutional layers


model.add(Conv2D(16, (3, 3), padding='same',kernel_initializer='he_uniform',
input_shape=img_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=1,padding = 'same'))
model.add(Conv2D(32,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(32,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(64,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))


model.add(Conv2D(128,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(256,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=1,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(128,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=1,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))

model.add(Conv2D(64,(3,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2),strides=2,padding = 'same'))
# Add dropout layer
model.add(Dropout(0.2))
# Add flatten layer
model.add(Flatten())

# Add dense layer


model.add(Dense(3))
model.add(Activation('softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model
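A minimal sketch of how one of these constructors could be trained is shown below. The arrays X_train, y_train, X_val, and y_val (preprocessed images with one-hot labels), the epoch budget, and the batch size are illustrative assumptions, not values fixed by this chapter.

from keras.callbacks import EarlyStopping

# Build one of the CNNs defined above and train it with early stopping.
# X_train/y_train and X_val/y_val are assumed to be prepared beforehand.
model = cnn_2()
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=32,
                    callbacks=[early_stop])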


5.4 Breast tumor detection using artificial intelligence

5.4.1 Feature extraction using deep learning

Feature extraction is a step in image processing that divides and reduces a large collection of raw data into smaller groupings. As a result, processing will be easier. When you have a huge data collection and need to decrease the number of resources without sacrificing any vital or relevant information, extracting the features might help. Feature extraction aids in the reduction of unnecessary data in a data collection. The reduction of data makes it easier for the computer to develop the model with less effort, and it also speeds up the learning and generalization processes in the ML process [22].

In our research, we have extracted features through multilayered CNN layers. CNNs provide automatic feature extraction. The specified input data is initially forwarded to a feature extraction network, and then the resultant extracted features are forwarded to a classifier network after applying a fully connected layer. Max and average pooling layers were introduced for dimension reduction, which helps significantly in reducing computing costs. To obtain a better feature vector, dropout and batch normalization were used, which also helped to prevent high variance in the data. We have trained various n-layered CNN models (n varying from 2 to 10) and compared the quality of the features extracted after passing them to a classifier neural network through performance metrics. Apart from this, a total of 11 pretrained CNN architectures have been used for feature extraction and trained on various ML models. Model weights were initialized with ImageNet weights; ImageNet contains more than 14 million images that belong to more than 20,000 classes. This usage of models with pretrained weights is known as transfer learning, which helped to save the time and resources required to train multiple ML models from scratch to complete similar tasks. Python codes for deep feature extraction are given below.

#VGG16 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#VGG19 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.VGG19(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#ResNet50 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.ResNet50(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)

#ResNet101 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.ResNet101(
    include_top=False,
    weights="imagenet",
    input_tensor=None,
    input_shape=img_shape,
    pooling=None
)
feature_extraction(base_model)


#MobileNetV2 PreTrained Model for Feature Extraction


base_model = tf.keras.applications.MobileNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#MobileNet PreTrained Model for Feature Extraction


base_model = tf.keras.applications.MobileNet(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#InceptionV3 PreTrained Model for Feature Extraction


base_model = tf.keras.applications.InceptionV3(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#InceptionResNetV2 PreTrained Model for Feature Extraction


base_model = tf.keras.applications.InceptionResNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#DenseNet169 PreTrained Model for Feature Extraction
base_model = tf.keras.applications.DenseNet169(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#DenseNet121 PreTrained Model for Feature Extraction


base_model = tf.keras.applications.DenseNet121(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)

#Xception PreTrained Model for Feature Extraction


base_model = tf.keras.applications.Xception(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)
feature_extraction(base_model)
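The feature_extraction helper invoked in these snippets is not shown in this excerpt. A minimal sketch consistent with how it is called might look as follows; the global-average-pooling choice and the image arrays X_train, X_val, and X_test are assumptions, not the chapter's definitive implementation.

from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D

def feature_extraction(base_model):
    # Freeze the pretrained convolutional base so it is used only to compute features.
    for layer in base_model.layers:
        layer.trainable = False
    # Collapse the final feature maps into one feature vector per image.
    pooled = GlobalAveragePooling2D()(base_model.output)
    extractor = Model(inputs=base_model.input, outputs=pooled)
    # X_train, X_val, and X_test are assumed to hold preprocessed image arrays.
    return (extractor.predict(X_train),
            extractor.predict(X_val),
            extractor.predict(X_test))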

5.4.2 Prediction and classification

The dataset used in our research consisted of three classes of labels associated with each scanned ultrasound image. Extracted feature vectors processed by CNN layers and various pretrained architectures were passed to a dense neural network consisting of a varied number of hidden layers with ReLU activations. The last layer, that is, the output layer, was a softmax layer that classified the feature vectors into three classes. To overcome high variance, dropout layers and batch normalization layers were introduced. The model was compiled using the Adam optimizer, and the categorical loss function was used to evaluate the loss. This optimization technique showed good convergence on the dataset. Data was stratified to deal with class imbalance problems, and early stopping was introduced to avoid overfitting and to improve the learner's performance on data outside of the training set. The Python codes for transfer learning models are given below.


#VGG16
base_model = tf.keras.applications.VGG16(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)

for l in base_model.layers:
l.trainable = False

def VGG16():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#VGG 19
base_model = tf.keras.applications.VGG19(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)

for l in base_model.layers:
l.trainable = False

def VGG19():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#ResNet50
base_model = tf.keras.applications.ResNet50(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False


def resnet():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#InceptionV3
base_model = tf.keras.applications.InceptionV3(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None
)

for l in base_model.layers:
l.trainable = False

def Inception():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#MobileNet
base_model = tf.keras.applications.MobileNet(
alpha = 0.75,
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False


def MobileNet():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#DenseNet121
base_model = tf.keras.applications.DenseNet121(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

def DenseNet121():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))

model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#DenseNet169
base_model = tf.keras.applications.DenseNet169(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

def DenseNet169():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model


#InceptionResNetV2
base_model = tf.keras.applications.InceptionResNetV2(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

def InceptionResNetV2():
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#MobileNetV2
base_model2 = tf.keras.applications.MobileNetV2(
alpha = 0.75,
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model2.layers:
layer.trainable = False

def MobileNet2():
model = Sequential()
model.add(base_model2)
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

#ResNet101
base_model = tf.keras.applications.ResNet101(
include_top=False,
weights="imagenet",
input_tensor=None,
input_shape=img_shape,
pooling=None,
)
for layer in base_model.layers:
layer.trainable = False

def resnet101():
model = Sequential()
model.add(base_model)
model.add(MaxPooling2D((2,2),strides = 2))
model.add(Flatten())
model.add(BatchNormalization())


model.add(Dense(512,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['acc'])

return model

# AlexNet
def AlexNet():
    AlexNet = Sequential()

    # 1st Convolutional Layer
    AlexNet.add(Conv2D(filters=96, input_shape=img_shape, kernel_size=(11, 11),
                       strides=(4, 4), padding='same'))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    AlexNet.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # 2nd Convolutional Layer
    AlexNet.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1, 1), padding='same'))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    AlexNet.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # 3rd Convolutional Layer
    AlexNet.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='same'))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))

    # 4th Convolutional Layer
    AlexNet.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='same'))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    # 5th Convolutional Layer
    AlexNet.add(Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='same'))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    AlexNet.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # Passing it to a Fully Connected layer
    AlexNet.add(Flatten())

    # 1st Fully Connected Layer (the input shape is already fixed by the
    # convolutional stack above, so no input_shape argument is needed here)
    AlexNet.add(Dense(4096))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    # Add Dropout to prevent overfitting
    AlexNet.add(Dropout(0.4))

    # 2nd Fully Connected Layer
    AlexNet.add(Dense(4096))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    # Add Dropout
    AlexNet.add(Dropout(0.4))

    # 3rd Fully Connected Layer
    AlexNet.add(Dense(1000))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('relu'))
    # Add Dropout
    AlexNet.add(Dropout(0.4))

    # Output Layer
    AlexNet.add(Dense(3))
    AlexNet.add(BatchNormalization())
    AlexNet.add(Activation('softmax'))

    AlexNet.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

    return AlexNet

In the second part of our experiment, various classifiers were trained on features extracted from the pretrained models. The region of interest captured in the extracted features is fed into various classifiers to classify the images as benign, normal, or malignant. The classifiers used were ANN, KNN, SVM, Random Forest, AdaBoost, Bagging, XGBoost, LSTM, and Bi-LSTM (Fig. 5.2).

import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from tensorflow.keras.layers import LSTM, Bidirectional, Reshape
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, roc_auc_score, cohen_kappa_score)

# Evaluation helper for the fitted classifiers (the name shadows the built-in eval)
def eval(classifier_name, y_train, y_train_pred, y_val, y_val_pred, y_true, y_pred):
    # Convert one-hot labels back to class indices
    y_train = np.argmax(y_train, axis=1)
    y_val = np.argmax(y_val, axis=1)
    y_true = np.argmax(y_true, axis=1)

    train_accuracy = round(accuracy_score(y_train, y_train_pred), 4)
    val_accuracy = round(accuracy_score(y_val, y_val_pred), 4)
    test_accuracy = round(accuracy_score(y_true, y_pred), 4)
    f1_measure = round(f1_score(y_true, y_pred, average='weighted'), 4)
    kappa_score = round(cohen_kappa_score(y_true, y_pred), 4)
    recall = round(recall_score(y_true, y_pred, average='weighted'), 4)
    precision = round(precision_score(y_true, y_pred, average='weighted'), 4)

    score = {"classifier": classifier_name, "train_accuracy": train_accuracy,
             "val_accuracy": val_accuracy, "test_accuracy": test_accuracy,
             "f1_measure": f1_measure, "kappa_score": kappa_score,
             "recall": recall, "precision": precision}

    for e, a in score.items():
        print(e, a)
    print("--" * 20)

def classifier_eval(classifier, classifier_name, X_train, y_train, X_val, y_val,
                    X_test, y_test):

    classifier.fit(X_train, np.argmax(y_train, axis=1))

    y_train_pred = classifier.predict(X_train)
    y_val_pred = classifier.predict(X_val)
    y_test_pred = classifier.predict(X_test)

    eval(classifier_name, y_train, y_train_pred, y_val, y_val_pred, y_test, y_test_pred)

names = ['SVM',
'Random Forest',
'AdaBoost',
'KNN',
'XGBoost',
'Bagging',
'ANN'
]

classifier = [
SVC(),
RandomForestClassifier(),
AdaBoostClassifier(),
KNeighborsClassifier(),
XGBClassifier(),
BaggingClassifier(),
MLPClassifier(max_iter = 400),
]

cls_list = zip(names,classifier)
clsm_list = zip(names,classifier)
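One detail worth noting: zip returns a single-use iterator in Python 3, so a zipped list of (name, classifier) pairs can only be looped over once. That is presumably why two identical zips, cls_list and clsm_list, are created, one for the raw-image run and one for the masked-image run. A quick illustration:

pairs = zip(['a', 'b'], [1, 2])
print(list(pairs))  # [('a', 1), ('b', 2)]
print(list(pairs))  # [] -- the iterator is already exhausted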


def feature_extraction(base_model):
    X_feat_out = base_model.output
    X_feat_flatten = Flatten()(X_feat_out)

    X_feat_model = Model(inputs=base_model.input, outputs=X_feat_flatten)

    X_feat_train = X_feat_model.predict(X_train)
    X_feat_val = X_feat_model.predict(X_val)
    X_feat_test = X_feat_model.predict(X_test)

    Xm_feat_train = X_feat_model.predict(Xm_train)
    Xm_feat_val = X_feat_model.predict(Xm_val)
    Xm_feat_test = X_feat_model.predict(Xm_test)

    for n, c in cls_list:
        classifier_eval(c, n, X_feat_train, y_train.toarray(), X_feat_val,
                        y_val.toarray(), X_feat_test, y_test.toarray())
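A minimal sketch of how this routine would presumably be invoked, once per pretrained backbone (the masked-image loop over clsm_list and the Xm_feat_* arrays would follow the same pattern):

# Hypothetical invocation with a fresh VGG16 backbone
feature_extraction(tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=img_shape))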

for l in base_model.layers:
    l.trainable = False

# LSTM
lstm_model = Sequential()
lstm_model.add(base_model)
lstm_model.add(Reshape((base_model.output.shape[1] * base_model.output.shape[2],
                        base_model.output.shape[3])))
lstm_model.add(LSTM(128, dropout=0.5, recurrent_dropout=0.5))
lstm_model.add(Dense(3, activation='softmax'))

lstm_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

history = lstm_model.fit(X_train, y_train.toarray(), epochs=20,
                         validation_data=(X_val, y_val.toarray()))

lstm_train_predict = np.argmax(lstm_model.predict(X_train), axis=1)
lstm_val_predict = np.argmax(lstm_model.predict(X_val), axis=1)
lstm_test_predict = np.argmax(lstm_model.predict(X_test), axis=1)
eval("LSTM", y_train, lstm_train_predict, y_val, lstm_val_predict, y_test, lstm_test_predict)

# Bi-LSTM
bidir_model = Sequential()
bidir_model.add(base_model)
bidir_model.add(Reshape((base_model.output.shape[1] * base_model.output.shape[2],
                         base_model.output.shape[3])))
bidir_model.add(Bidirectional(LSTM(128, dropout=0.5, recurrent_dropout=0.5)))
bidir_model.add(Dense(3, activation='softmax'))

bidir_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

history = bidir_model.fit(X_train, y_train.toarray(), epochs=20,
                          validation_data=(X_val, y_val.toarray()))
bidir_train_predict = np.argmax(bidir_model.predict(X_train), axis=1)
bidir_val_predict = np.argmax(bidir_model.predict(X_val), axis=1)
bidir_test_predict = np.argmax(bidir_model.predict(X_test), axis=1)
eval("Bi-dir", y_train, bidir_train_predict, y_val, bidir_val_predict, y_test,
     bidir_test_predict)
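The Reshape layer in the two recurrent models above is what lets an LSTM consume convolutional features: a backbone output of shape (H, W, C) is unrolled spatially into a sequence of H × W steps, each a vector of length C. A quick shape check (the printed numbers depend on the chosen backbone and input size):

# Illustrative shape check for the Reshape trick
h, w, c = base_model.output.shape[1], base_model.output.shape[2], base_model.output.shape[3]
print(h, w, c)   # e.g. 4, 4, 512 for a VGG16-style backbone on a 128 x 128 input
print(h * w, c)  # the LSTM sees h*w time steps, each of dimension c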

5.4.3 Experimental data

In our study, we have used the dataset of breast ultrasound images [23]. The breast ultrasound dataset is categorized into three classes: normal, benign, and malignant images. The data collected at baseline include breast ultrasound images of women between 25 and 75 years of age, gathered from 600 female patients. The dataset consists of 780 images with an average image size of 500 × 500 pixels. The data samples are illustrated in Fig. 5.3. Furthermore, each image has its own ground truth (masked image), as shown in Fig. 5.4.

FIGURE 5.2 Classification of breast ultrasound images using deep feature extraction. (The pipeline feeds breast ultrasound images through deep feature extraction into ANN, K-NN, SVM, RF, AdaBoost, Bagging, XGBoost, LSTM, and BiLSTM classifiers.)


FIGURE 5.3 Data sample for each class.

FIGURE 5.4 Ground truth (masked) image example for their respective original image.

At Baheya Hospital, grayscale ultrasound pictures were gathered and saved in DICOM format. They were preprocessed and improved after being annotated. The number of ultrasound images was decreased to 780 once the dataset was refined. Normal, benign, and malignant images are separated into three categories (cases). To remove unnecessary and irrelevant borders from the images, they were all cropped to various sizes. MATLAB was used to produce the ground truth (image boundaries) in order to make the ultrasound dataset more useful. For each image, a freehand segmentation was created independently. As a result, there is a masked image for each image, and we trained the model on both images individually [23].

Each image was changed to a grayscale image with a target shape of size (128 × 128 × 1). The dataset consisted of {"benign": 891, "normal": 266, "malignant": 421} labels. The data was split in a stratified manner to maintain class ratios uniformly throughout the process. The training set consisted of 76.5%, the validation set 8.5%, and the test set 15% of the total dataset.
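A minimal sketch of a stratified split matching these proportions, assuming scikit-learn's train_test_split and placeholder arrays images and labels:

from sklearn.model_selection import train_test_split

# Carve off the 15% test set first, then split the remainder into
# 76.5% train / 8.5% validation of the full dataset (0.085 / 0.85 = 0.1).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    images, labels, test_size=0.15, stratify=labels, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.1, stratify=y_tmp, random_state=42)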

5.4.4 Performance evaluation measures

Evaluation metrics are used to measure the performance of the statistical model. In our experimentation work, we used different performance metrics, such as accuracy, F1 score, Cohen's kappa score, area under the curve (AUC), precision, and recall. These are widely used metrics to interpret the performance of the model on multiclass data.

              PREDICTED BENIGN   PREDICTED NORMAL   PREDICTED MALIGNANT
BENIGN        True benign        False normal       False malignant
NORMAL        False benign       True normal        False malignant
MALIGNANT     False benign       False normal       True malignant
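The same table can be produced programmatically; below is a minimal sketch using scikit-learn's confusion_matrix with illustrative labels:

from sklearn.metrics import confusion_matrix

y_true = ['benign', 'normal', 'malignant', 'benign', 'malignant']
y_pred = ['benign', 'benign', 'malignant', 'benign', 'normal']
# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=['benign', 'normal', 'malignant']))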

The above table is a sample classification table for multiclass classification. The values shown in blue in the printed table (the diagonal entries) are correctly predicted, and the values shown in red (the off-diagonal entries) are wrongly predicted. Depending upon these values, the other metrics are measured.

True benign = total number of observations that are benign and the machine has predicted as benign.
False benign = total number of observations that are normal or malignant and the machine has predicted as benign.
True normal = total number of observations that are normal and the machine has predicted as normal.
False normal = total number of observations that are not normal and the machine has predicted as normal.
True malignant = total number of observations that are malignant and the machine has predicted as malignant.
False malignant = total number of observations that are not malignant and the machine has predicted as malignant.

Accuracy: The ratio of correctly predicted observations to the total number of observations.
Precision: The ratio of correctly predicted positive observations to the total predicted positive observations.
Recall (sensitivity): The ratio of correctly predicted positive observations to all observations in the actual class.
F1: The weighted average of precision and recall. This score therefore takes both false positives and false negatives into account.

1. Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn)
2. Precision = Tp / (Tp + Fp)
3. Recall = Tp / (Tp + Fn)
4. F1 = 2 × (Precision × Recall) / (Precision + Recall)

5.4.4.1 Cohen's Kappa coefficient

The kappa statistic is frequently used to test model reliability. The importance of model reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. The kappa can range from −1 to +1, where 0 represents the amount of agreement that can be expected from random chance, and 1 represents perfect agreement between the raters. Values ≤ 0 indicate no agreement, 0.01–0.20 none to slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [24].

κ = (Pr(a) − Pr(e)) / (1 − Pr(e))

where Pr(a) represents the actual observed agreement, and Pr(e) represents chance agreement.


5.4.4.2 Area under the curve score

The receiver operator characteristic (ROC) curve is a classification problem evaluation metric. It is a probability curve that compares the true positive rate to the false-positive rate at various threshold levels, thereby separating the "signal" from the "noise." The AUC is a summary of the ROC curve that measures a classifier's ability to discriminate between classes. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes. When AUC = 1, the classifier is capable of successfully distinguishing between all positive and negative class points. If the AUC were 0, however, the classifier would predict all negatives as positives and all positives as negatives. When the AUC is between 0.5 and 1, there is a good likelihood that the classifier will be able to tell the difference between positive and negative class values; this is because the classifier can recognize more true positives and true negatives than false negatives and false positives. The classifier is unable to discriminate between positive and negative class points when AUC = 0.5; in other words, the classifier predicts a random or constant class for all data points [25].

During classification training, many generative classifiers use accuracy as a criterion for selecting the best answer. However, accuracy has various flaws, including reduced uniqueness, discriminability, informativeness, and bias toward data from the majority class [26]. A simple Python implementation function is shown below.

from sklearn.metrics import f1_score, roc_auc_score, cohen_kappa_score

def evaluation(model, X_train, y_train, X_val, y_val, X_test, y_test, history):
    train_loss, train_acc = model.evaluate(X_train, y_train.toarray())
    val_loss, val_acc = model.evaluate(X_val, y_val.toarray())
    test_loss_value, test_accuracy = model.evaluate(X_test, y_test.toarray())

    y_pred = model.predict(X_test)
    y_pred_label = np.argmax(y_pred, axis=1)
    y_true_label = np.argmax(y_test, axis=1)

    f1_measure = f1_score(y_true_label, y_pred_label, average="weighted")
    roc_score = roc_auc_score(y_test.toarray(), y_pred)
    kappa_score = cohen_kappa_score(y_true_label, y_pred_label)

    print("Train accuracy = " + str(train_acc))
    print("Validation accuracy = " + str(val_acc))
    print("Test accuracy = " + str(test_accuracy))
    print("f1_measure = " + str(f1_measure))
    print("KAPPA = " + str(kappa_score))
    print("roc_area = " + str(roc_score))

5.4.5 Experimental results

The models were trained on images and masked images separately from the dataset. The following tables present the various architectures and their respective performance metrics.

i. Multilayered CNN networks on raw images
We trained simple n-layered CNN networks; the models seemed to overfit on the training data but failed to generalize on the test data. The 7-layered CNN showed better performance compared to the other networks (Table 5.1).
ii. Multilayered CNN networks on masked images
The model performed well on masked images, although it appeared to be overfitting on the training set. Also, densely layered CNN architectures performed comparatively better than shallow layered CNNs (Table 5.2).
iii. Pretrained networks (transfer learning) on raw images
The dataset was fit on various pretrained architectures, and significant improvement was observed in comparison to the simple CNN networks. Test accuracies approximately lay in the range of 70% to 79%; with a kappa score of 0.60 and a ROC area of 0.87, ResNet50 performed best among all architectures (Table 5.3).
iv. Pretrained networks (transfer learning) on masked images
Pretrained networks fit well on masked images, and test accuracies are in the range of 90%–98%, except for AlexNet. The VGG19 architecture performed best among all other architectures. Pretrained models showed better results than the multilayered CNN architecture, and significant improvement was observed in kappa and F1 score for both the masked and unmasked datasets (Table 5.4).
v. Deep feature extraction with pretrained models followed by classification
VGG16 is a simple arrangement of convolution and max pool layers applied consistently throughout the whole architecture, followed by two fully connected networks. Each convolution layer uses a 3 × 3 filter with stride 1 and same padding, and each max pool layer uses a 2 × 2 filter with stride 2. Overfitting is observed in Random Forest, XGBoost, and ANN. Test and validation set accuracies were consistently in the range of 65%–75% (Table 5.5).

The results on masked images are quite impressive, although Random Forest, XGBoost, and ANN were overfitting on the training set. The KNN and LSTM models showed impressive results with a test accuracy of 98% (Table 5.6).

The VGG19 architecture is the same as that of VGG16, but it contains three additional CNN layers. ANN and XGBoost performed best with a test accuracy of approximately 76%, but they overfit on the training data (Table 5.7).

Significant improvement is observed when training the same network on masked images, with SVM performing best with a test accuracy of 97% (Table 5.8).

TABLE 5.1 Performance of the classifiers for multilayered CNN networks on images.

Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area

CNN 2 Layer 0.979866 0.701493 0.649573 0.637768 0.402913 0.828568


CNN 3 Layer 0.776846 0.611940 0.666667 0.642752 0.431614 0.808549
CNN 4 Layer 0.651007 0.656716 0.606838 0.524100 0.267157 0.765958
CNN 5 Layer 0.744966 0.641791 0.683761 0.663500 0.459753 0.809080
CNN 6 Layer 0.706376 0.686567 0.658120 0.580778 0.369952 0.807845
CNN 7 Layer 0.911074 0.686567 0.726496 0.718569 0.539143 0.849139

CNN 8 Layer 0.652685 0.641791 0.555556 0.473127 0.161983 0.738786


TABLE 5.2 Performance of the classifiers for multilayered CNN networks on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area

CNN 2 Layer 0.968121 0.791045 0.760684 0.751484 0.603486 0.838874


CNN 3 Layer 0.996644 0.850746 0.837607 0.835283 0.717427 0.891374
CNN 4 Layer 0.956376 0.865672 0.837607 0.838276 0.722368 0.919463
CNN 5 Layer 0.978188 0.880597 0.905983 0.907059 0.840659 0.956980
CNN 6 Layer 1.000000 0.850746 0.871795 0.872323 0.780817 0.956756
CNN 7 Layer 0.939597 0.835821 0.888889 0.891866 0.816460 0.966787

CNN 8 Layer 0.986577 0.925373 0.914530 0.915208 0.854514 0.983993

TABLE 5.3 Performance of the classifiers on pretrained networks on images.


Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area

ResNet50 0.953020 0.716418 0.794872 0.791786 0.661483 0.895108


VGG16 0.909396 0.761194 0.743590 0.741204 0.580194 0.864333
VGG19 0.909396 0.701493 0.700855 0.697843 0.505435 0.880529
Inception_v3 0.931208 0.731343 0.726496 0.720230 0.540839 0.879365
MobileNet 0.867450 0.805970 0.726496 0.718394 0.536748 0.858062
DenseNet169 0.892617 0.731343 0.777778 0.775716 0.633803 0.879575

DenseNet121 0.917785 0.761194 0.786325 0.786257 0.655355 0.884861


InceptionResNetV2 0.932886 0.716418 0.769231 0.767404 0.619443 0.889081
MobileNetV2 0.773490 0.716418 0.709402 0.693798 0.502626 0.823654
ResNet101 0.832215 0.701493 0.709402 0.708545 0.532110 0.840034
AlexNet 0.994966 0.761194 0.752137 0.749579 0.588727 0.870327

ResNet (residual network) is a deep network with skip connections that help with the vanishing gradient problem and decrease the error of deeper models compared to conventional models. Features extracted from ResNet50 followed by XGBoost resulted in 78% test accuracy (Table 5.9).

The ANN model on masked images resulted in 94% accuracy with good kappa and F1 scores (Table 5.10).

ResNet101 is a 101-layer deep neural network with residual connections; the extracted features were used to train the ML models, and ANN and XGBoost performed best among all models (Table 5.11).

The Bagging model performed quite impressively with a test accuracy of 91.45% on ResNet101 feature vectors (Table 5.12).

TABLE 5.4 Performance of the classifiers on pretrained networks on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa ROC area

ResNet50 0.971476 0.865672 0.923077 0.922888 0.868670 0.986087

VGG16 0.989933 0.940298 0.974359 0.974245 0.955777 0.998880

VGG19 0.988255 0.910448 0.982906 0.982749 0.970387 0.999677

Inception_v3 0.963087 0.880597 0.923077 0.924044 0.872022 0.992345

MobileNet 0.971476 0.910448 0.948718 0.949125 0.912708 0.990025

DenseNet169 0.988255 0.940298 0.965812 0.966083 0.941806 0.989245

DenseNet121 0.976510 0.910448 0.965812 0.965812 0.941295 0.998208

InceptionResNetV2 0.973154 0.925373 0.905983 0.907244 0.843545 0.975680

MobileNetV2 0.969799 0.910448 0.957265 0.957075 0.926295 0.995967

ResNet101 0.963087 0.925373 0.897436 0.896492 0.822323 0.972852

AlexNet 0.802013 0.776119 0.794872 0.753114 0.610108 0.972852

TABLE 5.5 Performance of the classifiers using VGG16 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.7097 0.6866 0.6325 0.559 0.3188 0.6325 0.5513

Random Forest 1 0.7463 0.7436 0.7271 0.5512 0.7436 0.7904

AdaBoost 0.8456 0.7313 0.6752 0.6627 0.4397 0.6752 0.695

KNN 0.8339 0.6567 0.6667 0.6674 0.4654 0.6667 0.6693

XGBoost 1 0.7463 0.7692 0.7658 0.6144 0.7692 0.7744

Bagging 0.9849 0.6866 0.6667 0.6493 0.421 0.6667 0.68

ANN 1 0.7313 0.7607 0.7586 0.608 0.7607 0.7583

LSTM 0.802 0.7313 0.6923 0.6819 0.4769 0.6923 0.6955

Bi-LSTM 0.7399 0.6866 0.6838 0.6686 0.4658 0.6838 0.6849



TABLE 5.6 Performance of the classifiers using VGG16 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.9631 0.8955 0.9573 0.957 0.9264 0.9573 0.9575

Random Forest 1 0.9254 0.9658 0.9658 0.9413 0.9658 0.9658

AdaBoost 0.9362 0.8209 0.9573 0.9578 0.9276 0.9573 0.9599

KNN 0.9748 0.9254 0.9744 0.9742 0.9558 0.9744 0.9743

XGBoost 1 0.9403 0.9744 0.9745 0.9562 0.9744 0.9748

Bagging 0.9983 0.9104 0.9744 0.9745 0.9562 0.9744 0.9748

ANN 1 0.9403 0.9744 0.9742 0.9558 0.9744 0.9743

LSTM 0.9899 0.9552 0.9829 0.983 0.9709 0.9829 0.9839

Bi-LSTM 0.9883 0.9701 0.9744 0.9747 0.9565 0.9744 0.9766

TABLE 5.7 Performance of the classifiers using VGG19 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.6544 0.6866 0.5556 0.4504 0.1421 0.5556 0.5427

Random Forest 1 0.7164 0.7094 0.6929 0.4868 0.7094 0.7635

AdaBoost 0.8591 0.7015 0.7094 0.7021 0.5006 0.7094 0.7348

KNN 0.8322 0.7015 0.6838 0.6834 0.4881 0.6838 0.6832

XGBoost 1 0.7313 0.7692 0.7633 0.6093 0.7692 0.7837

Bagging 0.9899 0.7164 0.7521 0.7426 0.5706 0.7521 0.7849

ANN 1 0.7761 0.7607 0.7594 0.6076 0.7607 0.7614

LSTM 0.8054 0.6716 0.6581 0.6576 0.4452 0.6581 0.6572

Bi-LSTM 0.8322 0.6866 0.7009 0.7002 0.5146 0.7009 0.6998

TABLE 5.8 Performance of the classifiers using VGG19 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9715 0.9254 0.9744 0.974 0.9554 0.9744 0.9755

Random Forest 1 0.9254 0.9487 0.9491 0.9127 0.9487 0.9503

AdaBoost 0.9446 0.8657 0.9145 0.9152 0.8545 0.9145 0.9166

KNN 0.9681 0.9254 0.9744 0.9742 0.9558 0.9744 0.9743

XGBoost 1 0.9254 0.9658 0.9658 0.9413 0.9658 0.9658

Bagging 1 0.9104 0.9658 0.9658 0.9413 0.9658 0.9658

ANN 1 0.9403 1 1 1 1 1

LSTM 0.9815 1 0.9231 0.9251 0.8729 0.9231 0.9404

Bi-LSTM 0.9849 0.9851 0.9316 0.9333 0.8866 0.9316 0.9456

TABLE 5.9 Performance of the classifiers using ResNet50 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.5721 0.5672 0.4957 0.3286 0 0.4957 0.2457

Random Forest 1 0.7015 0.6667 0.6511 0.413 0.6667 0.7153

AdaBoost 0.8238 0.6866 0.641 0.6282 0.3777 0.641 0.6609

KNN 0.8054 0.7313 0.6752 0.6752 0.4752 0.6752 0.6752

XGBoost 1 0.7463 0.7778 0.7671 0.6201 0.7778 0.7929

Bagging 0.9832 0.6866 0.7009 0.689 0.4859 0.7009 0.7152

ANN 0.8775 0.7761 0.735 0.7169 0.54 0.735 0.7617

LSTM 0.6409 0.6567 0.5726 0.4851 0.1988 0.5726 0.4779

Bi-LSTM 0.6426 0.6866 0.5726 0.4813 0.1872 0.5726 0.518

TABLE 5.10 Performance of the classifiers using ResNet50 pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.854 0.8358 0.9316 0.9285 0.8783 0.9316 0.939

Random Forest 1 0.8507 0.9231 0.9227 0.8673 0.9231 0.9226

AdaBoost 0.8339 0.7164 0.8205 0.8231 0.7014 0.8205 0.8318

KNN 0.9262 0.806 0.8889 0.8855 0.8052 0.8889 0.8886

XGBoost 1 0.8358 0.9145 0.9155 0.8572 0.9145 0.9215

Bagging 0.9933 0.8657 0.906 0.9063 0.8409 0.906 0.9079

ANN 1 0.8806 0.9402 0.9399 0.8968 0.9402 0.9398

LSTM 0.9211 0.8507 0.9316 0.931 0.8815 0.9316 0.9311

Bi-LSTM 0.9295 0.8507 0.9402 0.9399 0.8968 0.9402 0.9398

TABLE 5.11 Performance of the classifiers using ResNet101 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.5721 0.5672 0.4957 0.3286 0 0.4957 0.2457

Random Forest 1 0.6866 0.6838 0.6624 0.4454 0.6838 0.7286

AdaBoost 0.8154 0.597 0.6496 0.6373 0.4082 0.6496 0.6431

KNN 0.7718 0.7015 0.6838 0.6728 0.4693 0.6838 0.6734

XGBoost 1 0.6866 0.7607 0.7549 0.5994 0.7607 0.7627

Bagging 0.9765 0.6866 0.6325 0.6129 0.3665 0.6325 0.6355

ANN 0.9295 0.6866 0.7521 0.7469 0.5834 0.7521 0.758

LSTM 0.6258 0.6567 0.5897 0.5101 0.2418 0.5897 0.4855

Bi-LSTM 0.6275 0.6418 0.5812 0.518 0.2616 0.5812 0.4685


Inverted residual blocks are introduced in the MobileNetV2 architecture, which helps to reduce the computation cost and model size. SVM performed best with a test accuracy of 79.5% and an F1 score of 0.7871 (Table 5.13).

The feature vector extracted from MobileNetV2 for masked images shows test accuracies in the range 95.5% to 98.3% (Table 5.14).

Features extracted from the MobileNet architecture obtained test accuracies in the range of 67%–80%, where ANN performed best with an F1 score of 0.7992 (Table 5.15).

Features extracted from the MobileNet architecture for masked images obtained test accuracies in the range of 92%–98.2%; here again ANN performed best among all models, with an F1 score of 0.9706 (Table 5.16).

Random Forest and XGBoost were overfitting on the training set; the accuracies were in the range of 60%–73.5%, and SVM performed best among all models with an F1 score of 0.7812 (Table 5.17).

TABLE 5.12 Performance of the classifiers using ResNet101 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.8641 0.806 0.8632 0.8561 0.7573 0.8632 0.8677


Random Forest 1 0.8209 0.906 0.9056 0.8378 0.906 0.9053
AdaBoost 0.8372 0.7313 0.8205 0.8259 0.706 0.8205 0.8478
KNN 0.9178 0.806 0.8547 0.8526 0.7472 0.8547 0.852

XGBoost 1 0.8806 0.8889 0.8891 0.8111 0.8889 0.8898


Bagging 0.9916 0.8507 0.9145 0.9128 0.8506 0.9145 0.9142
ANN 1 0.8806 0.9231 0.9227 0.8673 0.9231 0.9226
LSTM 0.9346 0.806 0.906 0.9071 0.8407 0.906 0.9096
Bi-LSTM 0.9211 0.8358 0.8803 0.8792 0.7927 0.8803 0.8788

TABLE 5.13 Performance of the classifiers using MobileNetV2 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 Measure Kappa Recall Precision

SVM 0.9279 0.791 0.7949 0.7871 0.6506 0.7949 0.8087


Random Forest 1 0.7612 0.6923 0.6674 0.4451 0.6923 0.7794
AdaBoost 0.8322 0.7761 0.6239 0.5883 0.3257 0.6239 0.6605

KNN 0.8154 0.6716 0.7179 0.7145 0.5413 0.7179 0.7155


XGBoost 1 0.7463 0.7949 0.7879 0.6479 0.7949 0.8227
Bagging 0.9866 0.7313 0.6838 0.6631 0.4426 0.6838 0.7214
ANN 1 0.8358 0.8034 0.7986 0.6681 0.8034 0.8144
LSTM 0.8322 0.7463 0.735 0.7259 0.5532 0.735 0.7385

Bi-LSTM 0.8674 0.7612 0.7863 0.7789 0.6392 0.7863 0.7965

TABLE 5.14 Performance of the classifiers using MobileNetV2 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9916 0.9104 0.9829 0.9829 0.9706 0.9829 0.9829

Random Forest 1 0.9104 0.9744 0.9742 0.9558 0.9744 0.9743

AdaBoost 0.948 0.8955 0.906 0.9076 0.842 0.906 0.9134

KNN 0.9799 0.9104 0.9829 0.9827 0.9704 0.9829 0.9834

XGBoost 1 0.9104 0.9573 0.9574 0.9269 0.9573 0.9578

Bagging 0.9983 0.9104 0.9658 0.9658 0.9413 0.9658 0.9658

ANN 1 0.9104 0.9744 0.9742 0.9558 0.9744 0.9743

LSTM 0.9883 0.9403 0.9744 0.9742 0.9558 0.9744 0.9743

Bi-LSTM 0.9983 0.9552 0.9744 0.9742 0.9558 0.9744 0.9743

TABLE 5.15 Performance of the classifiers using MobileNet pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.948 0.7463 0.7692 0.7598 0.6046 0.7692 0.7889

Random Forest 1 0.7164 0.7607 0.7417 0.5782 0.7607 0.8055

AdaBoost 0.8272 0.6567 0.6923 0.6816 0.4666 0.6923 0.7213

KNN 0.8205 0.6269 0.7094 0.7135 0.542 0.7094 0.7314

XGBoost 1 0.7463 0.7607 0.7524 0.5899 0.7607 0.7824

Bagging 0.9916 0.6567 0.6752 0.6542 0.4302 0.6752 0.6983

ANN 1 0.7463 0.8034 0.7992 0.6688 0.8034 0.8154

LSTM 0.9178 0.7463 0.7607 0.7536 0.5954 0.7607 0.7737

Bi-LSTM 0.9128 0.7164 0.7692 0.7633 0.6103 0.7692 0.7821

TABLE 5.16 Performance of the classifiers using MobileNet pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9899 0.9104 0.9744 0.9742 0.9558 0.9744 0.9743

Random Forest 1 0.8955 0.9573 0.9574 0.9269 0.9573 0.9578

AdaBoost 0.9933 0.9254 0.9573 0.9574 0.9269 0.9573 0.9578

KNN 0.9748 0.9104 0.9658 0.9655 0.9408 0.9658 0.966

XGBoost 1 0.9403 0.9487 0.949 0.9136 0.9487 0.9516

Bagging 0.9983 0.9254 0.9231 0.9244 0.8707 0.9231 0.93

ANN 1 0.9104 0.9829 0.9829 0.9706 0.9829 0.9829

LSTM 0.9933 0.9701 0.9573 0.9578 0.9276 0.9573 0.9599

Bi-LSTM 0.9966 0.9552 0.9744 0.9747 0.9565 0.9744 0.9766


Training accuracies for Random Forest, XGBoost, and ANN were 100%, and clearly they were overfitting. KNN performed best among all other models, with a test accuracy of 94.87%, an F1 score of 0.9491, and a kappa score of 0.9127 (Table 5.18).

InceptionResNetV2 combines the Inception network with residual linkage. Bi-LSTM showed the best results, with a test accuracy of 76% and F1 and kappa scores of 0.7574 and 0.5981, respectively (Table 5.19).

Test accuracies were consistently in the range 91%–95%; LSTM and Bi-LSTM performed best, with a test accuracy of 95.73% (Table 5.20).

The model was overfitting for Random Forest, XGBoost, and ANN. The highest test accuracy of 79.49% was observed in ANN, with an F1 score of 0.7897 and a kappa score of 0.6558 (Table 5.21).

Approximately the same performance was observed for SVM, KNN, and the Bagging model, with a highest accuracy of 97.44% (Table 5.22).

The highest test accuracy of 80.34% was obtained with XGBoost. The Bi-LSTM model showed good results, with a test accuracy of approximately 80%, an F1 score of 0.7932, and a kappa score of 0.6626 (Table 5.23).

TABLE 5.17 Performance of the classifiers using InceptionV3 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9228 0.7015 0.7265 0.7182 0.5331 0.7265 0.7486

Random Forest 1 0.6716 0.6325 0.5961 0.3411 0.6325 0.6714

AdaBoost 0.7617 0.5522 0.6068 0.5959 0.323 0.6068 0.6214

KNN 0.7752 0.5672 0.641 0.6418 0.4234 0.641 0.6431

XGBoost 1 0.6716 0.735 0.725 0.5451 0.735 0.7528

Bagging 0.9832 0.7015 0.6325 0.6131 0.3535 0.6325 0.6621

ANN 0.9983 0.6716 0.7094 0.7082 0.5194 0.7094 0.7149

LSTM 0.9178 0.7313 0.7009 0.6967 0.5007 0.7009 0.7029

Bi-LSTM 0.8674 0.7761 0.7179 0.7142 0.532 0.7179 0.7171

TABLE 5.18 Performance of the classifiers using InceptionV3 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9849 0.9104 0.9487 0.9491 0.9127 0.9487 0.9503

Random Forest 1 0.8806 0.9573 0.9574 0.9269 0.9573 0.9578

AdaBoost 0.8893 0.7164 0.8205 0.8263 0.7084 0.8205 0.8551

KNN 0.9681 0.8955 0.9487 0.9491 0.9127 0.9487 0.9503

XGBoost 1 0.9254 0.9402 0.9409 0.8986 0.9402 0.9431

Bagging 0.9983 0.8806 0.9145 0.9137 0.8519 0.9145 0.9137

ANN 1 0.9254 0.9487 0.9495 0.9135 0.9487 0.9531

LSTM 0.9966 0.9403 0.9487 0.9495 0.9135 0.9487 0.9531

Bi-LSTM 0.9916 0.9254 0.9316 0.9333 0.8866 0.9316 0.9456

TABLE 5.19 Performance of the classifiers using InceptionResNetV2 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.8893 0.7313 0.7436 0.7291 0.549 0.7436 0.7941

Random Forest 1 0.6716 0.6923 0.6659 0.4481 0.6923 0.7595

AdaBoost 0.7819 0.6866 0.5983 0.5848 0.302 0.5983 0.6233

KNN 0.8188 0.6716 0.7607 0.7565 0.5985 0.7607 0.7687

XGBoost 1 0.7313 0.735 0.7275 0.5451 0.735 0.758

Bagging 0.9866 0.7313 0.6752 0.6469 0.4245 0.6752 0.7012

ANN 1 0.7761 0.7436 0.7347 0.5613 0.7436 0.7605

LSTM 0.9279 0.7612 0.7521 0.7448 0.5751 0.7521 0.7757

Bi-LSTM 0.9195 0.7612 0.7607 0.7574 0.5981 0.7607 0.77

TABLE 5.20 Performance of the classifiers using InceptionResNetV2 pretrained model for feature extraction on
masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9815 0.9254 0.9402 0.9404 0.8977 0.9402 0.9408

Random Forest 1 0.9254 0.9487 0.9495 0.9135 0.9487 0.9531

AdaBoost 0.9681 0.8806 0.9402 0.9415 0.9003 0.9402 0.9512

KNN 0.9631 0.9104 0.9145 0.9151 0.8553 0.9145 0.9168

XGBoost 1 0.9104 0.9316 0.9326 0.8853 0.9316 0.9372

Bagging 0.9983 0.9403 0.9402 0.9412 0.8995 0.9402 0.9466

ANN 1 0.9254 0.9487 0.9491 0.9127 0.9487 0.9503

LSTM 0.9966 0.9254 0.9573 0.958 0.9282 0.9573 0.9632

Bi-LSTM 0.9983 0.9254 0.9573 0.9578 0.9276 0.9573 0.9599

TABLE 5.21 Performance of the classifiers using DenseNet169 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.8809 0.791 0.735 0.7254 0.5419 0.735 0.7635

Random Forest 1 0.7761 0.7009 0.6803 0.466 0.7009 0.7667

AdaBoost 0.8523 0.6716 0.641 0.6193 0.3626 0.641 0.6846

KNN 0.797 0.7164 0.6667 0.6612 0.4443 0.6667 0.6692

XGBoost 1 0.806 0.735 0.7218 0.5414 0.735 0.7568

Bagging 0.995 0.7313 0.6838 0.6629 0.4381 0.6838 0.7365

ANN 1 0.7313 0.7949 0.7897 0.6558 0.7949 0.8005

LSTM 0.8691 0.7612 0.6923 0.6741 0.4641 0.6923 0.7098

Bi-LSTM 0.9446 0.8358 0.7607 0.7531 0.5968 0.7607 0.7647


TABLE 5.22 Performance of the classifiers using DenseNet169 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9883 0.9254 0.9744 0.9745 0.9562 0.9744 0.9748

Random Forest 1 0.9104 0.9744 0.9745 0.9562 0.9744 0.9748

AdaBoost 0.9597 0.8806 0.9658 0.9655 0.9408 0.9658 0.966

KNN 0.9782 0.9104 0.9744 0.9745 0.9562 0.9744 0.9748

XGBoost 1 0.9403 0.9402 0.9409 0.8986 0.9402 0.9431

Bagging 0.9933 0.9104 0.9744 0.9742 0.9558 0.9744 0.9743

ANN 1 0.9552 0.9487 0.9495 0.9135 0.9487 0.9531

LSTM 0.9933 0.9701 0.9487 0.9495 0.9135 0.9487 0.9531

Bi-LSTM 0.9966 0.9701 0.9231 0.9248 0.8719 0.9231 0.9347

TABLE 5.23 Performance of the classifiers using DenseNet121 pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.8909 0.791 0.7521 0.7373 0.5678 0.7521 0.7854

Random Forest 1 0.7164 0.7521 0.7361 0.5655 0.7521 0.7917

AdaBoost 0.854 0.6866 0.641 0.6218 0.3712 0.641 0.6645

KNN 0.8087 0.6269 0.6496 0.6451 0.4227 0.6496 0.6458

XGBoost 1 0.7164 0.8034 0.7969 0.6637 0.8034 0.8266

Bagging 0.9933 0.6567 0.7009 0.6898 0.4851 0.7009 0.7181

ANN 1 0.7015 0.7778 0.7717 0.6271 0.7778 0.7806

LSTM 0.8893 0.7612 0.7521 0.7462 0.5819 0.7521 0.7587

Bi-LSTM 0.9581 0.7313 0.7949 0.7932 0.6626 0.7949 0.7949

All the models performed extremely well, with accuracies in the range 96%–99% (Table 5.24).

Xception is a convolutional neural network that is 71 layers deep. Random Forest, XGBoost, and ANN were overfitting on the training set; accuracies are in the range of 61%–76%, with ANN performing best of all models (Table 5.25).

The Xception architecture on masked images yields test accuracies in the range 93%–97%. Bi-LSTM performed best, with an accuracy of 97.44%, an F1 score of 0.9747, and a kappa score of 0.9565 (Table 5.26).

5.5 Discussion

In this chapter, we have trained various DL architectures on unmasked and masked images, and the performance of each model is quantitatively compared.

TABLE 5.24 Performance of the classifiers using DenseNet121 pretrained model for feature extraction on masked
images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9832 0.9104 0.9829 0.9829 0.9706 0.9829 0.9829

Random Forest 1 0.9104 0.9915 0.9914 0.9853 0.9915 0.9916

AdaBoost 0.9513 0.791 0.9402 0.9412 0.8995 0.9402 0.9466

KNN 0.9715 0.8806 0.9744 0.9742 0.9558 0.9744 0.9743

XGBoost 1 0.9104 1 1 1 1 1

Bagging 0.9983 0.8955 0.9829 0.9827 0.9704 0.9829 0.9834

ANN 1 0.9403 0.9573 0.9578 0.9276 0.9573 0.9599

LSTM 0.9966 0.9701 0.9658 0.9663 0.9423 0.9658 0.9697

Bi-LSTM 0.9983 0.9701 0.9658 0.9663 0.9423 0.9658 0.9697

TABLE 5.25 Performance of the classifiers using Xception pretrained model for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.8977 0.7313 0.6667 0.6482 0.4172 0.6667 0.689

Random Forest 1 0.7164 0.7009 0.6851 0.4742 0.7009 0.7426

AdaBoost 0.8523 0.6866 0.641 0.6036 0.3546 0.641 0.7103

KNN 0.8003 0.6866 0.6154 0.6141 0.3721 0.6154 0.6142

XGBoost 1 0.7164 0.7436 0.7356 0.5578 0.7436 0.7725

Bagging 0.9899 0.6866 0.6496 0.6225 0.3795 0.6496 0.6694

ANN 1 0.7015 0.7692 0.7657 0.6161 0.7692 0.7705

LSTM 0.807 0.6716 0.641 0.6242 0.3777 0.641 0.6509

Bi-LSTM 0.745 0.6866 0.6154 0.5693 0.2977 0.6154 0.6592

TABLE 5.26 Performance of the classifiers using Xception pretrained model for feature extraction on masked images.
Model Training accuracy Validation accuracy Test accuracy F1 measure Kappa Recall Precision

SVM 0.9782 0.8955 0.9487 0.9486 0.9126 0.9487 0.9494

Random Forest 1 0.9254 0.9744 0.9742 0.9558 0.9744 0.9743

AdaBoost 0.9027 0.8507 0.8974 0.8995 0.8284 0.8974 0.9073

KNN 0.9715 0.8806 0.9316 0.9315 0.8833 0.9316 0.9316

XGBoost 1 0.9254 0.9658 0.9661 0.9418 0.9658 0.9671

Bagging 0.995 0.9104 0.9658 0.9658 0.9413 0.9658 0.9658

ANN 1 0.9254 0.9658 0.9661 0.9418 0.9658 0.9671

LSTM 0.9681 0.9254 0.9316 0.9333 0.8866 0.9316 0.9456

Bi-LSTM 0.9883 0.9552 0.9744 0.9747 0.9565 0.9744 0.9766


The multilayered CNN architecture followed by a dense neural network to predict each image as benign, normal, or malignant on unmasked images showed average results, with validation and test accuracies in the range of 60%–70%. Overfitting was observed on the training set due to the small dataset, so batch normalization and dropout layers were introduced. No significant improvement was observed on stacking many CNN layers. On the other hand, the CNN architecture on masked images showed good results, and on increasing the CNN layers, the model learned better and the performance metrics were quite impressive.

To overcome the problem of insufficient data, pretrained models with a classifier trained on top of them were implemented. For unmasked data, training accuracies were in the range of 90%–95%, while validation and test accuracies were in the range of 70%–80%. For masked data, the training accuracies were in the range of 97%–98%, while validation and test accuracies were in the range of 90%–95%. There were significant improvements in F1, kappa score, and ROC area as well. Further, ML classifiers such as SVM, AdaBoost, Random Forest, XGBoost, KNN, ANN, LSTM, and Bi-LSTM were trained on deep features extracted from the pretrained models. These showed satisfactory results, and the Bi-LSTM and LSTM models outperformed the other models.

Hyperparameters such as the total number of hidden layers, the number of neurons in each hidden layer, the number of epochs to train, the learning rate, and the choice of optimizer were fine-tuned on validation accuracies. This plays an important role in determining the performance of the model.
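A minimal sketch of such validation-based tuning, with candidate values that are illustrative rather than the ones used in this chapter:

# Hypothetical grid over two of the hyperparameters mentioned above,
# selected by validation accuracy
best_acc, best_cfg = 0.0, None
for units in [64, 128, 256]:
    for lr in [1e-3, 1e-4]:
        model = Sequential([base_model, Flatten(),
                            Dense(units, activation='relu'),
                            Dense(3, activation='softmax')])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss='categorical_crossentropy', metrics=['acc'])
        model.fit(X_train, y_train.toarray(), epochs=5, verbose=0)
        _, val_acc = model.evaluate(X_val, y_val.toarray(), verbose=0)
        if val_acc > best_acc:
            best_acc, best_cfg = val_acc, (units, lr)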
5.6 Conclusion

In this chapter, we classified breast cancer as benign, malignant, or normal using ultrasound images by training various CNN models and pretrained architectures, and by optimizing, fine-tuning, and evaluating them with various performance metrics. The performance on masked images with such a small dataset was impressive, and test accuracies of up to 98% were observed. Training and transfer learning made it possible to obtain satisfactory results on unmasked data, and a test accuracy of 80% was achieved.

References

[1] H. Sung, et al., Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries, CA Cancer J. Clin. 71 (3) (2021) 209–249. Available from: https://doi.org/10.3322/caac.21660.
[2] Cancer.Net, Breast cancer - diagnosis, https://www.cancer.net/cancer-types/breast-cancer/diagnosis, Jun. 25, 2012 (accessed 26.11.21).
[3] Radiological Society of North America (RSNA) and American College of Radiology (ACR), Ultrasound - breast, Radiologyinfo.org, https://www.radiologyinfo.org/en/info/breastus, June 15, 2020 (accessed 26.11.21).
[4] M.H.-M. Khan, et al., Multi-class classification of breast cancer abnormalities using deep convolutional neural network (CNN), PLOS ONE 16 (8) (2021) e0256500. Available from: https://doi.org/10.1371/journal.pone.0256500.
[5] M. Talo, Automated classification of histopathology images using transfer learning, Artif. Intell. Med. 101 (2019) 101743. Available from: https://doi.org/10.1016/j.artmed.2019.101743.
[6] S.M. Shah, R.A. Khan, S. Arif, U. Sajid, Artificial intelligence for breast cancer detection: trends & directions, arXiv:2110.00942 [cs, eess], http://arxiv.org/abs/2110.00942, 2021 (accessed 18.12.21).
[7] S.S. Yadav, S.M. Jadhav, Deep convolutional neural network based medical image classification for disease diagnosis, J. Big Data 6 (1) (2019) 113. Available from: https://doi.org/10.1186/s40537-019-0276-2.
[8] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554. Available from: https://doi.org/10.1162/neco.2006.18.7.1527.
[9] S. Bhise, S. Gadekar, A.S. Gaur, S. Bepari, D. Kale, D.S. Aswale, Breast cancer detection using machine learning techniques, Int. J. Eng. Res. Technol. 10 (7) (2021). Available from: https://www.ijert.org/breast-cancer-detection-using-machine-learning-techniques (accessed 18.12.21).

[10] A.R. Vaka, B. Soni, K. Sudheer Reddy, Breast cancer detection by leveraging machine learning, ICT Express 6 (4) (2020) 320–324. Available from: https://doi.org/10.1016/j.icte.2020.04.009.
[11] C. Muñoz-Meza, W. Gómez, A feature selection methodology for breast ultrasound classification, in: 2013 10th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Sep. 2013, pp. 245–249. Available from: https://doi.org/10.1109/ICEEE.2013.6676056.
[12] S. Kwok, Multiclass classification of breast cancer in whole-slide images, in: Image Analysis and Recognition, Cham, 2018, pp. 931–940. Available from: https://doi.org/10.1007/978-3-319-93000-8_106.
[13] W. Nawaz, S. Ahmed, A. Tahir, H.A. Khan, Classification of breast cancer histology images using ALEXNET, SpringerLink, 2018. https://link.springer.com/chapter/10.1007/978-3-319-93000-8_99 (accessed 23.03.22).
[14] N. Dhungel, G. Carneiro, A.P. Bradley, Deep learning and structured prediction for the segmentation of mass in mammograms, in: Medical Image Computing and Computer-Assisted Intervention, Cham, 2015, pp. 605–612. Available from: https://doi.org/10.1007/978-3-319-24553-9_74.
[15] J.-J. Mordang, T. Janssen, A. Bria, T. Kooi, A. Gubern-Mérida, N. Karssemeijer, Automatic microcalcification detection in multi-vendor mammography using convolutional neural networks, in: Breast Imaging, Cham, 2016, pp. 35–42. Available from: https://doi.org/10.1007/978-3-319-41546-8_5.
[16] C.K. Ahn, C. Heo, H. Jin, J.H. Kim, A novel deep learning-based approach to high accuracy breast density estimation in digital mammography, in: Proc. SPIE 10134, Medical Imaging 2017: Computer-Aided Diagnosis, 101342O, March 3, 2017. Available from: https://doi.org/10.1117/12.2254264.
[17] B. Huynh, K. Drukker, M. Giger, MO-DE-207B-06: Computer-aided diagnosis of breast ultrasound images using transfer learning from deep convolutional neural networks, Medical Physics, 2016. https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.4957255 (accessed 23.03.22).
[18] M. Kim, et al., Deep learning in medical imaging, Neurospine 16 (4) (2019) 657–668. Available from: https://doi.org/10.14245/ns.1938396.198.
[19] S.-C.B. Lo, H.-P. Chan, J.-S. Lin, H. Li, M.T. Freedman, S.K. Mun, Artificial convolution neural network for medical image pattern recognition, Neural Netw. 8 (7) (1995) 1201–1214. Available from: https://doi.org/10.1016/0893-6080(95)00061-5.
[20] W.S. Gan, Application of neural networks to the processing of medical images, in: Proc. 1991 IEEE International Joint Conference on Neural Networks, 1991, pp. 300–306, vol. 1. Available from: https://doi.org/10.1109/IJCNN.1991.170419.
[21] M.M. Mehdy, P.Y. Ng, E.F. Shair, N.I.M. Saleh, C. Gomes, Artificial neural networks in image processing for early detection of breast cancer, Comput. Math. Methods Med. 2017 (2017) e2610628. Available from: https://doi.org/10.1155/2017/2610628.
[22] ScienceDirect Topics, Feature extraction network - an overview, 2019. https://www.sciencedirect.com/topics/computer-science/feature-extraction-network (accessed 06.01.22).
[23] W. Al-Dhabyani, M. Gomaa, H. Khaled, A. Fahmy, Dataset of breast ultrasound images, Data Brief 28 (2020) 104863. Available from: https://doi.org/10.1016/j.dib.2019.104863.
[24] M.L. McHugh, Interrater reliability: the kappa statistic, Biochem. Medica 22 (3) (2012) 276–282.
[25] Analytics Vidhya, AUC-ROC curve in machine learning clearly explained, https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/, Jun. 15, 2020 (accessed 08.01.22).
[26] M. Hossin, M.N. Sulaiman, A review on evaluation metrics for data classification evaluations, Int. J. Data Min. Knowl. Manag. Process. 5 (2015) 01–11. Available from: https://doi.org/10.5121/ijdkp.2015.5201.

C H A P T E R

6

Artificial intelligence-based skin cancer diagnosis

Abdulhamit Subasi1,2 and Saqib Ahmed Qureshi3

1 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
2 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia
3 Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

O U T L I N E

6.1 Introduction
6.2 Literature review
6.3 Machine learning techniques
  6.3.1 Artificial neural network
  6.3.2 k-nearest neighbor
  6.3.3 Support vector machine
  6.3.4 Random Forest
  6.3.5 XGBoost
  6.3.6 AdaBoost
  6.3.7 Bagging
  6.3.8 Long short-term memory
  6.3.9 Bidirectional long short-term memory
  6.3.10 Convolutional neural network
  6.3.11 Transfer learning
6.4 Results and discussions
  6.4.1 Dataset
  6.4.2 Experimental setup
  6.4.3 Performance metrics
  6.4.4 Experimental results
  6.4.5 Discussion
6.5 Conclusion
References

6.1 Introduction

Skin cancer is associated with abnormal or uncontrolled growth of skin cells. Like other skin cancers, melanoma, typically named malignant melanoma, is a type of skin cancer. It is less common but the most serious type of skin cancer, because it spreads to other parts of the body [1]. Its seriousness depends upon how early you become aware of it and start the diagnosis [2].


Melanoma as a disease was first discovered by Rene Laennec in 1804 under the name melanose. The heterogeneous nature of some melanoma tumors was first observed by William Norris in 1820. Sir Robert Carswell introduced the term melanoma in 1838 [3]. In the year 2020 in the United States, nearly 100,350 new cases of melanoma were diagnosed, affecting 60,190 men and 40,160 women. Also in 2020 in the United States, an estimated 6850 deaths from melanoma were expected, including 4610 men and 2240 women [4].

Melanoma cancer develops through the pigment-producing cells known as melanocytes. It is also caused by exposure to UV (ultraviolet) rays. It resembles moles on the body, has a black/pink color, and can develop anywhere on the skin. In men, these are more likely to begin on the chest and back, while in women they generally appear on the legs. The neck and face are other common sites where melanoma can appear. It is more likely to be found among fair-skinned people, whereas people with dark skin have a much lower possibility of developing melanoma; however, it can develop between the gaps of the fingers, under the nails, and sometimes inside the mouth and eyes [5]. Melanoma can affect people of all ages. It is often considered serious for older people, with an average age of 65 [6], while the incidence of melanoma is rapidly rising in young adults. It has been observed that melanoma is now the most common form of cancer in men and women aged 20–39 [7]. The risk of melanoma seems to be increasing under the age of 40, especially in women, and it is among the most common cancers in young adults [8]. There are several types of melanoma cancer, but the most common is cutaneous. Other types include superficial spreading melanoma, acral lentiginous melanoma, nodular melanoma, amelanotic and desmoplastic melanomas, lentigo maligna melanoma, metastatic melanoma, and ocular melanoma [9].

The first important symptom of melanoma that defines whether you are infected is a new spot on the skin (likely a mole) or a spot that is changing in size, shape, or color. However, you can also determine the symptoms with the help of the ABCDE rule [10]:

• Asymmetry: The shape of the mole is irregular.
• Border: It has irregular edges instead of smooth ones.
• Color: The mole has dark spots or uneven shading.
• Diameter: The spot is larger than the size of a pencil eraser.
• Evolving or elevation: It is continuously changing in size, shape, or texture.

Other symptoms include a sore that does not heal; redness or a new swelling; changes in sensation such as itchiness, pain, or tenderness; a change in the size of the mole; bleeding or oozing; or the development of a lump or bump. Due to the changing signs of cancer, it is better to consult a skin specialist as soon as possible to diagnose melanoma. Detecting cancer early often allows for more treatment options. Like other cancers, melanoma has four stages. As a rule, the lower the number, the less the cancer has spread, so melanoma can be diagnosed and treated successfully when detected at an early stage [11].

Prevention of melanoma can be pursued by limiting your exposure to UV rays, avoiding the use of tanning beds and sunlamps, wearing protective clothing and putting moisturizers on the skin, and boosting the immune system, as having a weakened immune system increases not only the risk of getting melanoma but also of other types of skin cancer. Melanoma treatment includes chemotherapy, radiotherapy, surgery, radiation therapy, immunotherapy, or targeted therapy, either alone or in combination. Whether treatment includes a combination of procedures depends upon how early you have detected the cancer [12].

Chemotherapy is recommended for patients who have metastatic melanoma which has spread over distant parts of the body. It is used to find the location of a tumor [13]. Immunotherapy drugs may be used as a first-line treatment for melanoma and some other cancers [14]. Several types of immunotherapies may be considered as options to treat melanoma, including cytokines and oncolytic virus therapy. Radiation therapy may be used for patients with advanced melanoma; these therapies include external beam radiation therapy, intensity-modulated radiation therapy, and TomoTherapy [15]. Surgery is the primary treatment for melanoma, and it can be considered as a treatment option for melanoma that has metastasized. Surgical procedures for localized melanoma include excision, reconstructive surgery, sentinel lymph node biopsy, lymph node removal, and surgery for metastatic melanoma [16].

Artificial intelligence (AI) has transformed the world in various domains and given us new ways of solving real-life problems. Specifically, in medical science it has shown great development, as a result of which many diseases are detected with much greater ease and pace. Due to advancements in deep learning, reinforcement learning, natural language processing, computer vision, etc., it has become possible to detect diseases without the help of a doctor or radiologist. Especially when the number of doctors is small and we need to detect patients who are actually suffering from disease, this could turn out to be a great helping hand. It could help us in detecting anomalous images of skin cancer with the help of computer vision. The challenge which AI or machine learning faces today is the lack of labeled data. Therefore, to use machine learning algorithms, we first need to collect labeled data from trusted sources. Although there are algorithms which can make use of unlabeled data, called unsupervised learning, it is always good to have labeled data.

Various experiments have been done to find out whether a person is suffering from skin cancer or not from an image of the sore using deep learning. Deep learning algorithms are so powerful that they can even surpass human-level performance in various tasks. They have had a very significant impact on tasks such as object detection, face recognition, classification problems, machine translation, etc. Initially in this experiment, AI algorithms were used for prediction in an end-to-end process; this is known as a single-stage detection method. In dual-stage detection methods, features were extracted from the images using various convolutional neural networks (CNNs), and then these features were fed into a machine learning classifier for classification.

Melanoma is one of the most dangerous cancers. It occurs in the cells which produce melanin. Melanin is a pigment that gives color to the skin, and the cells which produce it are known as melanocytes. Melanoma generally occurs on the skin and sometimes in the eyes, but very rarely does it happen inside the body, that is, the nose or throat. In a normal person, new and healthy skin cells grow and push the older ones upward. When these older cells die, they shed off from the skin and new cells take their place. This happens in a controlled and systematic manner. But when a person is suffering from melanoma, there is an uncontrolled growth of these cells, which is due to damage to the DNA of these cells. Doctors believe that this DNA damage primarily happens because of exposure to UV radiation, but the actual cause is still not clear, as melanoma also occurs on parts of the skin which are not exposed to UV radiation. Hence, it can be due to a combination of many factors, such as environmental and genetic ones. As this disease primarily happens on the outer skin and produces moles, which results in a change of skin color, these moles can be distinguished from normal moles using computer vision with the help of images.

Applications of Artificial Intelligence in Medical Imaging


186 6. Artificial intelligence-based skin cancer diagnosis

FIGURE 6.1 Skin cancer detection framework: an image of a mole passes through pretrained models for feature extraction, and the extracted features are classified with artificial intelligence as melanoma or normal.

The main method or test for detecting melanoma skin cancer is biopsy. Biopsy is a process of removing skin tissue for testing [17]; which type of biopsy is used depends on the doctor and the condition of the skin. It is a costlier and more time-consuming process compared to deep learning techniques. Hence, it is better to use deep learning techniques first, and these biopsy testing methods can then be done as subsequent steps. This saves both money and time. As there are several deep learning models, they can be deployed on the web or made available in the form of apps to maximize their reach.

In this chapter, several deep learning methods are used for the detection of skin cancer using images (Fig. 6.1). Apart from classical methods, transfer learning is also used, which relies on pretrained models. Fig. 6.1 shows the pipeline of the transfer learning used.

6.2 Literature review

Li and Shen [18] developed a framework using deep learning which can simultaneously produce the segmentation and coarse classification results. It consisted of two fully convolutional residual networks. The classification was done in two steps: a simple CNN was used for feature extraction, and a lesion index calculation unit was developed to refine the coarse classification results by calculating the distance heat map. Accuracies of this framework showed promising results: for task 1, task 2, and task 3, 0.753, 0.848, and 0.912 were obtained, respectively.

Castro et al. [19] developed an accessible mobile app which contains a CNN model trained on images collected from smartphones and lesion clinical information. The dataset used for this problem was highly imbalanced. From the proposed approach, promising results were obtained with an accuracy of 92% and a recall of 94%.

Gulati and Bhogal [20] used two pretrained models, AlexNet and VGG16, in two different ways. They used these pretrained models for transfer learning and feature extraction. The results showed that transfer learning was more efficient for both CNN models as compared to the feature extraction method. In the transfer learning
method, AlexNet gave 95% accuracy and VGG16 gave 97.5% accuracy, while in the feature extraction method AlexNet gave 90% accuracy and VGG16 gave 95% accuracy.

Adegun and Viriri [21] proposed a system that makes use of a softmax classifier for pixel-wise classification of melanoma wounds. It was a multistage and multiscale approach. They performed the experiment on two different publicly available datasets, Hospital Pedro Hispano (PH2) and International Symposium on Biomedical Imaging (ISBI) 2017. They obtained an accuracy of 95% on both the International Skin Imaging Collaboration (ISIC) 2017 and PH2 datasets.

Codella [22] with his team proposed a system that uses ensemble learning, which combines deep learning along with machine learning approaches and is capable of separating skin lesions as well as analyzing the affected area. There was a significant improvement in all the performance-related characteristics: an increment of 4% in average precision and 7.5% in the area under the receiver operating characteristic (ROC) curve (AUC) was observed compared to the previous state of the art. The proposed system's accuracy was also compared with the average of eight expert dermatologists on a subset of 100 test images; it was observed that the system obtained higher accuracy (76% vs 70.5%) and specificity (62% vs 59%) evaluated at an equivalent sensitivity (82%).

Lopez et al. [23] proposed a method for solving the problem of classifying a dermoscopic image containing a skin lesion as malignant or benign. The developed method used transfer learning with the help of VGGNet. Experimental results were promising, as the proposed method achieved a sensitivity value of 78.66% on the ISIC archive dataset.

Astorino et al. [24] presented a multiple instance learning (MIL) approach, the objective of which, in the binary case, is to differentiate between positive and negative sets of items. In MIL terminology, these sets are called bags and the items inside the bags are called instances. The hypothesis of this approach states that a bag is negative if all its instances are negative, and it is positive if at least one of its instances is positive. In the case of image classification, this approach fits well, since an image (bag) is basically distinguished on the basis of some of its subregions (instances). This method performed better both in accuracy and sensitivity than other classical machine learning approaches such as SVM. The results were promising: accuracy = 92.50%, sensitivity = 97.50%, and specificity = 87.50%.

Ali et al. [25] proposed an ensemble technique which includes VGG19-UNet, DeeplabV3+, and other preprocessing methodologies. For evaluation purposes, the ISIC 2018 dataset, which contains 2594 dermoscopy images with their ground-truth segmentations, was utilized. The proposed model showed promising results on the testing dataset: accuracy = 93.6%, average Jaccard index = 0.815, and dice coefficient = 0.887.

6.3 Machine learning techniques

6.3.1 Artificial neural network

The idea of the artificial neural network (ANN) comes from the neurons which transmit information from one cell to another. ANNs exhibit the same behavior as biological neurons and send information from one layer to another. There are various types of ANNs; among them, the most common one is the multilayer perceptron (MLP), which is used in this experiment as well [26]. In an MLP, there are three types of layers: input, hidden, and output. The input layer takes the input in the form of vectors, the hidden layers process them to extract the features, and finally the output layer displays the final output. In this algorithm, the inputs are multiplied with weights, a bias term is added to them, and then the result is passed through a

nonlinear function. This process continues until the output layer is reached [27]. During this whole process, neurons of only one layer remain active at a time; the output of one hidden layer acts as an input for the next hidden layer. As the number of hidden layers increases, the complexity of the algorithm also increases. If a nonlinear function were not used, the expression in the output layer could be written as a linear combination of the inputs; this would eliminate the purpose of all hidden layers, and the whole system would act as a single-layer network. The nonlinearity also makes the model more expressive and helps it learn more complicated features. The bias term is added to generalize the model and make it more flexible. The weights are updated by using the backpropagation algorithm [28]. During this process, differentials are taken with respect to the weights and then multiplied in subsequent steps. This gives rise to two major problems: the exploding gradient and the vanishing gradient problem. In the exploding gradient problem, the differentials sometimes take very big values, which leads to overflow and makes the gradient "NaN," that is, not a number. In the vanishing gradient problem, the differentials take very small values which tend toward zero after repeated multiplication, resulting in no update of the weights. Sample Python code for ANN is given below.

import tensorflow as tf
from tensorflow.keras.optimizers import RMSprop

# Small fully connected network on top of deep features extracted by a
# pretrained CNN (last_layer_shape holds the shape of those features).
model_ANN = tf.keras.models.Sequential([
    tf.keras.Input(shape=(last_layer_shape[1], last_layer_shape[2], last_layer_shape[3])),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

LEARNING_RATE = 1e-4
OPTIMIZER = RMSprop(lr=LEARNING_RATE, decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = ['accuracy', 'AUC']

model_ANN.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)

print("fit model on gpu")

history_ANN = model_ANN.fit(
    train_features, train_labels,
    epochs=EPOCHS,
    verbose=VERBOSE_LEVEL,
    validation_data=(val_features, val_labels)
)

y_pred_ANN = model_ANN.predict(X_test)


6.3.2 k-nearest neighbor

k-nearest neighbor (k-NN) is one of the easiest and most straightforward machine learning algorithms. It can be used for both regression and classification. Unlike other machine learning algorithms, it does not build a model; it has no trainable parameters. For every new test sample, it computes the distances between this test sample and all training samples. Among all these distances, it chooses the "k" nearest training samples and then checks which class has the maximum number of elements in this "k"-closest set; it labels the test sample with that class. The value of "k" is chosen empirically; it shouldn't be too large or too small. The selection of the distance function is also very important and depends on the application. The disadvantage is that for every new sample, distances must be computed to every training sample in the dataset, which is computationally very expensive. Also, some preprocessing such as normalization should be done to ensure that the algorithm does not favor any single feature just because its scale dominates the distance [29]. Sample Python code for k-NN is given below.

from sklearn.neighbors import KNeighborsClassifier

classifier_kNN = KNeighborsClassifier(n_neighbors = 5, algorithm='ball_tree', leaf_size=30)
classifier_kNN.fit(X_train, y_train)

test_acc_kNN = classifier_kNN.score(X_test, y_test)

y_pred_kNN = classifier_kNN.predict(X_test)

6.3.3 Support vector machine

The support vector machine (SVM) [30] is a family of supervised learning algorithms used for classification, regression, and anomaly detection. It takes all the training points and produces a decision boundary, called the maximum margin hyperplane, which separates the different classes. This hyperplane is optimal in the sense that it maximizes the distance between the nearest points of different classes. In the case of two-dimensional data and a linear classifier, this hyperplane becomes a straight line which divides the space into two halves belonging to two different classes. When the data is nonlinear and cannot be separated using a straight line, SVM uses different kernels which transform the data into a higher dimensional space and produce a nonlinear decision boundary. Compared to neural networks, SVMs are faster, and they do not suffer from the vanishing and exploding gradient problems. Sample Python code for SVM is given below.

from sklearn.svm import SVC

classifier_SVM = SVC(kernel = 'rbf', random_state = 0)
classifier_SVM.fit(X_train, y_train)

test_acc_SVM = classifier_SVM.score(X_test, y_test)

y_pred_SVM = classifier_SVM.predict(X_test)

6.3.4 Random Forest

Random Forest [31] is one of the most used ensemble learning algorithms and produces promising results most of the time. It is used in many places due to its simplicity and diversity, and it can be used for both regression and classification problems. The word "forest" means a collection of


trees; here, the trees are decision tree classifiers. Hence, Random Forest builds a forest of decision tree classifiers. It follows the "Bagging" approach, which says that a collection of learning algorithms increases the overall performance of the model. Decision trees are very interpretable and deterministic: for a given feature set, they always produce the same regression or classification model structure. To add some amount of randomness or fuzziness, the Random Forest classifier is used. There can be cases where one decision tree classifier is overfitting while another is highly biased; they cancel each other's errors by supporting each other, and the ensemble as a whole becomes a more robust model. During model building, the whole dataset is not given to each tree at once; instead, randomly selected data points are given to each individual decision tree classifier. Another important function of Random Forest is that it can be used to determine which features are more important. First, it creates the model on the whole dataset and calculates a score. Then it shuffles one of the features randomly and recalculates the score. By observing the magnitude of the increase or decrease in the score, the importance of the feature can be easily determined. Sample Python code for Random Forest is given below.

from sklearn.ensemble import RandomForestClassifier


classifier_RF = RandomForestClassifier(n_estimators = 800, criterion = 'entropy', random_state=0)
classifier_RF.fit(X_train,y_train)

test_acc_RF = classifier_RF.score(X_test, y_test)

y_pred_RF = classifier_RF.predict(X_test)
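The feature-shuffling procedure described above is known as permutation importance, and scikit-learn ships an implementation of it. A minimal sketch, assuming the fitted classifier_RF and the held-out X_test and y_test from the snippet above:

from sklearn.inspection import permutation_importance

# Shuffle each feature column in turn and measure the drop in test
# accuracy; a large drop means the model relied heavily on that feature.
result = permutation_importance(classifier_RF, X_test, y_test,
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:10]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")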

6.3.5 XGBoost

XGBoost [32] stands for extreme gradient boosting. It is an ensemble machine learning algorithm which uses the decision tree classifier as a base model and then builds the whole model using the "Boosting" approach. In this technique, new decision tree models are created to rectify the mistakes made by the existing models. These models are created sequentially on top of each other following an iterative approach, and this process continues until a stopping criterion is met. The gradient descent algorithm is used to minimize the loss while making new models, which is why it is called gradient boosting. The execution speed of XGBoost is very fast, as it uses parallelization to make use of all cores of the CPU. During preprocessing of the data, if missing values exist or variables need to be one-hot encoded, the data becomes sparse; XGBoost has an inbuilt algorithm which handles different types of patterns in sparse data. To tackle the problem of overfitting, it uses L1 and L2 regularization. Sample Python code for XGBoost is given below.

import xgboost as xgb

classifier_xgb = xgb.XGBClassifier(n_estimators = 300)
classifier_xgb.fit(X_train, y_train)

test_acc_xgb = classifier_xgb.score(X_test, y_test)

y_pred_xgb = classifier_xgb.predict(X_test)

6.3.6 AdaBoost

AdaBoost [33] stands for Adaptive Boosting. It is an ensemble machine learning algorithm which follows the "Boosting" technique. It assigns certain weights to the training samples. The choice of base learner depends upon whether it accepts

training data with weights or not; in many cases, the decision tree classifier is used as the base learner. It builds models iteratively on top of one another and reassigns the weights in each iteration. It gives higher weights to the misclassified examples so that in the next iteration these examples are given more priority. This process continues until there is no error and all the examples are classified properly, or the maximum number of base estimators is reached. Sample Python code for AdaBoost is given below.

from sklearn.ensemble import AdaBoostClassifier

classifier_AdaBoost = AdaBoostClassifier(n_estimators = 100)
classifier_AdaBoost.fit(X_train, y_train)

test_acc_AdaBoost = classifier_AdaBoost.score(X_test, y_test)

y_pred_AdaBoost = classifier_AdaBoost.predict(X_test)

6.3.7 Bagging

The Bagging [34] classifier is an ensemble machine learning algorithm. It fits base classifiers on random portions of the original training data. These random allotments of training data are independent of each other; because of this, some portions of the data may be used repeatedly while other portions are left out. Some base learners may suffer from overfitting while others deal with underfitting; the base learners cancel each other's errors and give the final output by aggregating the predictions from all the learners, either by voting or by averaging. Sample Python code for Bagging is given below.

from sklearn.ensemble import BaggingClassifier

classifier_Bagging = BaggingClassifier(n_estimators=100)
classifier_Bagging.fit(X_train, y_train)

test_acc_Bagging = classifier_Bagging.score(X_test, y_test)

y_pred_Bagging = classifier_Bagging.predict(X_test)

6.3.8 Long short-term memory

LSTM [35] stands for long short-term memory. The recurrent neural network (RNN) is a type of neural network that has an internal memory to store the previous states. Unlike in other neural networks, the inputs are not independent of each other, because an RNN makes its decision by considering both the current input and the output of the previous states. This property of RNNs makes them applicable to many tasks such as speech recognition, music generation, etc. However, a problem arises when using the RNN architecture: during backpropagation through time, if the sequence is very long, the vanishing and exploding gradient problems come into the picture. If one of the derivatives takes a very large value, it may cause an overflow problem in subsequent derivatives. On the other hand, if a derivative takes a very small value, the values of subsequent derivatives in time become negligible; information does not transfer properly, and the effect of previous time states on the current state becomes negligible. The exploding gradient problem can be solved by clipping the gradients to a desired range. To solve the problem of vanishing gradients, the LSTM architecture was developed, which makes it easier to remember the previous hidden states. LSTMs have the ability to control which information they want to remember and which to forget. An LSTM has three gates which control the flow of information: the input gate, the forget gate, and the output gate. The input gate decides which values from the input should be used to update the memory. The forget gate decides which information should be discarded and which retained. Finally, the output gate decides the flow of output information by making use of the input and the memory of the cell. Sample Python code for LSTM is given below.


import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Two stacked LSTM layers over the deep features reshaped to 2D sequences
model_LSTM = tf.keras.models.Sequential([
    tf.keras.Input(shape=(last_layer_shape[1], last_layer_shape[2]*last_layer_shape[3])),
    tf.keras.layers.LSTM(100, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE, decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = ['accuracy', 'AUC']

model_LSTM.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)

print("fit model on gpu")

history_LSTM = model_LSTM.fit(
    train_features_2d, train_labels,
    epochs=EPOCHS,
    verbose=VERBOSE_LEVEL,
    validation_data=(val_features_2d, val_labels)
)

y_pred_LSTM = model_LSTM.predict(test_features_2d)
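The gradient clipping mentioned above as a remedy for exploding gradients can be enabled directly in the Keras optimizer; a minimal sketch (the clipping threshold of 1.0 is an assumption, not a value used in the experiments):

from tensorflow.keras.optimizers import Adam

# Clip every gradient element to [-1, 1] before the update step, so a
# single very large derivative cannot overflow the weights.
OPTIMIZER = Adam(lr=LEARNING_RATE, clipvalue=1.0)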

6.3.9 Bidirectional long short-term memory

The bidirectional LSTM (Bi-LSTM) [36] is an extended version of the traditional LSTM. This structure considers information not only from past states but also from future states. It enables the current state to make its decision by considering both past and future information. It uses two LSTMs and makes the inputs run in both the forward and backward directions. This makes the model more robust and efficient, as both future and past hidden states are utilized for making the decision. All other properties of the Bi-LSTM are similar to the unidirectional LSTM. Sample Python code for Bi-LSTM is given below.

import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Two stacked bidirectional LSTM layers over the reshaped deep features
model_Bi_LSTM = tf.keras.models.Sequential([
    tf.keras.Input(shape=(last_layer_shape[1], last_layer_shape[2]*last_layer_shape[3])),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE, decay=1e-2)
LOSS = 'binary_crossentropy'
METRICS = ['accuracy', 'AUC']

model_Bi_LSTM.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)

print("fit model on gpu")

history_Bi_LSTM = model_Bi_LSTM.fit(
    train_features_2d, train_labels,
    epochs=EPOCHS,
    verbose=VERBOSE_LEVEL,
    validation_data=(val_features_2d, val_labels)
)

y_pred_Bi_LSTM = model_Bi_LSTM.predict(test_features_2d)


6.3.10 Convolutional neural network

The CNN [37] is a special type of deep neural network specialized in analyzing visual imagery. The term "convolution" comes from the linear operation in mathematics which is performed between matrices. CNNs have performed exceptionally well in computer vision, natural language processing, voice recognition, etc. An important assumption made for all problems solved by CNNs is that the features should not be spatially dependent. CNNs consist of filters in the form of matrices which extract different features from the image. For example, if a boundary detection filter is passed over the image, it extracts all the boundaries present in the image. It actually creates an activation map of that particular feature, so that it can easily be seen which part of the image is activated or which features are extracted by this filter. A CNN architecture has many layers: the convolutional layer, pooling layer, nonlinearity layer, and fully connected layer. In the convolutional layer, instead of looking at the full image, local regions are focused on so that the number of trainable parameters can be reduced. For example, consider an image with a resolution of 32 × 32 × 3 whose raw pixels are used as input. To connect this input layer to a single neuron, 32 × 32 × 3 weights, that is, 3072 weights, are needed. If one more neuron is added, this becomes 32 × 32 × 3 × 2, that is, more than 6000 weights. Even for such a small-resolution image, there are many weights to be trained. Hence, it is a more fruitful and practical approach to look at local regions of an image rather than the full image. Parameters or weights can be decreased more rapidly by increasing the stride. In the convolutional layer, there is a problem of loss of information at the borders of the image, which can easily be overcome by using zero padding. The pooling layer is used to downsample the image, reducing the complexity for further layers; it does not contain any trainable parameters or weights. The nonlinearity layer is used to adjust or saturate the output. The most common function used for nonlinearity is the Rectified Linear Unit, also known as "ReLU"; other functions are "sigmoid" and "tanh." After the image has been processed by the required number of layers, global pooling is performed and the matrix is flattened into a vector to extract the final features. These features are fed into the fully connected layer, which is the same as in traditional neural networks. It is used to give the final output from the model by using a "sigmoid" or "softmax" function, depending on the number of classes. Sample Python code for CNN is given below.


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def create_model():
    print("create model")

    # Six convolution/pooling stages followed by a fully connected head
    model = Sequential()

    model.add(Conv2D(16, kernel_size=(3, 3), activation='relu',
                     input_shape=INPUT_SHAPE))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(256, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))

    return model

model = create_model()

LEARNING_RATE = 1e-4
OPTIMIZER = Adam(lr=LEARNING_RATE, clipvalue=0.45)
LOSS = 'binary_crossentropy'
METRICS = ['accuracy', 'AUC']

model.compile(
    loss=LOSS,
    metrics=METRICS,
    optimizer=OPTIMIZER,
)

print("fit model on gpu")

history = model.fit(
    train_gen,
    epochs=EPOCHS,
    verbose=VERBOSE_LEVEL,
    validation_data=(valX, valY),
)

y_p = model.predict_generator(test_gen)
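To make the weight-count argument above concrete, the following small sketch (with the hypothetical 32 × 32 × 3 input used in the text) compares the parameter count of a fully connected layer with that of a convolutional layer:

import tensorflow as tf

# Fully connected: each of the two neurons connects to all 32*32*3 = 3072
# inputs, plus a bias -> (3072 + 1) * 2 = 6146 parameters.
dense = tf.keras.models.Sequential([
    tf.keras.Input(shape=(32 * 32 * 3,)),
    tf.keras.layers.Dense(2)
])

# Convolutional: two 3 x 3 filters over 3 channels share their weights
# across the whole image -> (3*3*3 + 1) * 2 = 56 parameters.
conv = tf.keras.models.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(2, (3, 3))
])

print(dense.count_params())  # 6146
print(conv.count_params())   # 56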

6.3.11 Transfer learning

Transfer learning [38] is a method in AI which uses the learning or feature extraction done in one task to solve a problem in another task. The model built for one task behaves as a starting point for another task. It tries to store the knowledge gained in a task so that training from scratch is not required again for similar kinds of tasks; the saved model can be reused. In deep learning, this technique is widely used, as promising results are obtained from it. It is known that in a CNN architecture different layers learn different features; the layers present in the initial stages learn very basic features, for example, edge detection or contour detection, compared to layers present in the later stages. Hence, these basic features, such as edge detection and contour detection, can be used in similar tasks via transfer learning without training from scratch on the other task. For example, if weights were previously saved for an anomaly detection task, these weights can be used again for detecting a benign tumor or melanoma. Also, when there is a lack of data, a big CNN architecture cannot be trained from scratch; in these cases, the use of transfer learning becomes the best option. The weights of many trained networks are available on the internet in various frameworks. The CNN architectures used in this chapter are VGG16, VGG19, ResNet, ResNet50, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, MobileNet, MobileNetV2, and Xception. Sample Python code for transfer learning (MobileNet) is given below.
from keras.applications.mobilenet import MobileNet
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

def load_pretrained_model():
    base_model = MobileNet(
        input_shape=INPUT_SHAPE,
        include_top=False,
        weights='imagenet'
    )

    # freeze the first 75 layers of the base model. All other layers are trainable.
    for layer in base_model.layers[0:75]:
        layer.trainable = False

    return base_model

def create_model():
    print("create model")

    # Pretrained backbone followed by a small fully connected head
    model = Sequential()

    model.add(load_pretrained_model())
    model.add(layers.Flatten())

    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dropout(0.3))

    model.add(layers.Dense(32, activation='relu'))

    model.add(layers.Dense(1, activation='sigmoid'))

    return model

model = create_model()
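The returned model can then be compiled and fitted in the same way as the earlier networks; a minimal sketch reusing the conventions of the previous snippets (EPOCHS, VERBOSE_LEVEL, and the data generators are assumed to be defined as before):

from tensorflow.keras.optimizers import Adam

model.compile(
    loss='binary_crossentropy',
    metrics=['accuracy', 'AUC'],
    optimizer=Adam(lr=1e-4)
)

history = model.fit(
    train_gen,
    epochs=EPOCHS,
    verbose=VERBOSE_LEVEL,
    validation_data=(valX, valY)
)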

6.4 Results and discussions

6.4.1 Dataset

The dataset used in this experiment is taken from the Kaggle website [39]. It was generated and made available by the ISIC, and the images come from the following sources: Memorial Sloan Kettering Cancer Center, Melanoma Institute Australia, Hospital Clinic de Barcelona, Medical University of Vienna, The University of Athens Medical School, and the University of Queensland. The dataset was highly imbalanced, as the number of melanoma cases was only 575 while the number of benign cases was 31,956. To balance the dataset, an equal number of benign and melanoma cases was taken, that is, 575 each. The height and width of the images varied a lot; to give all images a uniform size, they were resized to 224 × 224.

6.4.2 Experimental setup

In the first stage, CNNs were used for the classification of benign and melanoma images. A total of 11 different pretrained models (VGG16, VGG19, MobileNet, MobileNetV2, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169, Xception, ResNet, and ResNet50) with some fine-tuning were used for classification, and their different accuracy parameters were noted down. After this step, different convolutional layer models were built, ranging from two to eight convolutional layers. This completes the first stage of the experiment.

In the second stage, features of the images were extracted from the different pretrained models mentioned above. These features were passed through a global average pooling layer, and a classifier was put on top of it. Several machine learning techniques were used for the final classification, such as XGBoost, Random Forest, SVM, k-NN, AdaBoost, ANN, LSTM, Bi-LSTM, and the Bagging classifier.

6.4.3 Performance metrics

The dataset was initially imbalanced; to balance it, an equal number of positive and negative cases was taken. It was then further divided into three sets, that is, train, validation, and test sets, and accuracies were recorded separately for them. Training was done on the training set, hyperparameter tuning was done on the validation set, and finally the model was tested on the test set to see how well it generalizes. After these steps, the F1 score, the harmonic mean of precision and recall, was calculated on the test set, as it combines both metrics and gives a better idea of whether the model is biased toward one particular class. Finally, the AUC is calculated on the test set to further check the model's robustness.

from sklearn import metrics
import numpy as np

def print_performance_metrics(test_labels, predict):
    print('Accuracy:', np.round(metrics.accuracy_score(test_labels, predict), 4))
    print('ROC Area:', np.round(metrics.roc_auc_score(test_labels, predict), 4))
    print('Precision:', np.round(metrics.precision_score(test_labels, predict,
                                                         average='weighted'), 4))
    print('Recall:', np.round(metrics.recall_score(test_labels, predict,
                                                   average='weighted'), 4))
    print('F1 Score:', np.round(metrics.f1_score(test_labels, predict,
                                                 average='weighted'), 4))
    print('Cohen Kappa Score:', np.round(metrics.cohen_kappa_score(test_labels, predict), 4))
    print('Matthews Corrcoef:', np.round(metrics.matthews_corrcoef(test_labels, predict), 4))
    print('\t\tClassification Report:\n', metrics.classification_report(test_labels, predict))

print_performance_metrics(y_test, y_pred)
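For reference, the following is a minimal sketch of the second-stage pipeline described in Section 6.4.2; the random arrays are hypothetical placeholders standing in for the balanced set of 575 melanoma and 575 benign images resized to 224 × 224, and the resulting feature matrix is what the classifiers above are fitted on:

import numpy as np
from keras.applications.mobilenet import MobileNet, preprocess_input

# Placeholder arrays with the shapes used in this experiment.
images = np.random.rand(1150, 224, 224, 3).astype('float32') * 255.0
labels = np.concatenate([np.zeros(575), np.ones(575)])

# Pretrained backbone with global average pooling on top; any of the
# classifiers in Section 6.3 can then be fitted on these features.
backbone = MobileNet(input_shape=(224, 224, 3), include_top=False,
                     weights='imagenet', pooling='avg')
features = backbone.predict(preprocess_input(images))
print(features.shape)  # (1150, 1024) for MobileNet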


6.4.4 Experimental results

Different pretrained architectures containing pretrained weights were used for classification. Apart from these pretrained architectures, several CNNs with different numbers of layers were also used.

From Table 6.1, it can be observed that among the pretrained models, the "MobileNet" architecture performed the best with an F1 score of 0.8014. On the other hand, "ResNet50" performed the worst with an F1 score of 0.371. Among the CNN-layer models, the model with eight layers performed the best with an F1 score of 0.7675, while the model with four layers performed the worst with an F1 score of 0.657.

From Table 6.2 to Table 6.12, deep feature extraction techniques were used: pretrained model architectures were used to extract the features, and then several machine learning classifiers were put on top of them to give the final output.

In Table 6.2, the VGG16 architecture was used for feature extraction. It can be clearly seen that the "LSTM" classifier performed the best with an F1 score of 0.802. The worst performance was given by the "k-NN" classifier with an F1 score of 0.6446.

TABLE 6.1 Pretrained and convolutional neural network (CNN) models.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

VGG16 0.79 0.7969 0.7647 0.7603 0.7654


VGG19 0.7747 0.7344 0.7594 0.7593 0.7593
MobileNet 0.8359 0.7812 0.8021 0.8014 0.8018
MobileNetV2 0.7941 0.8438 0.7754 0.7752 0.7756
InceptionV3 0.8451 0.8438 0.7647 0.7644 0.7645

InceptionResNetV2 0.8889 0.8281 0.7914 0.7913 0.7913


DenseNet121 0.844 0.9062 0.754 0.7531 0.7543
DenseNet169 0.8756 0.8438 0.7594 0.7591 0.7596
Xception 0.8328 0.7344 0.6845 0.6828 0.6841
ResNet 0.5015 0.6719 0.5936 0.5911 0.5932
ResNet50 0.5413 0.4375 0.5187 0.371 0.5161

CNN 2 Layer 0.7574 0.75 0.7326 0.7316 0.733


CNN 3 Layer 0.7472 0.7656 0.7594 0.7587 0.7591
CNN 4 Layer 0.7584 0.7344 0.6578 0.657 0.6575
CNN 5 Layer 0.7299 0.7344 0.7433 0.7431 0.7435
CNN 6 Layer 0.7666 0.7812 0.7273 0.7272 0.7274
CNN 7 Layer 0.7248 0.7344 0.7273 0.7266 0.727

CNN 8 Layer 0.7798 0.7812 0.7701 0.7675 0.7695


TABLE 6.2 VGG16 architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9195 0.7991 0.7701 0.7693 0.7704


Random Forest 1 0.829 0.7807 0.7803 0.781
AdaBoost 1 0.7778 0.7166 0.7153 0.7169
K-NN 0.7543 0.6624 0.6471 0.6446 0.6475
XGBoost 1 0.8205 0.7701 0.77 0.77
Bagging 1 0.7991 0.7433 0.7427 0.7436

ANN 1 0.8034 0.7487 0.7481 0.7489


LSTM 1 0.8419 0.8021 0.802 0.8023
Bi-LSTM 1 0.8333 0.7861 0.7859 0.7863

TABLE 6.3 VGG19 architecture was used for feature extraction.

Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9256 0.7735 0.738 0.738 0.7379


Random Forest 1 0.7991 0.7433 0.7428 0.7431

AdaBoost 0.999 0.7607 0.7005 0.7005 0.7006


K-NN 0.7533 0.6538 0.6203 0.607 0.6213
XGBoost 1 0.8077 0.7487 0.7477 0.7483
Bagging 1 0.829 0.7861 0.7861 0.7861
ANN 1 0.7863 0.7273 0.7272 0.7272
LSTM 1 0.8077 0.7487 0.7469 0.7482

Bi-LSTM 1 0.8077 0.754 0.7525 0.7536

In Table 6.3, the VGG19 architecture was used for feature extraction. Among the classifiers, the "Bagging" classifier achieved the highest F1 score, that is, 0.7861, while the "k-NN" classifier achieved the lowest F1 score, that is, 0.607.

In Table 6.4, the MobileNet architecture was used for feature extraction. The classifiers which achieved the highest and the lowest F1 score were "XGBoost" with a score of 0.818 and "k-NN" with a score of 0.6608, respectively.

In Table 6.5, the MobileNetV2 architecture was used for feature extraction. Among the classifiers, "XGBoost" performed the best with an F1 score of 0.8128; on the other hand, "k-NN" performed poorly with an F1 score of 0.6575.

TABLE 6.4 MobileNet architecture was used for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.949 0.8376 0.8075 0.8074 0.8074


Random Forest 1 0.829 0.7807 0.7807 0.7808
AdaBoost 1 0.8034 0.7487 0.7486 0.7486
K-NN 0.738 0.7008 0.6791 0.6608 0.6804
XGBoost 1 0.859 0.8182 0.818 0.818
Bagging 1 0.8376 0.7914 0.7914 0.7914

ANN 1 0.8547 0.8182 0.818 0.818


LSTM 1 0.8333 0.7861 0.7854 0.7858
Bi-LSTM 1 0.8547 0.8128 0.8124 0.8126

TABLE 6.5 MobileNetV2 architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9511 0.812 0.7701 0.7697 0.7698


Random Forest 1 0.8419 0.7914 0.7914 0.7915
AdaBoost 1 0.812 0.754 0.7538 0.7542
K-NN 0.7727 0.6752 0.6684 0.6575 0.6694

XGBoost 1 0.859 0.8128 0.8128 0.8129


Bagging 1 0.812 0.754 0.7539 0.7541
ANN 1 0.8291 0.7701 0.7701 0.7701
LSTM 1 0.812 0.7487 0.7484 0.7485
Bi-LSTM 1 0.8291 0.7701 0.7695 0.7698

In Table 6.6, the InceptionV3 architecture was used for feature extraction. From the table, it can be observed that "Bi-LSTM" performed the best with an F1 score of 0.7693, while "k-NN" performed the worst with an F1 score of 0.6763.

In Table 6.7, the InceptionResNetV2 architecture was used for feature extraction. Among the classifiers, the "ANN" classifier performed the best with an F1 score of 0.7857, while "k-NN" performed the worst with an F1 score of 0.6819.


TABLE 6.6 InceptionV3 architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.946 0.7906 0.7487 0.7484 0.7489


Random Forest 1 0.7949 0.738 0.738 0.738
AdaBoost 1 0.7906 0.738 0.7378 0.7381
K-NN 0.788 0.7094 0.6791 0.6763 0.6796
XGBoost 1 0.7521 0.6845 0.6842 0.6843
Bagging 1 0.7906 0.7326 0.7325 0.7325

ANN 1 0.812 0.7594 0.7586 0.7597


LSTM 1 0.7906 0.7326 0.7322 0.7328
Bi-LSTM 1 0.8205 0.7701 0.7693 0.7704

TABLE 6.7 InceptionResNetV2 architecture was used for feature extraction.

Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9266 0.7863 0.7326 0.7326 0.7326


Random Forest 1 0.8205 0.7594 0.7593 0.7593

AdaBoost 1 0.7906 0.7273 0.7272 0.7272


K-NN 0.7788 0.7179 0.6898 0.6819 0.6907
XGBoost 1 0.812 0.7487 0.748 0.7484
Bagging 1 0.8291 0.7701 0.7695 0.7698
ANN 1 0.8419 0.7861 0.7857 0.7859
LSTM 1 0.8205 0.7594 0.7593 0.7593

Bi-LSTM 1 0.8077 0.738 0.7377 0.7378

In Table 6.8, the DenseNet121 architecture was used for feature extraction. The classifiers with the highest and the lowest F1 score were "ANN" with a score of 0.7968 and "k-NN" with a score of 0.6416, respectively.

In Table 6.9, the DenseNet169 architecture was used for feature extraction. The "Bi-LSTM" classifier achieved the highest F1 score, that is, 0.8075, while the "AdaBoost" classifier achieved the lowest F1 score, that is, 0.6898.

In Table 6.10, the Xception architecture was used for feature extraction. The "Random Forest" classifier achieved the highest F1 score, that is, 0.7366, while the "Bagging" classifier achieved the lowest F1 score, that is, 0.6969.

TABLE 6.8 DenseNet121 architecture was used for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9368 0.8077 0.7807 0.7807 0.7807


Random Forest 1 0.8248 0.7754 0.7751 0.7752
AdaBoost 1 0.7906 0.7326 0.7326 0.7326
K-NN 0.7808 0.6838 0.6524 0.6416 0.6533
XGBoost 1 0.812 0.7594 0.7587 0.7591
Bagging 1 0.7991 0.7487 0.7486 0.7486

ANN 1 0.8504 0.7968 0.7968 0.7967


LSTM 0.999 0.8248 0.7754 0.7754 0.7755
Bi-LSTM 1 0.8419 0.7861 0.7861 0.7861

TABLE 6.9 DenseNet169 architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9246 0.7949 0.7701 0.77 0.77


Random Forest 1 0.8333 0.7914 0.7914 0.7914
AdaBoost 1 0.7564 0.6898 0.6898 0.6899
K-NN 0.8114 0.7222 0.7059 0.7036 0.7064
XGBoost 1 0.8205 0.7754 0.7754 0.7755
Bagging 1 0.8248 0.7807 0.7807 0.7807

ANN 1 0.8376 0.7968 0.7968 0.7968


LSTM 1 0.8162 0.7701 0.7699 0.7699
Bi-LSTM 1 0.8462 0.8075 0.8075 0.8075

In Table 6.11, the ResNet architecture was used for feature extraction. The classifiers with the highest and the lowest F1 score were "Bagging" with a score of 0.8021 and "k-NN" with a score of 0.6636, respectively.

In Table 6.12, the ResNet50 architecture was used for feature extraction. From the table, it can be observed that "Random Forest" performed the best with an F1 score of 0.7967, while "k-NN" performed the worst with an F1 score of 0.6769.

6.4.5 Discussion

From the results obtained, it is observed that both the end-to-end learning and the deep feature extraction techniques are effective in the classification


TABLE 6.10 Xception architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9195 0.7607 0.7273 0.7266 0.727


Random Forest 1 0.7949 0.738 0.7366 0.7376
AdaBoost 1 0.7778 0.7112 0.7099 0.7109
K-NN 0.8043 0.7051 0.7059 0.7055 0.7061
XGBoost 1 0.7906 0.7273 0.7254 0.7268
Bagging 1 0.7692 0.7005 0.6969 0.7

ANN 1 0.7906 0.7326 0.7325 0.7325


LSTM 1 0.7778 0.7166 0.7166 0.7165
Bi-LSTM 1 0.7735 0.7112 0.7112 0.7112

TABLE 6.11 ResNet architecture was used for feature extraction.


Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9521 0.8333 0.7914 0.7914 0.7914


Random Forest 1 0.8248 0.7754 0.7751 0.7752
AdaBoost 1 0.8205 0.7647 0.7644 0.7645
K-NN 0.7982 0.7009 0.6738 0.6636 0.6747
XGBoost 1 0.8205 0.7701 0.77 0.77
Bagging 1 0.8419 0.8021 0.8021 0.802

ANN 1 0.8205 0.7701 0.7698 0.7702


LSTM 1 0.8376 0.7914 0.7914 0.7915
Bi-LSTM 1 0.8376 0.7914 0.7913 0.7916

of melanoma and benign cases. In particular, the MobileNet architecture performed very well in both end-to-end learning and the deep feature extraction technique. In end-to-end learning, the MobileNet architecture achieved the highest F1 score of 0.8014. In the feature extraction technique, the model in which features were extracted from the MobileNet architecture and XGBoost was used as the classifier achieved the highest F1 score of 0.818. Other classifiers such as LSTM, Random Forest, and ANN also performed well. Features extracted from VGG16, ResNet, MobileNetV2, and DenseNet169 also gave promising results and secured F1 scores above 0.80.

TABLE 6.12 ResNet50 architecture was used for feature extraction.
Model Training accuracy Validation accuracy Test accuracy F1 score ROC area

SVM 0.9358 0.8333 0.7914 0.7914 0.7915


Random Forest 1 0.8419 0.7968 0.7967 0.7967
AdaBoost 0.999 0.7949 0.738 0.7377 0.7382
K-NN 0.789 0.7222 0.6898 0.6769 0.6909
XGBoost 1 0.8333 0.7861 0.7861 0.786
Bagging 1 0.8162 0.7647 0.7642 0.7645

ANN 1 0.8419 0.7968 0.7966 0.797


LSTM 1 0.8333 0.7861 0.7861 0.786
Bi-LSTM 1 0.8376 0.7914 0.7914 0.7914

6.5 Conclusion

Skin cancer is one of the most hazardous and common diseases the world is facing right now. Therefore, proper research and actions need to be taken to reduce the spread of this disease. AI has proven its importance in every field, especially in medical science. To reduce its spread, the first and most important step is to detect it. Many times, people ignore it and hesitate to go to the doctor because they consider it a very rare thing. This ignorance can result in great harm as the situation gets out of control. If an app or website can be made which deploys these machine learning models, people will hesitate less in getting checked for this disease, more cases can be detected on time, and people can be cured on time. Hence, these kinds of research in medical science are very crucial.

References

[1] Wikipedia, Melanoma, <https://en.wikipedia.org/w/index.php?title=Melanoma&oldid=1002726498>, Jan. 25, 2021 (accessed 03.02.21).
[2] The Skin Cancer Foundation, How dangerous is melanoma? It's all a matter of timing, <https://www.skincancer.org/blog/dangerous-melanoma-matter-timing/>, Oct. 27, 2017 (accessed 03.02.21).
[3] V.C. Gorantla, J.M. Kirkwood, State of melanoma: an historic overview of a field in transition, Hematol. Oncol. Clin. North Am. 28 (3) (2014) 415-435. Available from: https://doi.org/10.1016/j.hoc.2014.02.010.
[4] Melanoma Research Alliance, Melanoma mortality rates decreasing despite ongoing increase in incidence, 2020, <https://www.curemelanoma.org/blog/article/2020-melanoma-mortality-rates-decreasing-despite-ongoing-increase-in-incidence-rates> (accessed 03.02.21).
[5] Mayo Clinic, Skin cancer - symptoms and causes, <https://www.mayoclinic.org/diseases-conditions/skin-cancer/symptoms-causes/syc-20377605> (accessed 03.02.21).
[6] SEER, Melanoma of the skin - cancer stat facts, <https://seer.cancer.gov/statfacts/html/melan.html> (accessed 03.02.21).
[7] AIM at Melanoma Foundation, Age and risk, <https://www.aimatmelanoma.org/melanoma-101/understanding-melanoma/melanoma-risk-factors/age-and-risk/> (accessed 03.02.21).
[8] Cancer in young adults, <https://www.cancer.org/cancer/cancer-in-young-adults.html> (accessed 03.02.21).
[9] Cancer Treatment Centers of America, Types of melanoma: common, rare and more varieties, <https://www.cancercenter.com/cancer-types/melanoma/types>, Oct. 05, 2018 (accessed 03.02.21).


[10] Cancer Treatment Centers of America, What are the symptoms and signs of melanoma? <https://www.cancercenter.com/cancer-types/melanoma/symptoms>, Oct. 05, 2018 (accessed 03.02.21).
[11] Stages of melanoma skin cancer, <https://www.cancer.org/cancer/melanoma-skin-cancer/detection-diagnosis-staging/melanoma-skin-cancer-stages.html> (accessed 03.02.21).
[12] Cancer Treatment Centers of America, Melanoma treatment options & advanced therapies, <https://www.cancercenter.com/cancer-types/melanoma/treatments>, Oct. 05, 2018 (accessed 03.02.21).
[13] CTCA, Chemotherapy: personalized therapies to treat cancer, <https://www.cancercenter.com/treatment-options/chemotherapy> (accessed 03.02.21).
[14] Cancer Treatment Centers of America, Immunotherapy to treat cancer: options & side effects, <https://www.cancercenter.com/treatment-options/precision-medicine/immunotherapy>, Oct. 17, 2018 (accessed 03.02.21).
[15] Cancer Treatment Centers of America, Radiation therapy: usages, side effects & more, <https://www.cancercenter.com/treatment-options/radiation-therapy>, Oct. 17, 2018 (accessed 03.02.21).
[16] Cancer Treatment Centers of America, What is cancer surgery? Options & side effects, <https://www.cancercenter.com/treatment-options/surgery>, Oct. 17, 2018 (accessed 03.02.21).
[17] Mayo Clinic, Melanoma - symptoms and causes, <https://www.mayoclinic.org/diseases-conditions/melanoma/symptoms-causes/syc-20374884> (accessed 28.02.21).
[18] Y. Li, L. Shen, Skin lesion analysis towards melanoma detection using deep learning network, Sensors 18 (2) (2018) Art. no. 2. Available from: https://doi.org/10.3390/s18020556.
[19] P.B.C. Castro, B. Krohling, A.G.C. Pacheco, R.A. Krohling, An app to detect melanoma using deep learning: an approach to handle imbalanced data based on evolutionary algorithms, in: 2020 International Joint Conference on Neural Networks (IJCNN), Jul. 2020, pp. 1-6. Available from: https://doi.org/10.1109/IJCNN48605.2020.9207552.
[20] S. Gulati, R.K. Bhogal, Detection of malignant melanoma using deep learning, in: International Conference on Advances in Computing and Data Science, Singapore, 2019, pp. 312-325. Available from: https://doi.org/10.1007/978-981-13-9939-8_28.
[21] A.A. Adegun, S. Viriri, Deep learning-based system for automatic melanoma detection, IEEE Access 8 (2020) 7160-7172. Available from: https://doi.org/10.1109/ACCESS.2019.2962812.
[22] N.C. Codella, et al., Deep learning ensembles for melanoma recognition in dermoscopy images, IBM J. Res. Dev. 61 (4/5) (2017) 5-1.
[23] A.R. Lopez, X. Giro-i-Nieto, J. Burdick, O. Marques, Skin lesion classification from dermoscopic images using deep learning techniques, in: 2017 13th IASTED International Conference on Biomedical Engineering (BioMed), Feb. 2017, pp. 49-54. Available from: https://doi.org/10.2316/P.2017.852-053.
[24] A. Astorino, A. Fuduli, P. Veltri, E. Vocaturo, Melanoma detection by means of multiple instance learning, Interdiscip. Sci. Comput. Life Sci. 12 (1) (2020) 24-31. Available from: https://doi.org/10.1007/s12539-019-00341-y.
[25] R. Ali, R.C. Hardie, B.N. Narayanan, S.D. Silva, Deep learning ensemble methods for skin lesion analysis towards melanoma detection, in: 2019 IEEE National Aerospace and Electronics Conference (NAECON), Jul. 2019, pp. 311-316. Available from: https://doi.org/10.1109/NAECON46414.2019.9058245.
[26] I. Yilmaz, N. Erik, O. Kaynar, Different types of learning algorithms of artificial neural network (ANN) models for prediction of gross calorific value (GCV) of coals, Sci. Res. Essays 5 (2010) 2242-2249.
[27] S.-C. Wang, Artificial neural network, in: S.-C. Wang (Ed.), Interdisciplinary Computing in Java Programming, Springer US, Boston, MA, 2003, pp. 81-100. Available from: https://doi.org/10.1007/978-1-4615-0377-4_5.
[28] V. Skorpil, J. Stastny, Neural networks and back propagation algorithm, Sep. 2006.
[29] G. Guo, H. Wang, D. Bell, Y. Bi, KNN model-based approach in classification, Aug. 2004.
[30] T. Evgeniou, M. Pontil, Support vector machines: theory and applications, in: Advanced Course on Artificial Intelligence, vol. 2049, Springer, 2001, pp. 249-257. Available from: https://doi.org/10.1007/3-540-44673-7_12.
[31] J. Ali, R. Khan, N. Ahmad, I. Maqsood, Random forests and decision trees, Int. J. Comput. Sci. Issues (IJCSI) 9 (2012).
[32] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, Aug. 2016, pp. 785-794. Available from: https://doi.org/10.1145/2939672.2939785.
[33] T. Chengsheng, L. Huacheng, X. Bing, AdaBoost typical algorithm and its application research, MATEC Web of Conferences 139 (2017) 00222. Available from: https://doi.org/10.1051/matecconf/201713900222.
[34] P. Bühlmann, B. Yu, Analyzing bagging, Ann. Stat. 30 (4) (2002) 927-961. Available from: https://doi.org/10.1214/aos/1031689014.

[35] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735-1780. Available from: https://doi.org/10.1162/neco.1997.9.8.1735.
[36] A. Graves, S. Fernández, J. Schmidhuber, Bidirectional LSTM networks for improved phoneme classification and recognition, in: International Conference on Artificial Neural Networks, Jan. 2005, pp. 799-804.
[37] S. Albawi, T.A. Mohammed, S. Al-Zawi, Understanding of a convolutional neural network, in: 2017 International Conference on Engineering and Technology (ICET), Aug. 2017, pp. 1-6. Available from: https://doi.org/10.1109/ICEngTechnol.2017.8308186.
[38] M. Hussain, J. Bird, D. Faria, A study on CNN transfer learning for image classification, in: UKCI 2018: 18th Annual UK Workshop on Computational Intelligence, Jun. 2018.
[39] Kaggle, SIIM-ISIC melanoma classification, <https://kaggle.com/c/siim-isic-melanoma-classification> (accessed 07.03.21).

C H A P T E R

7

Brain stroke detection from computed tomography images using deep learning algorithms

Aykut Diker¹, Abdullah Elen¹ and Abdulhamit Subasi²,³

¹Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey; ²Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; ³Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

7.1 Introduction 207
7.2 Literature survey in brain stroke detection 209
7.3 Deep learning methods 210
    7.3.1 AlexNet 210
    7.3.2 GoogleNet 212
    7.3.3 Residual convolutional neural network 212
    7.3.4 VGG-16 214
    7.3.5 VGG-19 215
7.4 Experimental results 216
    7.4.1 Dataset 217
7.5 Conclusion 221
References 221

7.1 Introduction

The World Health Organization reported that in 2016 brain stroke was the second leading cause of mortality worldwide; hence, the majority of survivors are compelled to live with a permanent or long-term injury [1,2]. It is a medical disorder in which the brain tissues lose their capability to get oxygen as a result of a diminished or completely cut blood supply. The loss of brain cells occurs quickly as a result of this. Ischemic and hemorrhagic strokes are the two forms of stroke. Ischemic stroke is the most prevalent form of stroke, and it occurs


when the blood supply to the brain tissues is decreased; the other type is hemorrhagic stroke, which occurs when a vessel inside the brain ruptures. Stroke, in its simplest definition, is a "brain attack" caused by cessation of blood flow. Although stroke is used synonymously with the term hemiplegia, it is colloquially known as paralysis. Its distribution is age-related and doubles every 10 years after the age of 55. It ranks third among the causes of death in the population over 65 years of age, after heart diseases and cancer. Since stroke occurs as a sudden event, this unexpected event is considered a crisis situation [3-5]. MRI images contain important information for the classification of stroke severity. On the other hand, it is very difficult to interpret the scan results, as the small changes in these images are spots that indicate the severity of the stroke. This complexity corresponds to a significant amount of time spent manually analyzing images. The mental effort required to determine the severity of a stroke creates exhaustion, and fatigue can contribute to human error, which affects diagnostic quality. Beyond fatigue, all human-based classification approaches have interobserver and intraobserver variability. Clinical analyst education is a technique that can help to mitigate some of these errors; however, educating someone to be an expert takes time and money, making regular stroke risk prediction activities uneconomical. Researchers have developed methods for automatically detecting the severity of stroke disease to overcome these issues [6]. Normal and stroke brain computed tomography (CT) images are given in Fig. 7.1.

FIGURE 7.1 Normal and stroke brain CT image samples. CT, Computed tomography.

Besides, scientists and engineers have frequently used machine learning (ML) and artificial intelligence methods for the detection and classification of stroke. Al-Qazzaz et al. [7] suggested an autonomous machine interface structure to determine rehabilitation changes and presented a new method based on BCI. EEG samples from poststroke subjects with upper extremity hemiparesis were examined. Additionally, Random Forest (RF), support vector machine (SVM), and k-NN classifiers were used to classify the EEG signals. Sung et al. [8] proposed an ML program that could detect subjects with suspected stroke during emergency department triage. The application can be integrated into an electronic triage system and used to initiate code strokes. To develop stroke classification models, the researchers investigated SVM, RF, k-NN, C4.5, CART, and logistic regression (LR). In the experimental study, the accuracy values of the C4.5, CART, RF, k-NN, SVM, and LR classifiers were obtained as 81.2%, 81.1%, 82.6%, 80.6%, 81.3%, and 82.0%, respectively. Ref. [9] presents a new deep learning-based technique for segmenting stroke lesions in CT perfusion maps. The suggested approach was tested using the ISLES 2018 dataset. The positive predictive value and sensitivity (SEN) of the proposed method were obtained as 68% and 67%, respectively. In Ref. [10] a method generated from the

Applications of Artificial Intelligence in Medical Imaging


7.2 Literature survey in brain stroke detection 209
primary frequencies of CT samples of the brain is used to categorize hemorrhagic and ischemic strokes. The major benefit of this method is that it only makes use of the image as a parameter. The results reported high values, with SPE 99.1%, SEN 97%, and ACC 98% in stroke classification.

In this chapter, deep learning models are employed for stroke classification using brain CT images. Moreover, the Brain Stroke CT Image Dataset was used for stroke classification. The chapter is arranged as follows: studies in brain stroke detection are detailed in Part 2. The deep learning techniques used in the chapter are described in Part 3. The experimental results are reported in Part 4. In Part 5, discussion and concluding remarks are given, respectively.

7.2 Literature survey in brain stroke detection

In recent years, many studies have been carried out on stroke detection in the field of computer science. Studies on this subject generally concern the classification of CT and MR images through deep learning and ML algorithms. Bento et al. [11] proposed a computer-aided model to identify subjects with carotid artery atherosclerosis from a cohort of magnetic resonance brain samples. They used the SVM classifier after identifying a set of handcrafted and convolutional discriminant features to apply the automatic identification task. According to their results, the model has an accuracy rate of 97.5%, a sensitivity of 96.4%, and a specificity of 97.9%. Filho et al. [12] developed a method for extracting features based on radiological density models of the brain, called Brain Tissue Density Analysis. They applied this method to CT images to classify and identify the occurrence of stroke diseases. They applied multilayer perceptron, optimum path forest, SVM, k-NN, and Bayesian classifiers to classify the CT images. They reported that the mean accuracy values for all classifiers were above 95%. Vargas et al. [13] proposed a model to predict the presence and laterality of perfusion deficiency in patients with acute ischemic stroke. In the proposed model, a long-term recurrent convolutional network is built, consisting of a convolutional neural network (CNN) stacked on top of a long short-term memory layer. They reported that the best accuracy score on the validation data in the experimental studies was 85.8%. Gautam and Raman [14] proposed a new classification model that uses image fusion and deep learning to identify cerebral palsy. They preprocessed the input CT images with a quadtree-based image fusion method and used the fusion technique to increase the contrast of the stroke region. Next, they proposed a new CNN structure for the classification of cerebral palsy from CT images into three and two (ischemic and hemorrhagic) categories. The classification accuracy of their proposed method is between 98.33% and 98.77%. Kanchana and Menaka [15] proposed a new histogram-based model for the detection of ischemic stroke, using optimal feature group selection to classify abnormal and normal regions on CT images and segment the ischemic stroke lesion. They used SVM, LR, RF, and neural network (NN) as classification methods. In experimental studies, they reported classification accuracy ranging from 88.77% to 99.79% in the detection of ischemic stroke lesions. Raghavendra et al. [16] proposed a probabilistic NN model using nonlinear features for early detection of intracerebral hemorrhage on CT images. The model they developed reported a best accuracy rate of 97.37% in distinguishing between normal and hemorrhagic subjects. Herzog et al. [17] proposed a CNN model that incorporates Bayesian uncertainty into the analysis procedure to diagnose ischemic stroke patients based on MR images. In a cohort of 511 patients, their proposed model improved by 2% over a non-Bayesian counterpart, reporting 95.33% accuracy at the image level. Badriyah et al. [18] employed


RF, Naive Bayes, k-NN, LR, decision tree, SVM, multilayer perceptron neural network, and deep learning to classify stroke into its two subtypes (ischemic and hemorrhagic). According to their experimental results, they reported that the RF algorithm gave the best classification score, with 95.97% accuracy.

7.3 Deep learning methods

In this chapter, a model based on the CNN has been proposed to classify brain CT images as stroke or normal, two classes in total. The proposed CNN structure is given in Fig. 7.2.

FIGURE 7.2 Block diagram of the brain stroke detection architecture.

7.3.1 AlexNet

Even though it is stated that Yann LeCun employed deep learning for the first time in an article published in 1998, deep learning was first heard of worldwide in 2012. The AlexNet model, designed with a deep learning architecture, won the ImageNet competition held that year. The study was announced in the paper named "ImageNet Classification with Deep Convolutional Networks." The computerized object recognition error rate was lowered from 26.2% to 15.4% using this deep learning model. The architecture consists of five convolution layers, a pooling layer, and three fully connected layers (FCL), as given in Fig. 7.3.

FIGURE 7.3 AlexNet architecture.
The first convolutional layer filters the input image (227 × 227 × 3) with 96 kernels of size 11 × 11 × 3. The second convolutional layer filters the output of the first convolutional layer, which is locally normalized and pooled, with 256 kernels of size 5 × 5 × 96. In the third convolutional layer, there is no local response normalization and no pooling layer; there are 384 kernels with a size of 3 × 3 × 256. In the fifth convolutional layer, the output is pooled; there are 256 kernels with a size of 3 × 3 × 384. Each of the FCL contains 4096 neurons. Additionally, AlexNet is designed to classify 1000 objects [19,20]. The filters of the first layer are 11 × 11 in size and the stride is 4, as shown in Table 7.1.

TABLE 7.1 Layers of AlexNet architecture.

Layer        #filters/neurons   Filter size   Stride   Padding   Size of feature map   Activation function
Input        -                  -             -        -         227 × 227 × 3         -
Conv 1       96                 11 × 11       4        -         55 × 55 × 96          ReLU
Max Pool 1   -                  3 × 3         2        -         27 × 27 × 96          -
Conv 2       256                5 × 5         1        2         27 × 27 × 256         ReLU
Max Pool 2   -                  3 × 3         2        -         13 × 13 × 256         -
Conv 3       384                3 × 3         1        1         13 × 13 × 384         ReLU
Conv 4       384                3 × 3         1        1         13 × 13 × 384         ReLU
Conv 5       256                3 × 3         1        1         13 × 13 × 256         ReLU
Max Pool 3   -                  3 × 3         2        -         6 × 6 × 256           -
Dropout 1    Rate = 0.5         -             -        -         6 × 6 × 256           -

%% Load AlexNet
% Pretrained model "AlexNet"
net = alexnet();
% Replace the final layers for the two-class (normal/stroke) problem
% (assumes the default 25-layer AlexNet; the last FC layer sits at end-2)
layers = net.Layers;
layers(end-2) = fullyConnectedLayer(2); % new two-class fully connected layer
layers(end) = classificationLayer;      % new output layer
%% AlexNet training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',100, ...
    'Momentum',0.8, ...
    'InitialLearnRate',1e-4, ...
    'L2Regularization',1e-6, ...
    'Shuffle','every-epoch', ...
    'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% AlexNet classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_AlexNet = mean(YPred == testLabels)*100; % AlexNet classification result


7.3.2 GoogleNet

Because of the Inception modules in its construction, GoogleNet is a CNN type with an intricate design. It contains 22 layers and has a 5.7% error rate. Besides, this architecture, a deep CNN, obtained a high classification performance in the ImageNet competition. It is frequently considered one of the first CNN structures to refrain from simply stacking convolutional and pooling layers sequentially; it also accounts for the fact that stacking all layers and incorporating assorted filters involves time-consuming computations [21,22]. A block diagram of GoogleNet is shown in Fig. 7.4.

FIGURE 7.4 GoogleNet architecture.

%% Load GoogleNet
% Pretrained model "GoogleNet"
net = googlenet();
% GoogLeNet is a DAG network: replace the final layers by name
% (layer names as in MATLAB's pretrained GoogLeNet)
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph,'loss3-classifier',fullyConnectedLayer(2,'Name','fc2'));
lgraph = replaceLayer(lgraph,'output',classificationLayer('Name','output2'));
%% GoogleNet training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',100, ...
    'Momentum',0.8, ...
    'InitialLearnRate',1e-4, ...
    'L2Regularization',1e-6, ...
    'Shuffle','every-epoch', ...
    'Verbose',true);
[net,info] = trainNetwork(imdsTrain,lgraph,options);
%% GoogleNet classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_GoogleNet = mean(YPred == testLabels)*100; % GoogleNet classification result

7.3.3 Residual convolutional neural network

Residual Network, known as ResNet in short in the literature, is a type of NN introduced by He et al. [23] in 2015 in the paper titled "Deep Residual Learning for Image Recognition." To solve a more complex problem, additional layers are added, resulting in improved accuracy and performance in deep neural networks. The intuition behind adding more layers is that those layers learn increasingly complex features: when recognizing samples, the first layer may learn to detect edges, the second layer textures, and likewise the third layer objects. However, He et al. experimentally showed that there is an upper threshold for depth with the conventional deep learning structure.

Among the various methods clarifying why deeper networks do not outperform their shallow counterparts, it is occasionally preferable to seek empirical results for clarification and work backward from there. The residual block, shown in Fig. 7.5, is a new NN building block that reduces the difficulty of training very deep networks.

FIGURE 7.5 The residual learning: a building block.

The most important thing to understand from Fig. 7.5 is the identity-mapping "skip connection." There are no parameters in this identity mapping; it is solely used to pass the output of the former layer to the following layer. But from time to time x and F(x) will not have equal dimensions. Recall that a convolution process generally decreases the spatial resolution of a sample; for example, a 3 × 3 convolution on a 32 × 32 image results in a 30 × 30 image. The identity mapping is multiplied by a linear projection Ws to extend the shortcut channels to match the residual. This allows the input x and F(x) to be summed as input to the next layer, as shown in Eq. (7.1).

%% Residual Leaky CNN
% Network definition. The chapter's full residual stack is abbreviated here;
% a minimal single residual block with leaky-ReLU activations is shown instead.
netWidth = 8;
layers = [
    imageInputLayer([224 224 3],'Name','input')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','BNInp')
    leakyReluLayer('Name','reluInp')
    convolution2dLayer(3,netWidth,'Padding','same','Name','conv1')
    batchNormalizationLayer('Name','BN1')
    additionLayer(2,'Name','add1')      % sums the block output and the shortcut
    leakyReluLayer('Name','relu1')
    globalAveragePooling2dLayer('Name','gap')
    fullyConnectedLayer(2,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','output')];
lgraph = layerGraph(layers);
% Identity "skip connection": route the block input around conv1/BN1
lgraph = connectLayers(lgraph,'reluInp','add1/in2');
%% Residual Leaky CNN training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',100, ...
    'Momentum',0.8, ...
    'InitialLearnRate',1e-4, ...
    'L2Regularization',1e-6, ...
    'Shuffle','every-epoch', ...
    'Verbose',true);
[net_Residual,info] = trainNetwork(imdsTrain,lgraph,options);
%% Residual Leaky CNN classification
[YPred,scores] = classify(net_Residual,imdsTest);
testLabels = imdsTest.Labels;
Accuracy = mean(YPred == testLabels)*100; % Residual Leaky CNN classification result

y = F(x, {Wi}) + Ws x    (7.1)

where F(x, {Wi}) is the residual mapping to be learned, and the projection Ws is used when x and F(x) have different sizes, such as 32 × 32 and 30 × 30. Here, the term Ws can be applied with 1 × 1 convolutions.

7.3.4 VGG-16

VGG-16 is a CNN architecture with 16 layers that uses 3 × 3 filters, the smallest possible size, for all convolution layers. The architecture is fed RGB images at a resolution of 224 × 224. A set of convolutional layers is used to extract the image's features. The convolution stride is 1, and after each convolution operation the spatial padding of the convolutional layer input preserves its spatial size. The convolution output is passed through a nonlinear activation. Five pooling layers are responsible for spatial pooling; a 2 × 2 filter with stride 2 is used for max-pooling. Three FCL are constructed after the succession of convolutional and max-pooling layers. The softmax layer is the final layer [24]. The architecture of VGG-16 is shown in Fig. 7.6.

%% Load VGG-16
% Pretrained model "VGG-16"
net = vgg16();
% Replace the final layers for the two-class (normal/stroke) problem
layers = net.Layers;
layers(end-2) = fullyConnectedLayer(2); % new two-class fully connected layer
layers(end) = classificationLayer;      % new output layer
%% VGG-16 training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',100, ...
    'Momentum',0.8, ...
    'InitialLearnRate',1e-4, ...
    'L2Regularization',1e-6, ...
    'Shuffle','every-epoch', ...
    'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% VGG-16 classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_VGG16 = mean(YPred == testLabels)*100; % VGG-16 classification result
FIGURE 7.6 VGG-16's structure.

7.3.5 VGG-19

VGG-19 is a 19-layer CNN that has been pretrained. The model was trained on over a million images covering roughly 1000 object categories. There are 16 convolution layers, 3 FCL, 5 max-pooling layers, and a softmax layer in this VGG-19 pretrained model. The network is utilized for feature extraction from the input layer to the last max-pooling layer, while the rest of the network is used for classification [25]. The architecture of VGG-19 is given in Fig. 7.7.

FIGURE 7.7 VGG-19's structure.

%% Load VGG-19
% Pretrained model "VGG-19"
net = vgg19();
% Replace the final layers for the two-class (normal/stroke) problem
layers = net.Layers;
layers(end-2) = fullyConnectedLayer(2); % new two-class fully connected layer
layers(end) = classificationLayer;      % new output layer
%% VGG-19 training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',100, ...
    'Momentum',0.8, ...
    'InitialLearnRate',1e-4, ...
    'L2Regularization',1e-6, ...
    'Shuffle','every-epoch', ...
    'Verbose',true);
[net,info] = trainNetwork(imdsTrain,layers,options);
%% VGG-19 classification
[YPred,scores] = classify(net,imdsTest); % Testing for accuracy
testLabels = imdsTest.Labels;
trainLabels = imdsTrain.Labels;
Accuracy_VGG19 = mean(YPred == testLabels)*100; % VGG-19 classification result

7.4 Experimental results

The experiments were executed on a computer with a 2.60 GHz CPU, 16 GB of RAM, and an NVIDIA GTX960M graphics card with 2 GB of video memory. The simulation environment was MATLAB (2019a). To evaluate the proposed model, we have used the confusion matrix and the commonly used performance metrics obtained from this matrix, such as accuracy (ACC), sensitivity (SEN), specificity (SPE), F-score, and the receiver operating characteristic (ROC). The confusion matrix comprises four components: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The confusion matrix is shown in Fig. 7.8, and Eqs. (7.2)-(7.5) for these components are given below.

FIGURE 7.8 Confusion matrix representation.

ACC = (TP + TN) / (TP + TN + FP + FN)    (7.2)

SEN = TP / (TP + FN)    (7.3)

SPE = TN / (TN + FP)    (7.4)

F-score = (2 × TP) / (2 × TP + FP + FN)    (7.5)

Hereby, TP and TN indicate the number of correctly predicted positive and negative samples, whereas FP and FN show the number of falsely predicted positive and negative samples. Additionally, the ROC has been considered to assess the efficiency of the deep models. The training of the deep models was carried out for 100 epochs with a mini-batch size of 32, as can be seen from the given MATLAB code of each model in Section 7.3.

7.4.1 Dataset

The Brain Stroke CT Image Dataset [26] contains a total of 2501 CT images of 130 healthy (normal) and stroke-diagnosed subjects. All images in the dataset are 650 × 650 pixels and are in JPEG format. A total of 1551 of the images in the dataset belong to healthy people, and 950 of them belong to patients diagnosed with stroke. Examples are shown in Fig. 7.9; the top row depicts images of healthy cases, and the stroke cases are depicted in the bottom row.

The efficiency of the CNN structures was investigated in an experimental study on the brain stroke images. For the training and testing of the pretrained models, the dataset was divided into 70% training and 30% testing. The confusion matrices and ROC curves of the pretrained CNN models are given in Figs. 7.10 and 7.11.

FIGURE 7.9 CT images of (A) normal and (B) stroke. CT, Computed tomography.
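For illustration, Eqs. (7.2)-(7.5) can be checked with a few lines of code. The following minimal sketch is in Python rather than the chapter's MATLAB, and the counts passed at the end are hypothetical.

# Minimal sketch of Eqs. (7.2)-(7.5); the example counts are hypothetical
def stroke_metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)      # Eq. (7.2)
    sen = tp / (tp + fn)                       # Eq. (7.3)
    spe = tn / (tn + fp)                       # Eq. (7.4)
    f_score = 2 * tp / (2 * tp + fp + fn)      # Eq. (7.5)
    return acc, sen, spe, f_score

print(stroke_metrics(tp=452, tn=278, fp=12, fn=9))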


FIGURE 7.10 Confusion matrix and ROC of AlexNet, GoogleNet, and VGG-19. ROC, Receiver operating characteristic.


FIGURE 7.11 Confusion matrix and ROC of Residual CNN and VGG-16. ROC, Receiver operating characteristic.


All scores of the pretrained CNN classifiers are reported in Table 7.2. With respect to Table 7.2, the highest stroke classification performance, 97.06%, was reached with the VGG-19 pretrained CNN model. Besides, the training loss and training accuracy of the model are given in Fig. 7.12.

TABLE 7.2 Stroke classification performance evaluation results.

CNN models     ACC      SEN      SPE      F-score
AlexNet        94.53%   98.06%   88.77%   93.18%
GoogleNet      92.00%   95.26%   86.66%   90.76%
Residual CNN   94.80%   98.06%   89.47%   93.57%
VGG-16         94.66%   96.98%   90.87%   93.83%
VGG-19         97.06%   97.41%   96.49%   96.95%

FIGURE 7.12 (A) Training loss and (B) training accuracy of the CNN model. CNN, Convolutional neural network.

Consequently, the maximum accuracy for stroke classification was achieved with VGG-19: ACC 97.06%, SEN 97.41%, SPE 96.49%, and F-score 96.95%, respectively. In the experimental study, it was observed that the training of the model was completed in 87 minutes and 42 seconds. In addition, at the end of 4400 iterations, it was observed that the training accuracy became balanced.

In Table 7.3, conventional ML models, CNN models, and the proposed method are compared according to the performance criteria explained in Section 7.4.

TABLE 7.3 Comparison of the proposed method with previous studies.

References                Methods                                     Image datasets                               Accuracy
Chin et al. (2017) [27]   Deep learning (CNN)                         CT image dataset                             90%
Karthik et al. [28]       Fully convolutional network (FCN)           MRI image dataset with 4,284 samples         70%
Liu et al. [29]           Support vector machine (SVM)                CT-scan image dataset with 1,157 samples     83.3%
Gaidhani et al. [30]      Deep learning models (LeNet and SegNet)     MRI scan with 400 samples                    96%-97%
Badriyah et al. [31]      Random Forest                               CT scan images from 102 patients             95.97%
This study                AlexNet                                     CT image dataset with 2501 samples           94.53%
                          GoogleNet                                                                                92.00%
                          Residual CNN                                                                             94.80%
                          VGG-16                                                                                   94.66%
                          VGG-19                                                                                   97.06%

Gaidhani et al. [30] suggested a technique to classify brain stroke MRI samples as healthy or unhealthy; the LeNet CNN was used for stroke classification, and the classification score in the experimental study was in the range of 96%-97%. Liu et al. [29] used routinely accessible data to estimate hematoma expansion in spontaneous intracerebral hemorrhage (ICH); 1157 subjects with ICH were examined, and an overall accuracy of 83% in the prediction of hematoma expansion was obtained. Table 7.3 shows that the proposed model achieves satisfactory classification accuracy.

7.5 Conclusion

Stroke is a complex condition with around 150 different causes. Early diagnosis of stroke is vitally important, since the long-term effects of stroke can be minimized with the help of early diagnosis. Deep learning models can in no way replace the clinician as a decision support system, but with their image analysis power they can make a big impact. Computer-aided, and especially deep learning-based, medical image analysis methods have increased in recent years. In this chapter, five deep CNN approaches are considered for stroke classification. With this chapter, pretrained CNN models that can distinguish between stroke and normal brain CT images have emerged. The dataset used consists of two classes. The classification accuracy of the VGG-19 CNN model was superior compared with the other transfer learning models. The proposed model was tested on the Brain Stroke CT Image Dataset. As a result of the best classification, accuracy of 97.06%, sensitivity of 97.41%, specificity of 96.49%, and F-score of 96.95% were obtained with the VGG-19 CNN model.

References
[1] M. Nishio, S. Koyasu, S. Noguchi, T. Kiguchi, K. Nakatsu, T. Akasaka, et al., Automatic detection of acute ischemic stroke using non-contrast computed tomography and two-stage deep learning model, Comput. Methods Prog. Biomed. 196 (2020) 105711.
[2] A. Gautam, B. Raman, Towards effective classification of brain hemorrhagic and ischemic stroke using CNN, Biomed. Signal. Process. Control. 63 (2021) 102178.
[3] E. Altunışık, A. Arık, Decreased stroke applications during pandemic: collateral effects of covid-19, Turk. Noroloji Derg. 27 (2) (2021) 171-175.
[4] V. Abramova, A. Clèrigues, A. Quiles, D.G. Figueredo, Y. Silva, S. Pedraza, et al., Hemorrhagic stroke lesion segmentation using a 3D U-Net with squeeze-and-excitation blocks, Comput. Med. Imaging Graph. 90 (2021).
[5] F. Er, Ş. Yıldırım, İnme sonrası hasta ve bakım verenlerin aile ilişkilerinin değerlendirilmesi [Evaluation of the family relationships of post-stroke patients and their caregivers], Celal Bayar Üniversitesi Sağlık Bilim. Enstitüsü Derg. 6 (3) (2019) 182-189.
[6] U. Rajendra Acharya, K.M. Meiburger, O. Faust, J. En Wei Koh, S. Lih Oh, E.J. Ciaccio, et al., Automatic detection of ischemic stroke using higher order spectra features in brain MRI images, Cogn. Syst. Res. 58 (2019) 134-142.
[7] N.K. Al-Qazzaz, Z.A.A. Alyasseri, K.H. Abdulkareem, N.S. Ali, M.N. Al-Mhiqani, C. Guger, EEG feature fusion for motor imagery: a new robust framework towards stroke patients rehabilitation, Comput. Biol. Med. 137 (2021) 104799.
[8] S.F. Sung, L.C. Hung, Y.H. Hu, Developing a stroke alert trigger for clinical decision support at emergency triage using machine learning, Int. J. Med. Inform. 152 (2021) 104505.
[9] M. Soltanpour, R. Greiner, P. Boulanger, B. Buck, Improvement of automatic ischemic stroke lesion segmentation in CT perfusion maps using a learned deep neural network, Comput. Biol. Med. 137 (2021) 104849.
[10] S.A. Peixoto, P.P. Rebouças Filho, Neurologist-level classification of stroke using a Structural Co-Occurrence Matrix based on the frequency domain, Comput. Electr. Eng. 71 (2018) 398-407.


[11] M. Bento, R. Souza, M. Salluzzi, L. Rittner, Y. Zhang, R. Frayne, Automatic identification of atherosclerosis subjects in a heterogeneous MR brain imaging data set, Magn. Reson. Imaging 62 (2019) 18-27.
[12] P.P. Rebouças Filho, R.M. Sarmento, G.B. Holanda, D. de Alencar Lima, New approach to detect and classify stroke in skull CT images via analysis of brain tissue densities, Comput. Methods Prog. Biomed. 148 (2017) 27-43.
[13] J. Vargas, A. Spiotta, A.R. Chatterjee, Initial experiences with artificial neural networks in the detection of computed tomography perfusion deficits, World Neurosurg. 124 (2019) e10-e16.
[14] A. Gautam, B. Raman, Towards effective classification of brain hemorrhagic and ischemic stroke using CNN, Biomed. Signal. Process. Control. 63 (2021) 102178.
[15] R. Kanchana, R. Menaka, Ischemic stroke lesion detection, characterization and classification in CT images with optimal features selection, Biomed. Eng. Lett. 10 (3) (2020) 333-344.
[16] U. Raghavendra, T.-H. Pham, A. Gudigar, V. Vidhya, B.N. Rao, S. Sabut, et al., Novel and accurate non-linear index for the automated detection of haemorrhagic brain stroke using CT images, Complex. Intell. Syst. 7 (2) (2021) 929-940.
[17] L. Herzog, E. Murina, O. Dürr, S. Wegener, B. Sick, Integrating uncertainty in deep neural networks for MRI based stroke analysis, Med. Image Anal. 65 (2020) 1-21.
[18] T. Badriyah, N. Sakinah, I. Syarif, D.R. Syarif, Machine learning algorithm for stroke disease classification, in: 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), 2020, pp. 1-5. Available from: doi:10.1109/ICECCE49384.2020.9179307.
[19] J. Chen, Z. Wan, J. Zhang, W. Li, Y. Chen, Y. Li, et al., Medical image segmentation and reconstruction of prostate tumor based on 3D AlexNet, Comput. Methods Prog. Biomed. 200 (2021) 105878.
[20] Ö. İnik, E. Ülker, Deep learning and deep learning models used in image analysis, Gaziosmanpasa J. Sci. Res. 6 (3) (2017) 85-104.
[21] S. Deepak, P.M. Ameer, Retrieval of brain MRI with tumor using contrastive loss based similarity on GoogLeNet encodings, Comput. Biol. Med. 125 (2020) 103993.
[22] A. Diker, Sıtma hastalığının sınıflandırılmasında evrişimsel sinir ağlarının performanslarının karşılaştırılması [A comparison of the performance of convolutional neural networks in the classification of malaria], BEÜ Fen. Bilim. Derg. 9 (4) (2020) 1825-1835.
[23] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770-778.
[24] P. Saha, M.S. Sadi, O.F.M.R.R. Aranya, S. Jahan, F.-A. Islam, COV-VGX: An automated COVID-19 detection system using X-ray images and transfer learning, Inform. Med. Unlocked 26 (2021) 100741.
[25] S. Agarwal, A. Rattani, C.R. Chowdary, A comparative study on handcrafted features v/s deep features for open-set fingerprint liveness detection, Pattern Recognit. Lett. 147 (2021) 34-40.
[26] A. Rahman, Brain Stroke CT Image Dataset, Kaggle, https://www.kaggle.com/afridirahman/brain-stroke-ct-image-dataset, 2021 (accessed 10.11.21).
[27] C.L. Chin, B.J. Lin, G.R. Wu, T.C. Weng, C.S. Yang, R.C. Su, et al., An automated early ischemic stroke detection system using CNN deep learning algorithm, in: Proc. 2017 IEEE 8th Int. Conf. Awareness Science and Technology (iCAST), 2017, pp. 368-372.
[28] R. Karthik, U. Gupta, A. Jha, R. Rajalakshmi, R. Menaka, A deep supervised approach for ischemic lesion segmentation from multimodal MRI using Fully Convolutional Network, Appl. Soft Comput. J. 84 (2019) 105685.
[29] J. Liu, H. Xu, Q. Chen, T. Zhang, W. Sheng, Q. Huang, et al., Prediction of hematoma expansion in spontaneous intracerebral hemorrhage using support vector machine, EBioMedicine 43 (2019) 454-459.
[30] B.R. Gaidhani, R. Rajamenakshi, S. Sonavane, Brain stroke detection using convolutional neural network and deep learning models, in: 2019 2nd Int. Conf. Intelligent Communication and Computational Techniques (ICCT), 2019, pp. 242-249.
[31] T. Badriyah, N. Sakinah, I. Syarif, Machine learning algorithm for classification, J. Phys. Conf. Ser. 1994 (1) (2021) 1213.



C H A P T E R

8

A deep learning approach for COVID-19 detection from computed tomography scans

Ashutosh Varshney(1) and Abdulhamit Subasi(2,3)
(1) Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
(2) Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
(3) Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

8.1 Introduction
8.2 Literature review
8.3 Subjects and data acquisition
8.4 Proposed architecture and transfer learning
    8.4.1 ResNet
    8.4.2 DenseNet
    8.4.3 MobileNet
    8.4.4 Xception
    8.4.5 Visual geometry group (VGG)
    8.4.6 Inception/GoogLeNet
8.5 COVID-19 detection with deep feature extraction
    8.5.1 K-nearest neighbors
    8.5.2 Support vector machine
    8.5.3 Random Forests
    8.5.4 Bagging
    8.5.5 AdaBoost
    8.5.6 XGBoost
8.6 Results and discussions
    8.6.1 Performance evaluation measures
    8.6.2 Experimental results
    8.6.3 Discussion
8.7 Conclusion
References

8.1 Introduction

Coronavirus disease 2019 (COVID-19) has affected more than 17 million people around the world and caused 681 K deaths worldwide, as of August 2, 2020 [1]. The estimated viral reproduction number indicates that an infected individual can transmit this deadly disease to around 2.5 noninfected individuals [1] with low immunity, indicating a high risk of massive spread of the disease. Therefore it is crucial to have fast testing of suspected individuals as early as possible for quarantine and treatment purposes.

The major problem in disease control is the lack of sufficient test kits available for testing. The current tests are mostly based on reverse transcription-polymerase chain reaction (RT-PCR). The RT-PCR tests are not very accurate and might sometimes give false-positive results. It is reported that many "suspected" cases with typical clinical characteristics of COVID-19 and identical specific computed tomography (CT) images were not diagnosed [2]. The test takes around 6 hours, which is very slow compared to the disease spreading rate. Thus the shortage of RT-PCR kits and their inaccurate results motivates us to study an alternative testing procedure, which can be made widely available and is faster, cheaper, and more feasible than RT-PCR; in particular, CT scans.

The CT images of various viral pneumonias and other lung diseases are more or less similar to those of COVID-19; therefore it becomes difficult for radiologists to diagnose the disease as well as to distinguish it from other viral pneumonias [3]. There are several artificial intelligence techniques developed to extract shape and spatiotemporal features from images and use them for disease prediction. There has been recent development in medical imaging techniques using deep learning, especially the convolutional neural network (CNN). Several features are used for identifying viral pathogens on the basis of imaging patterns, which are associated with their specific pathogenesis. The main identifying features of COVID-19 are the white glassy patches called ground-glass opacity [3]. Our intuition therefore was to make use of CNNs to identify these unique and COVID-specific features in the CT scans, which might not be distinguishable by the naked eye. Hence, the purpose of our research was to study the diagnostic performance of various deep learning models using CT images to screen for COVID-19.

8.2 Literature review

The recent advances in medical imaging and disease prediction have witnessed a great rise in machine learning methods. But these methods, to work accurately, require features to build upon, which are themselves not very easily extractable. Therefore there has been a high amount of progress in deep learning models, because they can extract features or can make use of some pretraining beforehand [4], and the whole process can be molded into one single step.

There have been several previous attempts at using deep learning methods for detecting COVID-19 from chest CT and X-ray scans. Shin et al. [5] used a deep CNN to classify interstitial lung disease in CT images. Pezeshk et al. [6] used a 3D CNN to detect pulmonary nodules in chest CT. Li et al. [7] developed a 3D deep learning architecture, COVNet, for COVID-19 detection. Panwar et al. [8] used a Grad-CAM based color visualization along with a transfer learning approach for COVID-19 detection. Meng et al. [9] showed that CT scans of patients with COVID-19 have definite characteristics. Das et al. [10] also tried a transfer learning approach using the Xception architecture. Similarly, Lalmuanawma et al. [11] showed that machine learning and AI-based methods can prove helpful in the automatic detection of COVID-19. Ardakani et al. [12] explored the possibility of CNNs being used, whereas Panwar et al. [13] proposed nCOVnet, which makes use of X-ray scans for the classification.

Fan et al. [14] proposed a semisupervised COVID-19 Lung Infection Segmentation Deep Network (Inf-Net). Minaee et al. [15] also worked in the direction of X-rays for the detection and prepared a rich dataset of X-ray scans. Brunese et al. [16] also showed that imaging techniques such as X-rays can prove beneficial for disease detection. Nour et al. [17] used simple shallow CNNs instead of pretrained networks for COVID-19 diagnosis. Tuncer et al. [18] tried residual exemplar LBP and iterative ReliefF, while Hassantabar et al. [19] used CNNs on lung X-rays for the detection of infected tissue. Similar experiments were carried out by Mahmud et al. [20], who used an architecture with depthwise dilated convolutions. Alakus et al. [21] provide a comparative study of various deep learning approaches. The curvelet transform was also among the techniques used, by Altan and Karasu [22]. Similar experiments were carried out by Sufian et al. [23] using transfer learning. Ozturk et al. [24] proposed heatmaps, which can help radiologists to locate the affected regions in chest X-rays. Khan et al. [25] used the Xception architecture, whereas Shaban et al. [26] used a K-Nearest Neighbor (K-NN) classifier for the same task.

8.3 Subjects and data acquisition

The dataset used for the experiments comprised 1252 CT scan images of COVID-19-infected patients and 1229 CT scan images of non-COVID patients. It was divided into 60% training, 20% validation, and 20% test sets, and the images were normalized. The image size used was 200 × 200 × 3. These data had been collected from real patients in hospitals in Sao Paulo, Brazil. Images from the dataset are given in Fig. 8.1, in which the left one is the CT scan of a COVID-19 patient, while the right one is the CT scan of a non-COVID patient.

8.4 Proposed architecture and transfer learning

We experimented with eight different CNNs pretrained on ImageNet as our base models. The top layer of all the pretrained models was removed. For transfer learning, the number of features of the final fully connected layer was kept at 2. The models were trained using categorical cross-entropy loss and the Adam optimizer with a learning rate of 0.002 and batch size of 64 for 50 epochs. The dataset was shuffled at every epoch.

FIGURE 8.1 Images from dataset.


The same training and validation datasets were selected for all networks to facilitate the performance comparison of the networks. Fig. 8.2 depicts our proposed architecture. The input image is passed through a convolutional layer and then through a pretrained model. The output is then passed through a series of pooling, batch normalization, and dense layers to get the final output.
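Before the listings below, a minimal data-preparation sketch may be useful. It is only an assumption of how the X_train, X_val, X_test arrays and the one-hot labels used throughout this chapter could be produced; the file names are hypothetical.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# X: (N, 200, 200, 3) CT scans scaled to [0, 1]; y: 0 = non-COVID, 1 = COVID (hypothetical files)
X = np.load('ct_scans.npy') / 255.0
y = np.load('ct_labels.npy')

# 60% training, 20% validation, 20% test, as described in Section 8.3
X_train, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_v, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# One-hot targets for the two-class softmax head
Y_train = to_categorical(y_tr, 2)
Y_val = to_categorical(y_v, 2)
Y_test = to_categorical(y_te, 2)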

# imports assumed (tf.keras); adjust if plain Keras is used
from tensorflow.keras.applications import (DenseNet121, VGG16, VGG19, ResNet50,
                                           InceptionV3, InceptionResNetV2, Xception, MobileNet)
from tensorflow.keras.layers import (Input, Conv2D, GlobalAveragePooling2D,
                                     BatchNormalization, Dropout, Dense)
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

def select_pretrained_model(name):
    # function to select one of the pre-trained ImageNet models
    pretrainedbase = None
    if name == "DenseNet121":
        pretrainedbase = DenseNet121(weights='imagenet', include_top=False)
    if name == "VGG16":
        pretrainedbase = VGG16(weights='imagenet', include_top=False)
    if name == "VGG19":
        pretrainedbase = VGG19(weights='imagenet', include_top=False)
    if name == "ResNet50":
        pretrainedbase = ResNet50(weights='imagenet', include_top=False)
    if name == "InceptionV3":
        pretrainedbase = InceptionV3(weights='imagenet', include_top=False)
    if name == "InceptionResNetV2":
        pretrainedbase = InceptionResNetV2(weights='imagenet', include_top=False)
    if name == "Xception":
        pretrainedbase = Xception(weights='imagenet', include_top=False)
    if name == "MobileNet":
        pretrainedbase = MobileNet(weights='imagenet', include_top=False)
    return pretrainedbase

def build_model(pretrainedbase, num_features):
    # function to build the complete model
    input = Input(shape=(200,200,3))
    x = Conv2D(3, (3, 3), padding='same')(input)
    x = pretrainedbase(x)
    x = GlobalAveragePooling2D()(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    output = Dense(num_features, activation='softmax', name='root')(x)
    model = Model(input, output)
    return model

def transfer_learning(modelname):
    # function to train a model using transfer learning
    base = select_pretrained_model(modelname)
    model = build_model(base, 2)
    optimizer = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=0.1, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    model.summary()
    checkpoint = ModelCheckpoint('model.h5', verbose=1, save_best_only=True)
    history = model.fit(X_train, Y_train, validation_split=0.25, epochs=50,
                        batch_size=64, verbose=2, callbacks=[checkpoint])
    return history

Applications of Artificial Intelligence in Medical Imaging


8.4 Proposed architecture and transfer learning 227

FIGURE 8.2 Proposed architecture.

8.4.1 ResNet

Residual blocks help in tackling the vanishing gradient problem without compromising on the number of hidden layers of the architecture being used. Residual networks preserve what they have learned and keep adding to the preserved weights by using an identity mapping that is added to the activations after one step. Thus the network keeps adding to its previous experience whenever it has learnt something new.

The residual block first creates a copy of the weights x and preserves it to be used in the skip connection. Then an activation is applied to the weights x to get F(x). After the activation, the preserved weights x are added to F(x) through the skip connection, which completes the basic working of a residual block [27]. ResNet is made up of several such residual blocks; these networks are very deep and have proven to be very successful for image recognition and other such tasks.
history = transfer_learning("ResNet50") ####change the model name here


model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()


8.4.2 DenseNet

In a traditional deep learning architecture, a series of operations, which may include activations such as ReLU or softmax, or operations such as convolutions, pooling, or batch normalization, is applied at each layer. The equation for this would be:

xl = Hl(xl-1)

ResNets extended this behavior by including the skip connection, reformulating this equation into:

xl = Hl(xl-1) + xl-1

DenseNets differ from ResNets mainly at this step. Instead of summing up the history of the weights learned, they are concatenated together. Consequently, the equation transforms again into:

xl = Hl([x0, x1, ..., xl-1])

DenseNets comprise dense blocks, which are based upon this idea of collective knowledge. Since the feature maps are being concatenated, the channel dimension keeps increasing at each layer. If we make Hl produce k feature maps every time, then for the lth layer:

kl = k0 + k × (l - 1)

This hyperparameter k, called the growth rate, is a measure of the amount of information that is being added to the network in the form of feature maps at each layer [28].

history = transfer_learning("MobileNet") ####change the model name here


model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()
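The concatenation step behind a dense block can be sketched in a few lines (an illustration only, not DenseNet121's exact block):

from tensorflow.keras.layers import Input, Conv2D, Concatenate

# Each layer emits k = 12 new feature maps that are concatenated onto its input
k = 12
x0 = Input(shape=(28, 28, 24))
h1 = Conv2D(k, (3, 3), padding='same', activation='relu')(x0)
x1 = Concatenate()([x0, h1])   # channels: 24 -> 24 + k
h2 = Conv2D(k, (3, 3), padding='same', activation='relu')(x1)
x2 = Concatenate()([x1, h2])   # channels: 24 + k -> 24 + 2k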

8.4.3 MobileNet

MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. They are known to be memory efficient and lightweight because of the smaller number of parameters to be trained. They make use of dense blocks efficiently by having a large number of output feature maps with very few convolution operations. The growth rate k is also kept low, which further optimizes the computation and makes the network much more memory efficient [29].
of feature maps at each layer [28].

history = transfer_learning("DenseNet121") ####change the model name here


model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()

8.4.4 Xception

Xception, just like MobileNet, makes use of depthwise separable convolutions to build lightweight deep learning networks. It was developed by Google researchers. An Inception module can be summarized as a depthwise convolution followed by a pointwise convolution; thus the depthwise separable convolution used here is just an Inception module with a large number of towers. The data first goes through the entry flow, then through the middle flow, which is repeated eight times, and finally through the exit flow [30].
dense blocks efficiently by having large number finally through the exit flow [30].

history = transfer_learning("Xception") ####change the model name here
model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()
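Depthwise separable convolutions are available directly in Keras; a small illustrative sketch (not Xception's exact flow) is:

from tensorflow.keras.layers import Input, SeparableConv2D
from tensorflow.keras.models import Model

# Depthwise 3 x 3 filtering per channel followed by a 1 x 1 pointwise projection
inp = Input(shape=(64, 64, 32))
out = SeparableConv2D(64, (3, 3), padding='same', activation='relu')(inp)
model = Model(inp, out)
model.summary()   # noticeably fewer parameters than a standard 3 x 3 Conv2D(64)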

8.4.5 Visual geometry group (VGG)

The visual geometry group (VGG) architecture is very simple to understand. The input image is of size 224 × 224 and is passed through a series of five convolution blocks consisting of 3 × 3 kernels with stride 1 and ReLU activations. The convolution blocks are followed by max-pooling. Finally, the result is passed through three fully connected layers, with the last one having a dimension of 1000, corresponding to 1000 different image classes [31]. VGG architectures have also proved to be very successful in the ImageNet challenge.
history = transfer_learning("VGG19") ####change the model name here
model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()

8.4.6 Inception/GoogLeNet

In most of the standard network architectures, the intuition is not clear as to why and when to perform the max-pooling operation and when to use the convolutional operation. For example, in AlexNet, we have the convolutional operation and max-pooling operation following each other, whereas in VGGNet, we have three convolutional operations in a row and then one max-pooling layer. Thus the idea behind GoogLeNet is to use all the operations at the same time: it computes multiple kernels of different sizes over the same input map in parallel, concatenating their results into a single output. The intuition is that convolution filters of different sizes will handle objects at multiple scales better. This is called an Inception module [32].
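A toy Inception-style module, with parallel branches fused channel-wise, can be sketched as follows (an illustration, not GoogLeNet's exact module):

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Concatenate

inp = Input(shape=(32, 32, 64))
b1 = Conv2D(32, (1, 1), padding='same', activation='relu')(inp)    # 1 x 1 branch
b2 = Conv2D(32, (3, 3), padding='same', activation='relu')(inp)    # 3 x 3 branch
b3 = Conv2D(32, (5, 5), padding='same', activation='relu')(inp)    # 5 x 5 branch
b4 = MaxPooling2D((3, 3), strides=1, padding='same')(inp)          # pooling branch
out = Concatenate()([b1, b2, b3, b4])   # kernels of several sizes over the same input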

history = transfer_learning("InceptionV3") ####change the model name here


model = load_model('./model.h5')
final_loss, final_accuracy = model.evaluate(X_test, Y_test)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
plot_model_accuracy()
plot_model_loss()


FIGURE 8.3 COVID-19 detection using deep feature extraction and conventional machine learning.

8.5 COVID-19 detection with deep feature extraction

We also experimented with six different machine learning classifiers on the features extracted from our proposed architecture. For the K-NN classifier, the optimal value of K was found to be 3 after fine-tuning. Similarly, the number of estimators used in Random Forest, AdaBoost, XGBoost, and Bagging was set to 1000. The learning rate for XGBoost was kept at 1. The num_features of the final fully connected layer was kept at 256 (Fig. 8.3).

def featureextraction(modelname):
    # map each image in the (global) X_Train array to a 500-dimensional feature vector
    base = select_pretrained_model(modelname)
    model = build_model(base, 500)
    X_T = np.array([model.predict(i.reshape(1,200,200,3)).reshape(500) for i in X_Train])
    return X_T
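The listings in this section also call a print_confusion_matrix helper that is not shown in the chapter; a minimal sketch of it (an assumption) is:

from sklearn.metrics import confusion_matrix

def print_confusion_matrix(Y_test, Y_pred):
    # confusion-matrix helper assumed by the classifier listings below
    print('Confusion Matrix:\n', confusion_matrix(Y_test, Y_pred))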

8.5.1 K-nearest neighbors

The K-NN algorithm is very simple. The K objects that are nearest to the test object are found in the training set using some distance metric. Then, the class of the test object is decided by a majority vote among the K neighbors found. The choice of the distance metric is another important consideration: the metric is selected such that the smaller the distance between two objects, the greater the likelihood that they have the same class. Thus, for example, if K-NN is being applied to classify documents, then it may be better to use the cosine similarity rather than the Euclidean distance [33].

#Import k-NN Model
from sklearn.neighbors import KNeighborsClassifier
#Create the Model
clf = KNeighborsClassifier(n_neighbors=3)
#Train the model with the Training set
clf.fit(X_train,Y_train)
#Test the model with the Test set
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)
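If cosine similarity is preferred, scikit-learn's K-NN accepts it directly as a distance metric; an illustrative one-line variant of the model above:

clf = KNeighborsClassifier(n_neighbors=3, metric='cosine')   # cosine-distance K-NN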
8.5.2 Support vector machine

Support vector machine (SVM) is one of the most robust and accurate methods among all well-known machine learning algorithms. The main aim of SVM is to find a function that can accurately differentiate the two classes in a binary classification problem. For a linearly separable dataset, a linear classification function can be visualized as a separating hyperplane that passes through the middle of the two classes, separating the two. Since there can be many such hyperplanes, SVM helps in deciding on the hyperplane that maximizes the separation between the two classes. This separation is also called the margin. The margin can be defined as the distance between the two nearest objects of different classes, as classified by the classifier. This definition of the margin forces the classifier to select the best boundary among all the possible candidates, which helps in making the classifier much more generalized to unseen data, as the margin is set to the maximum possible [33].

#Import SVM Model
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
#Create the Model
clf = make_pipeline(StandardScaler(), SVC())
#Train the model with the Training set
clf.fit(X_train, Y_train)
#Test the model with the Test set
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)

8.5.3 Random Forests

Random Forests, as the name suggests, is an ensemble machine learning algorithm in which several decision trees are combined to form a "forest." The main idea behind the algorithm is that these trees are uncorrelated with each other and thus, overall, they reduce the error percentage. Decision trees work by splitting the data at each step, selecting the most favorable splits that lead to correct classification. Also, the dataset for each tree is forced to be selected from a random subset of features, enforcing more randomness and generalization and thereby reducing the collective error. This idea of combining predictions from several decision trees increases the overall accuracy on the dataset [34,35].

#Import Random Forest Ensemble Model
from sklearn.ensemble import RandomForestClassifier
#Create the Model
clf = RandomForestClassifier(n_estimators=100)
#Train the model with the Training set
clf.fit(X_train,Y_train)
#Test the model with the Test set
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)


8.5.4 Bagging

The decisions of different models can be combined into a single prediction to get better results, which is done in various ensemble methods. The most obvious way is to take a majority vote in the case of classification, or just to take the average of all model predictions if a numeric output is desired. In Bagging the models have equal weight, while in boosting, weighting is used to give more impact to the more effective models, depending on how successful their predictions were in the past. In Bagging, numerous training datasets of the same size are selected at random from the problem domain to build a decision tree for each dataset; the hypothesis is that these trees are practically identical and make the same prediction for each new test instance. But this hypothesis is generally not true, especially if the training datasets are fairly small; this inevitably means that there are test instances for which the predictions of some of the models are not correct. Bagging tries to eliminate the instability of machine learning methods by sampling the instances randomly with replacement from the original dataset to create a new one of the same size. This sampling procedure unavoidably deletes some of the examples and duplicates others. So, Bagging uses each one of these resultant datasets in the learning algorithm, and the outputs generated from them vote for the class to be predicted [36].

#Import Bagging Ensemble Model
from sklearn.ensemble import BaggingClassifier
from sklearn import tree
#Create a Bagging Ensemble Model
clf = BaggingClassifier(n_estimators=1000)
#Train the model using the training set
clf.fit(X_train,Y_train)
#Predict the response for the test set
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)

8.5.5 AdaBoost

AdaBoost, which is short for Adaptive Boosting, is a very simple yet elegant ensemble learning technique in which weights are assigned to the data points according to the predictions of the classifier. Initially, all the data points are assigned equal weights and the classifier is used to make a prediction. Now, the predictions are compared with the ground-truth labels, and the weights for the correct predictions are decreased while those for the incorrect ones are increased. In the next iteration, the classifier is built upon this newly weighted dataset. This time, the difference is that since the earlier misclassified data points have been given more weight, the classifier tries "harder" on them compared with the data points of low weight. After a series of iterations of this algorithm, some hard instances become harder while others become correct, which varies from model to model, but overall the ensemble method has its accuracy increased. Since the algorithm adapts to the dataset by assigning and modifying weights according to the current predictions, it is called Adaptive Boosting [36].

#Import Adaboost ensemble model
from sklearn.ensemble import AdaBoostClassifier
#Create an Adaboost Ensemble Model
clf = AdaBoostClassifier(n_estimators=1000)
#Train the model using the training set
clf.fit(X_train,Y_train)
#Predict the response for the test set
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)

8.5.6 XGBoost

XGBoost, also known as gradient boosting, is a more recent ensemble boosting algorithm,

just like AdaBoost. The main difference lies in the way it boosts the predictions. Instead of assigning weights according to the predicted values, the algorithm moves forward on the results of a loss function. The loss function calculates the loss value using the current predictions and the ground-truth labels. The algorithm tries to minimize this loss function by computing the gradients with respect to it and taking one step in the suggested direction. The steps are taken according to a learning rate, which is a hyperparameter of the algorithm. After repeated iterations, the algorithm has its accuracy increased, with the loss settling to its minimal value [37].

%pip install xgboost

#Import XGBoost ensemble model
from xgboost import XGBClassifier
#Create the XGB model (the original listing called GradientBoostingClassifier;
# XGBClassifier matches the import above and the chapter's stated settings)
clf = XGBClassifier(n_estimators=1000, learning_rate=1)
#Train the model using the training set
clf.fit(X_train,Y_train)
#Make predictions for the test data
Y_pred = clf.predict(X_test)
#Print performance
print(clf.score(X_train,Y_train))
print(clf.score(X_val,Y_val))
print(clf.score(X_test,Y_test))
print_confusion_matrix(Y_test,Y_pred)
print_scores(Y_test,Y_pred)

8.6 Results and discussions

8.6.1 Performance evaluation measures

Performance on the training set is definitely not a good indicator of performance on an independent test set. The question of predicting performance based on limited data is an interesting, and still controversial, one. The classifier's performance is measured in terms of the number of samples for which the prediction was correct. This gives rise to the accuracy of the classifier; the higher the accuracy, the more correct it is. But to assess the actual performance of the classifier, the accuracy on the training set is not a good measure. Therefore the need for a test set comes into the picture, which is also representative of the same dataset but which the classifier has never seen before. More formally, the training and test sets are totally disjoint. The classifier is trained on the training set to find a good classifier for the problem, and then its performance is evaluated by measuring its accuracy on the test set. Sometimes, the dataset is divided into three disjoint sets called train, test, and validation. Here, the validation set is used to fine-tune the hyperparameters involved in the training of the classifier, while the purpose of the other two remains the same. The training set is kept the largest to facilitate good training, while the test set is kept small [36].

8.6.1.1 F1 measure and confusion matrix

The true positives (TP) and true negatives (TN) refer to the number of samples for which the classification was correct. A false positive (FP) is when the outcome is incorrectly predicted as yes (or positive) when it is actually

Applications of Artificial Intelligence in Medical Imaging


234 8. A deep learning approach for COVID-19 detection from computed tomography scans

no (negative). A false negative (FN) is when the outcome is incorrectly predicted as negative when it is actually positive [36]. Information retrieval researchers define parameters called recall and precision:

Recall = TP / (TP + FN)

Precision = TP / (TP + FP)

Then the F1 measure can be formulated as:

F1 = (2 × Precision × Recall) / (Precision + Recall)

Another important measure is the confusion matrix, which can be formulated as:

Confusion Matrix = [TP FP; FN TN]
sures the agreement expected by chance [41].
8.6.1.2 Receiver operating characteristic
(ROC) analysis
Receiver operating characteristic (ROC) curves
8.6.2 Experimental results
depict the performance of a classifier without
taking into account the actual error rate or cost. 8.6.2.1 Transfer learning
The curve is plotted by plotting “true positive Table 8.1 shows the experiment results for
rate” on the y-axis and “true negative rate” on transfer learning performed on various pre-
the x-axis. Formally: trained CNNs using our proposed architecture.
TP Rate 5 100 3 TP=ðTP 3 FN Þ We were able to achieve a high-test accuracy of

FP Rate 5 100 3 FP=ðFP 3 TN Þ TABLE 8.1 Transfer learning experiment results.


The area under the ROC curve (AUC) is a Model Accuracy F1 Score Kappa AUC
measure of the probability that the classifier
DenseNet121 97.58% 0.973 0.947 0.974
ranks a positively predicted instance in front of
a negatively predicted instance. Therefore this VGG16 95.97% 0.971 0.943 0.971
area is sometimes used as a performance met- ResNet50 95.37% 0.953 0.921 0.953
ric in various classification tasks. Several meth-
InceptionResNetV2 96.17% 0.945 0.891 0.946
ods are commonly employed for computing
the area under the ROC curve [36]. InceptionV3 95.97% 0.972 0.944 0.972
MobileNet 96.17% 0.947 0.895 0.947
8.6.1.3 Kappa statistic Xception 95.77% 0.941 0.883 0.941
Cohen’s kappa statistic is a measure which
VGG19 98.30% 0.982 0.964 0.982
takes into account the samples that were

Applications of Artificial Intelligence in Medical Imaging


8.6 Results and discussions 235
98.30% with VGG19. F1 score, AUC, and 8.6.2.2.2 Support vector machine
Cohen’s kappa were also found to be the high- We observed that the results for the SVM
est with VGG19 with values of 0.982, 0.982, classifier were slightly weaker as compared to
and 0.964, respectively. It should also be noted K-NN classifier for VGG16 as the accuracy
that all the models were able to yield test accu- decreased from 91.75% to 88.93%, whereas it
racy of more than 95% with F1 score greater was slightly better for VGG19 and ResNet50
than 0.94, AUC area greater than 0.94, and as it showed an increase. SVM classifier
kappa value greater than 0.88.

def print_scores(Y_test,Y_pred):
print("Confusion Matrix: ", confusion_matrix(Y_test,Y_pred))
print("F1 score: ", f1_score(Y_test,Y_pred))
print("Kappa: ", cohen_kappa_score(Y_test,Y_pred))
print("ROC area: ", roc_auc_score(Y_test,Y_pred))

8.6.2.2 Feature extraction with pretrained


models proved to be the best for VGG19 and
K-NN, SVM, Random Forest, AdaBoost, ResNet50 as the test accuracy of 87.92% and
XGBoost, and Bagging classifiers were applied 85.91% was the highest for them among all
to features extracted from each pretrained other methods. VGG16 achieved the highest
model using the proposed architecture as the test accuracy of 88.93% in case of SVM. The F1
second part of our experiments. The dimension score of 0.888, AUC of 0.889, and kappa value
of the extracted features was kept 256 after of 0.778 were also highest for VGG16. It is also
fine-tuning. worthy to mention that the performance of
MobileNet and DenseNet121 was almost as
good as VGG19 (Table 8.3).
8.6.2.2.1 K-nearest neighbors
K-NN algorithm is one of the simplest clas-
sification algorithms. Even with such simplic-
TABLE 8.2 Feature extraction along with K-NN
ity, it gave highly competitive results. The classifier.
highest test accuracy achieved with the K-NN
classifier was 91.75% with VGG16. The F1 Accuracy F1 score Kappa AUC
score, AUC, and kappa for VGG16 were 0.916, MobileNet 87.93% 0.873 0.758 0.879
0.917, and 0.835 which are also pretty high
Xception 82.89% 0.828 0.658 0.829
compared to other architectures. Except for
InceptionV3 and InceptionResNetV2, all other InceptionV3 78.67% 0.791 0.574 0.787
models were able to achieve a test accuracy of InceptionResNetV2 78.87% 0.785 0.577 0.788
more than 80%, F1 score greater than 0.82,
VGG19 87.73% 0.875 0.745 0.877
AUC greater than 0.82, and kappa value
greater than 0.65. The performance of VGG16 91.75% 0.916 0.835 0.917
MobileNet, VGG19, and DenseNet121 were DenseNet121 86.31% 0.86 0.762 0.863
good and comparable to our best performing
ResNet50 83.90% 0.832 0.678 0.838
candidate VGG16 (Table 8.2).

Applications of Artificial Intelligence in Medical Imaging


236 8. A deep learning approach for COVID-19 detection from computed tomography scans

TABLE 8.3 Deep feature extraction along with SVM TABLE 8.4 Deep feature extraction along with
classifier. Random Forest classifier.
Accuracy F1 score Kappa AUC Accuracy F1 score Kappa AUC

MobileNet 86.92% 0.869 0.738 0.869 MobileNet 81.69% 0.8169 0.634 0.817
Xception 79.07% 0.786 0.581 0.79 Xception 77.86% 0.786 0.558 0.779
InceptionV3 75.85% 0.763 0.517 0.758 InceptionV3 67.80% 0.676 0.356 0.678
InceptionResNetV2 78.87% 0.789 0.578 0.788 InceptionResNetV2 78.47% 0.784 0.569 0.784
VGG19 87.92% 0.882 0.758 0.879 VGG19 81.08% 0.816 0.622 0.811
VGG16 88.93% 0.888 0.778 0.889 VGG16 84.31% 0.846 0.686 0.843
DenseNet121 85.31% 0.853 0.706 0.853 ResNet50 85.11% 0.85 0.702 0.851
ResNet50 85.91% 0.864 0.718 0.859 DenseNet121 81.08% 0.811 0.622 0.811

8.6.2.2.3 Random Forest TABLE 8.5 Deep feature extraction along with
In the case of Random Forest, ResNet50 per- AdaBoost.
formed the best with an accuracy of 85.11%, beat- Accuracy F1 score Kappa AUC
ing VGG16 by about a difference of 0.8% which
MobileNet 77.66% 0.782 0.553 0.777
is not a large margin. Similarly, the F1 score,
AUC, and kappa values were also greater in case Xception 74.04% 0.746 0.481 0.741
of ResNet50 compared to VGG16 by a very small InceptionV3 64.19% 0.645 0.284 0.642
margin. InceptionV3 and InceptionResNetV2
InceptionResNetV2 73.84% 0.734 0.476 0.738
showed extremely poor compared to others, fol-
lowing the trend of the results of the K-NN and VGG19 80.28% 0.797 0.605 0.803
SVM classifier. Similar to SVM, the performance VGG16 82.49% 0.825 0.649 0.825
of MobileNet, DenseNet121, and VGG19 was
DenseNet121 78.67% 0.784 0.574 0.786
also very good with the test accuracy being
greater than 80% for all the three (Table 8.4). ResNet50 78.87% 0.791 0.577 0.788

8.6.2.2.4 AdaBoost and XGBoost


AdaBoost and XGBoost are the two famous
boosting algorithms that are used widely. performance of XGBoost was better than
AdaBoost is sequentially growing decision trees AdaBoost with a margin of 2%5% for every
as weak learners and punishing incorrectly pre- model except DenseNet121, which clearly shows
dicted samples by assigning a more significant that XGBoost is better at classifying the CT scan
weight to them after each round of prediction. features obtained from our proposed model. For
This way, the algorithm is learning from previous both the classifiers, VGG16 performed the best
mistakes. On the other hand, in XGBoost, also with a test accuracy of 82.49% and 85.71%,
known as gradient boosting, the boosting is done respectively. The F1 score, ROC AUC, and kappa
along with the gradient changes. The compared values also show the same pattern. Following
results of AdaBoost and XGBoost are shown in the trends, for both classifiers we again observed
Table 8.5 and 8.6. It was observed that the that ResNet50, VGG19, and MobileNet gave

Applications of Artificial Intelligence in Medical Imaging


8.6 Results and discussions 237
TABLE 8.6 Deep feature extraction along with TABLE 8.7 Deep feature extraction along with
XGBoost. Bagging.

Accuracy F1 score Kappa AUC Accuracy F1 score Kappa AUC

MobilNet 81.08% 0.816 0.622 0.811 MobileNet 80.88% 0.809 0.6177 0.809

Xception 77.06% 0.771 0.541 0.771 Xception 77.86% 0.782 0.557 0.778
InceptionV3 68.41% 0.688 0.368 0.684 InceptionV3 68.61% 0.691 0.372 0.686
InceptionResNetV2 74.64% 0.746 0.493 0.746 InceptionResNetV2 78.47% 0.787 0.569 0.785
VGG19 82.49% 0.821 0.649 0.825 VGG19 82.89% 0.828 0.658 0.829
VGG16 85.71% 0.857 0.714 0.857 VGG16 85.51% 0.858 0.71 0.855
DenseNet121 77.46% 0.776 0.549 0.775 DenseNet121 78.87% 0.793 0.577 0.788

ResNet50 83.90% 0.839 0.678 0.839 ResNet50 80.48% 0.804 0.609 0.805

comparable performance. VGG19 was the ResNet50 gave great results with the maximum
second-best performer after VGG16. It should be value of 87.92% and 85.91% with SVM, respec-
noted that both these methods could not beat K- tively. They both lagged behind VGG16 by only
NN, SVM, and Random Forest as the obtained a margin of 2%4% for all the classifiers used.
scores were lesser compared to them. InceptionV3 performed the worst, whereas the
performance of MobileNet and DenseNet121
8.6.2.2.5 Bagging was comparable to VGG16 and VGG19.
Table 8.7 shows the results of the Bagging clas-
sifier, which proved to be on average with the
8.6.3 Discussion
XGBoost algorithm. The results for all the models
were almost similar with no substantial changes Our proposed methods and architecture beat
observed in the test accuracies. The best test accu- various other deep learning models. Our trans-
racy of 85.51% obtained is for VGG16 compared fer learning approach yielded an accuracy of
to a value of 85.71% for XGBoost. The F1 score, 98.30% compared to an accuracy of 95.12% by
ROC AUC, and kappa are almost same for the DeTrac [42] and 96% by Bai et al. [43]. Our
cases. The results show that both Boosting and model also performed better on the AUC, with
Bagging algorithms are equally efficient in classi- a value of 0.982 compared to 0.96 by Li et al. [7].
fying the extracted features. Similarly, our machine learning-based approach
For almost all the machine learning classi- yielded an accuracy of 91.75%, which is by far
fiers, VGG16 proved to give the highest test better than the deep learning approaches, such
accuracy among all the models. The highest as accuracy of 90.1% by Zheng et al. [44]. AUC
value obtained was 91.75% with the K-NN clas- obtained by using SVM on our feature extractor
sifier. The F1 score, AUC, and Cohen’s kappa is 0.889, which is better than 0.862 reported by
were also the highest in this case. Among the Mei et al. [45], which shows that our proposed
machine learning classifiers, K-NN seemed to architecture is better at extracting features.
work the best followed by SVM, XGBoost, These results support other unique feature
Bagging, Random Forest, and AdaBoost, respec- extraction-based methods [46,47] and novel
tively. It should also be noted that VGG19 and techniques [48,49] in this area.

Applications of Artificial Intelligence in Medical Imaging


238 8. A deep learning approach for COVID-19 detection from computed tomography scans

8.7 Conclusion [6] A. Pezeshk, S. Hamidian, N. Petrick, B. Sahiner, 3-D


convolutional neural networks for automatic detection
of pulmonary nodules in chest CT, IEEE J. Biomed.
A deep learning approach for COVID-19 Health Inform. 23 (5) (2019) 20802090. Available
detection from CT images has been applied to from: https://doi.org/10.1109/JBHI.2018.2879449.
distinguish infection of COVID-19 from other [7] L. Li, et al., Using artificial intelligence to detect
common pneumonia diseases. A variety of COVID-19 and community-acquired pneumonia
based on pulmonary CT: evaluation of the diagnostic
deep and machine learning methods have been
accuracy, Radiology 296 (2) (2020) E65E71. Available
used to extract and identify images of COVID- from: https://doi.org/10.1148/radiol.2020200905.
19 to ensure a clinical diagnosis ahead of the [8] H. Panwar, P.K. Gupta, M.K. Siddiqui, R. Morales-
pathogenic test for the disease control. The pro- Menendez, P. Bhardwaj, V. Singh, A deep learning
posed architecture showed that the ImageNet and Grad-CAM based color visualization approach for
fast detection of COVID-19 cases using chest X-ray
pretrained models yielded 91.75% test accuracy
and CT-scan images, Chaos Solitons Fractals (2020)
using classical machine learning algorithms 110190. Available from: https://doi.org/10.1016/j.
with a test accuracy of 98.30% using transfer chaos.2020.110190.
learning. The present proposal increased the [9] H. Meng, et al., CT imaging and clinical course of
existing accuracy of the AUC area from 0.96 to asymptomatic cases with COVID-19 pneumonia at
admission in Wuhan, China, J. Infect. 81 (1) (2020)
a current value of 0.982. Similarly, the present
e33e39. Available from: https://doi.org/10.1016/j.
machine learning-based approach increased jinf.2020.04.004.
the result to 91.75% rather than 90.1%. This [10] N. Narayan Das, N. Kumar, M. Kaur, V. Kumar, D.
study showed that the present architecture Singh, Automated deep transfer learning-based
model is very promising in characterizing and approach for detection of COVID-19 infection in chest
X-rays, IRBM (2020). Available from: https://doi.org/
diagnosing COVID-19 infections.
10.1016/j.irbm.2020.07.001.
[11] S. Lalmuanawma, J. Hussain, L. Chhakchhuak,
Applications of machine learning and artificial intelli-
References gence for Covid-19 (SARS-CoV-2) pandemic: a review,
Chaos Solitons Fractals 139 (2020) 110059. Available
[1] WHO, Coronavirus disease (COVID-19) situation reports. from: https://doi.org/10.1016/j.chaos.2020.110059.
,https://www.who.int/emergencies/diseases/novel- [12] A.A. Ardakani, A.R. Kanafi, U.R. Acharya, N.
coronavirus-2019/situation-reports., 2020 (accessed Khadem, A. Mohammadi, Application of deep learn-
08.08.20). ing technique to manage COVID-19 in routine clinical
[2] A. Tahamtan, A. Ardebili, Real-time RT-PCR in practice using CT images: Results of 10 convolutional
COVID-19 detection: issues affecting the results, neural networks, Comput. Biol. Med. 121 (2020)
Expert Rev. Mol. Diagn. (2020) 12. Available from: 103795. Available from: https://doi.org/10.1016/j.
https://doi.org/10.1080/14737159.2020.1757437. compbiomed.2020.103795.
[3] S. Wang, et al., A deep learning algorithm using CT [13] H. Panwar, P.K. Gupta, M.K. Siddiqui, R. Morales-
images to screen for Corona Virus Disease (COVID-19), Menendez, V. Singh, Application of deep learning for
Infectious Diseases (except HIV/AIDS) (2020). Available fast detection of COVID-19 in X-Rays using nCOVnet,
from: https://doi.org/10.1101/2020.02.14.20023028. Chaos Solitons Fractals 138 (2020) 109944. Available
preprint. from: https://doi.org/10.1016/j.chaos.2020.109944.
[4] Y. Yu, H. Lin, J. Meng, X. Wei, H. Guo, Z. Zhao, Deep [14] D.-P. Fan, et al., Inf-Net: Automatic COVID-19 lung
transfer learning for modality classification of medical infection segmentation from CT images,
images, Information 8 (3) (2017). Available from: ArXiv200414133 Cs Eess, May 2020, ,http://arxiv.
https://doi.org/10.3390/info8030091. Art. no. 3. org/abs/2004.14133. (accessed 09.08.20).
[5] H.-C. Shin, et al., Deep convolutional neural networks [15] S. Minaee, R. Kafieh, M. Sonka, S. Yazdani, G.
for computer-aided detection: CNN architectures, data- Jamalipour Soufi, Deep-COVID: Predicting COVID-19
set characteristics and transfer learning, IEEE Trans. from chest X-ray images using deep transfer learning,
Med. Imaging 35 (5) (2016) 12851298. Available from: Med. Image Anal. 65 (2020) 101794. Available from:
https://doi.org/10.1109/TMI.2016.2528162. https://doi.org/10.1016/j.media.2020.101794.

Applications of Artificial Intelligence in Medical Imaging


References 239
[16] L. Brunese, F. Mercaldo, A. Reginelli, A. Santone, Biomed. 196 (2020) 105581. Available from: https://
Explainable deep learning for pulmonary disease and doi.org/10.1016/j.cmpb.2020.105581.
coronavirus COVID-19 detection from X-rays, [26] W.M. Shaban, A.H. Rabie, A.I. Saleh, M.A. Abo-
Comput. Methods Programs Biomed. 196 (2020) Elsoud, A new COVID-19 Patients Detection Strategy
105608. Available from: https://doi.org/10.1016/j. (CPDS) based on hybrid feature selection and
cmpb.2020.105608. enhanced KNN classifier, Knowl.-Based Syst. 205
[17] M. Nour, Z. Cömert, K. Polat, A novel medical diag- (2020) 106270. Available from: https://doi.org/
nosis model for COVID-19 infection detection based 10.1016/j.knosys.2020.106270.
on deep features and bayesian optimization, Appl. [27] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning
Soft Comput. (2020) 106580. Available from: https:// for image recognition, ArXiv151203385 Cs, Dec. 2015,
doi.org/10.1016/j.asoc.2020.106580. ,http://arxiv.org/abs/1512.03385. (accessed 08.08.20).
[18] T. Tuncer, S. Dogan, F. Ozyurt, An automated [28] G. Huang, Z. Liu, L. van der Maaten, K.Q.
Residual Exemplar Local Binary Pattern and iterative Weinberger, Densely connected convolutional net-
ReliefF based COVID-19 detection method using chest works, ArXiv160806993 Cs, Jan. 2018, ,http://arxiv.
X-ray image, Chemom. Intell. Lab. Syst. 203 (2020) org/abs/1608.06993..
104054. Available from: https://doi.org/10.1016/j. [29] A. G. Howard et al., MobileNets: Efficient
chemolab.2020.104054. Convolutional Neural Networks for Mobile Vision
[19] S. Hassantabar, M. Ahmadi, A. Sharifi, Diagnosis and Applications, ArXiv170404861 Cs, Apr. 2017,,http://
detection of infected tissue of COVID-19 patients arxiv.org/abs/1704.04861..
based on lung x-ray image using convolutional neural [30] F. Chollet, Xception: deep learning with depthwise
network approaches, Chaos Solitons Fractals 140 separable convolutions, in: 2017 IEEE Conference on
(2020) 110170. Available from: https://doi.org/ Computer Vision and Pattern Recognition (CVPR),
10.1016/j.chaos.2020.110170. Honolulu, HI, Jul. 2017, pp. 18001807, doi: 10.1109/
[20] T. Mahmud, M.A. Rahman, S.A. Fattah, CovXNet: a CVPR.2017.195.
multi-dilation convolutional neural network for auto- [31] K. Simonyan, A. Zisserman, Very deep convolutional
matic COVID-19 and other pneumonia detection from networks for large-scale image recognition,
chest X-ray images with transferable multi-receptive ArXiv14091556 Cs, ,http://arxiv.org/abs/1409.1556.,
feature optimization, Comput. Biol. Med. 122 (2020) Apr. 2015 (accessed 8.8.20).
103869. Available from: https://doi.org/10.1016/j. [32] C. Szegedy et al., Going deeper with convolutions, in:
compbiomed.2020.103869. 2015 IEEE Conference on Computer Vision and
[21] T.B. Alakus, I. Turkoglu, Comparison of deep learning Pattern Recognition (CVPR), Boston, MA, USA, Jun.
approaches to predict COVID-19 infection, Chaos 2015, pp. 19, doi: 10.1109/CVPR.2015.7298594.
Solitons Fractals 140 (2020) 110120. Available from: [33] X. Wu, et al., Top 10 algorithms in data mining,
https://doi.org/10.1016/j.chaos.2020.110120. Knowl. Inf. Syst. 14 (1) (2008) 137.
[22] A. Altan, S. Karasu, Recognition of COVID-19 disease [34] J. Han, J. Pei, M. Kamber, Data Mining: concepts and
from X-ray images by hybrid model consisting of 2D techniques, Elsevier, 2011.
curvelet transform, chaotic salp swarm algorithm and [35] L. Breiman, Random Forests, Mach. Learn. 45 (1)
deep learning technique, Chaos Solitons Fractals 140 (2001) 532.
(2020) 110071. Available from: https://doi.org/ [36] M. Hall, I. Witten, E. Frank, Data Mining: Practical
10.1016/j.chaos.2020.110071. Machine Learning Tools and Techniques, Kaufmann
[23] A. Sufian, A. Ghosh, A.S. Sadiq, F. Smarandache, A Burlington, 2011.
survey on deep transfer learning to edge computing [37] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting
for mitigating the COVID-19 pandemic, J. Syst. Archit. system, in: Proc. 22nd ACM SIGKDD International
108 (2020) 101830. Available from: https://doi.org/ Conference on Knowledge Discovery and Data Mining,
10.1016/j.sysarc.2020.101830. pp. 785794, Aug. 2016, doi: 10.1145/2939672.2939785.
[24] T. Ozturk, M. Talo, E.A. Yildirim, U.B. Baloglu, O. [38] A.J. Viera, J.M. Garrett, Understanding interobserver
Yildirim, U. Rajendra Acharya, Automated detection of agreement: the kappa statistic, Fam. Med. 37 (5) (2005)
COVID-19 cases using deep neural networks with X-ray 360363.
images, Comput. Biol. Med. 121 (2020) 103792. Available [39] C.A. Lantz, E. Nebenzahl, Behavior and interpretation
from: https://doi.org/10.1016/j.compbiomed.2020.103792. of the κ statistic: Resolution of the two paradoxes, J.
[25] A.I. Khan, J.L. Shah, M.M. Bhat, CoroNet: a deep neu- Clin. Epidemiol. 49 (4) (1996) 431434.
ral network for detection and diagnosis of COVID-19 [40] J. Cohen, A coefficient of agreement for nominal
from chest x-ray images, Comput. Methods Programs scales, Educ. Psychol. Meas. 20 (1) (1960) 3746.

Applications of Artificial Intelligence in Medical Imaging


240 8. A deep learning approach for COVID-19 detection from computed tomography scans

[41] Z. Yang, M. Zhou, Kappa statistic for clustered physi- 15. Available from: https://doi.org/10.1038/s41591-
cianpatients polytomous data, Comput. Stat. Data 020-0931-3.
Anal. 87 (2015) 117. [46] F. Ozyurt, T. Tuncer, A. Subasi, An automated COVID-
[42] (PDF) Classification of COVID-19 in chest X-ray 19 detection based on fused dynamic exemplar pyramid
images using DeTraC deep convolutional neural network feature extraction and hybrid feature selection using
(2020). https://www.researchgate.net/publication/ deep learning, Comput. Biol. Med. 132 (2021) 104356.
340332332_Classification_of_COVID-19_in_chest_X-ray_ [47] T. Tuncer, F. Ozyurt, S. Dogan, A. Subasi, A novel
images_using_DeTraC_deep_convolutional_neural_net- Covid-19 and pneumonia classification method based
work (accessed 09.08.20). on F-transform, Chemometr. Intell. Lab. Syst. 210
[43] H.X. Bai, et al., AI augmentation of radiologist perfor- (2021) 104256. Available from: https://doi.org/
mance in distinguishing COVID-19 from pneumonia 10.1016/j.chemolab.2021.104256. 15 March 2021.
of other etiology on chest CT, Radiology 296 (2020). [48] A. Subasi, S.A. Qureshi, T. Brahimi, A. Serireti,
Available from: https://pubs.rsna.org/doi/full/ COVID-19 detection from X-Ray images using artifi-
10.1148/radiol.2020201491 (accessed 09.08.20). cial intelligence, Artificial Intelligence and Big Data
[44] C. Zheng et al., Deep learning-based detection for Analytics for Smart Healthcare, Elesevier, 2021.
COVID-19 from chest CT using weak label, infectious [49] A. Subasi, A. Mitra, F. Ozyurt, T. Tuncer, Automated
diseases (except HIV/AIDS), preprint, Mar. 2020. doi: Covid-19 detection from CT images using deep learn-
10.1101/2020.03.12.20027185. ing, in: V. Bajaj, G.R. Sinha (Eds.), Computer-aided
[45] X. Mei, et al., Artificial intelligence-enabled rapid Diagnosis and Design Methods for Biomedical
diagnosis of patients with COVID-19, Nat. Med. (2020) Applications, CRC Press, Taylor & Francis, 2021.

Applications of Artificial Intelligence in Medical Imaging


C H A P T E R

9
Detection and classification of Diabetic
Retinopathy Lesions using deep learning
Siddhesh Shelke1 and Abdulhamit Subasi2,3
1
Indian Institute of Technology, Indore, Madhya Pradesh, India 2Institute of Biomedicine, Faculty of
Medicine, University of Turku, Turku, Finland 3Department of Computer Science, College of
Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

9.1 Introduction 241 9.4 Diabetic retinopathy detection using deep


learning 251
9.2 Literature survey on diabetic retinopathy
9.4.1 Prediction and classification 252
detection 244
9.4.2 Performance evaluation metrics 253
9.2.1 Traditional diabetic retinopathy detection
9.4.3 Experimental results 254
approach 244
9.2.2 Binary and multilevel classification 245 9.5 Discussion 261
9.2.3 Datasets 246
9.6 Conclusion 262
9.3 Deep learning methods for diabetic
References 262
retinopathy detection 247
9.3.1 Deep neural networks 247 Further reading 264
9.3.2 Convolutional neural networks 249
9.3.3 Transfer learning 250

9.1 Introduction visual loss. DR is responsible for 2.6% of blind-


ness worldwide. Diabetes patients who have been
Diabetic retinopathy (DR) is a diabetic condi- sick with the condition for a long time are more
tion that causes the retina’s blood vessels to likely to get DR. Regular retina screening is neces-
enlarge and leak fluids and blood. If DR pro- sary for diabetic people to identify and treat DR
gresses to an advanced degree, it might result in early enough to avoid blindness. The presence of

Applications of Artificial Intelligence in Medical Imaging


DOI: https://doi.org/10.1016/B978-0-443-18450-5.00004-9 241 © 2023 Elsevier Inc. All rights reserved.
242 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

various sorts of lesions on a retina picture is used impossible to treat. Early detection by diagno-
to identify DR. DR is one of the leading causes of sis is essential because it is often treated effec-
blindness in the working age of developing coun- tively in the early stages. The cost of this task
tries. It is estimated that more than 93 million peo- is high, so early detection is important to
ple will be affected. It is an “eye disease” that reduce labor. It is needed to automatically
causes meningitis as a long-term consequence of detect defects in the eye image at low cost
diabetes, which results in progressive damage to using digital image processing and artificial
the eye and even blindness [1]. Because diabetes intelligence (AI) algorithms. In DR, blood ves-
is a progressive disease, doctors recommend that sels that help in sustaining the retina begins to
people with diabetes be checked at least twice a leak fluid and blood onto the retina, which can
year to identify symptoms regularly. result in formation of visual features known as
Risks factors of DR: lesions such as microaneurysms, hemorrhages,
hard exudates, cotton spots, and vessels area
• Duration of diabetes:
[1]. In a medical diagnosis, an ophthalmologist
• A patient diagnosed before age 30 years
examines an image of a colored background to
• 50% DR after 10 years
examine the patient’s condition. This diagnosis
• 90% DR after 30 years
is difficult and time-consuming and introduces
• Poor metabolic control:
additional errors. Furthermore, because of the
• It is less essential but quite pertinent to
vast number of diabetics and a lack of health
the onset and progression of DR.
resources in some locations, most DR patients
• Increased HbA1c is associated with the
are unable to be detected and treated on time,
increased risk.
suffer permanent eyesight loss, and even lack
• Pregnancy:
vision.
• It is linked to a rapid progression of DR
Rapid detection of DR, especially in early
• Prediction factors—Poor pregnancy
stages, can effectively control and delay degen-
control of diabetes mellitus (DM), too
erative conditions. At the same time, the
rapid control during the early stages of
impact of hand interpretation depends to a
pregnancy, preeclampsia, and fluid
large extent on the understanding of the doc-
imbalance are all risk factors.
tor. Medical malpractice occurs due to incom-
• Hypertension:
petence of doctors. Convolutional neural
• It is most common in patients with DM
networks (CNNs) have surpassed all previous
type 2.
image analysis techniques in computer vision
• Should strictly control (,140/80 mm of Hg).
and image classification tasks over the past
• Nephropathy:
decade. Computer-assisted diagnosis is more
• Associated with the worsening of DR.
effective because it allows screening for a large
• Renal transplantation may be linked to a
number of diseases. The conditions that cause
reduction in DR and a better response to
microangiopathy can lead to the formation of
photocoagulation.
microaneurysms. Hard exudates are white or
• Others:
creamy colors that are very bright in the retina.
• Smoking
If they appear near the middle of the macula
• Obesity
and show fluid in the eyeball, they are consid-
• Hyperlipidemia
ered very dangerous. Bleeding scarring is the
• Anemia
most common type of bleeding due to DR. It is
DR is a chronic disease that appears only a small hemorrhage that originates from the
in the late stages when it is difficult and cervical network. Planting lesions are upper

Applications of Artificial Intelligence in Medical Imaging


9.1 Introduction 243
lesions on the yellow or white eyeball along with or without cotton wool spots are
the side of the hair. They describe eyelid present.
edema as a result of ischemia. They finally 4. Severe nonproliferative DR—(1) numerous
recovered on their own after 3 months. In case hemorrhages and microaneurysms in all
foundational ischemic condition is not diag- four quadrants of the retina, (2) cotton wool
nosed, fresh lesions can form in a variety of spots in two or more quadrants, and (3)
locations [2]. intraretinal microvascular abnormalities in
As per the International Clinical Diabetic at least one quadrant.
Retinopathy and Early Treatment Diabetic 5. Proliferative DR—an advanced stage in
Retinopathy Study Research Group [3] severity which new thin and delicate blood vessels
scale, various categories of DR (Fig. 9.1) are are formed, increasing the risk of spillage
defined as: and causing severe vision loss or blindness.
6. Macular edema exudates or visible
1. There is no visible retinopathy.
stiffening within one disk diameter of the
2. Diabetic with mild nonproliferative
fovea could also be caused by DR, which
complications—retinopathy is defined as the
can be vision-threatening.
presence of at least one microaneurysm, with
or without the presence of other lesions. Diabetes is the leading cause of blindness in
3. Moderate nonproliferative DR—numerous many developed countries. The same applies
microaneurysms and retinal hemorrhages to developing countries. People with diabetes

Retina Images DR Levels

Normal (No DR)

Mild DR

Moderate DR

Severe DR

Proliferative DR

FIGURE 9.1 Diabetic retinopathy levels of the retinal images.

Applications of Artificial Intelligence in Medical Imaging


244 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

can be more blind than patients without diabe- 9.2.1 Traditional diabetic retinopathy
tes. AIDS and macular edema (both essential in detection approach
the hospital) can cause severe vision loss. This
affects the eyeballs and can cause blindness in Chandrashekar [7] proposed a method for
diabetics. DR affects many diabetics in devel- extracting retinal vessels from retinal fundus
oped countries [2]. images using morphological approaches. Kaur
The goal of this chapter is to use fundus and Sinha [8] suggested a blood vessel segmen-
image classification to enforce a direct synthe- tation approach based on morphological filters.
sis of DR. We are working on categorizing fun- There is no noticeable improvement in perfor-
dus images depending on the level of DR, with mance when increasing the number of filter
the goal of achieving end-to-end actual classifi- banks; instead, the convolution process, which is
cation from fundus images to medical status. time-consuming task, is increased. Jaffar et al.
Rather than the doctors’ manual control with [9] proposed a way that uses reconciling thresh-
expertise, it helps to relieve their pressure on olding for exudate detection and eliminates arti-
the diagnosing and treating of DR in a simple facts from the exudates; the retinal structure
and accurate manner. For this task, a variety of area unit is utilized in classification. The pro-
image preprocessing and AI techniques to jected technique failed to cowl all the DR signs;
extract many key features is used and then it has to be explored. Jiang and Mojon [10]
classify them into their corresponding classes. proposed a way, reconciling thresholding on
We use CNN architecture to detect DR in verification-based multithreshold inquisitory
two datasets. The precision, recall, accuracy, approach. With international thresholding, the
receiver operating characteristic (ROC), and blood vessels can’t be divided because of the
area under curve (AUC) measures are all eval- image gradients. So, image inquisitory with var-
uated. We also plot the confusion matrix, ied threshold values does not extract the thre-
which helps us to confirm the strength of the sholded image. Clara I Sánchez et al. [11]
model visually. proposed a combination model that separates
the exudates from the image background, and
edge detection strategies area unit want to sepa-
rate laborious and soft exudates.
9.2 Literature survey on diabetic Goh et al. [12] classified retinal images using
retinopathy detection various classifiers. On fundus images, segmen-
tation was used to differentiate blood vessels,
Much effort has been made in DR detection. microaneurysms, and exudates. The classifiers
There are many ways to find a DR. Scientists were given the segmented region, textural data,
have worked on a variety of techniques to treat and other information derived from Gray-Level
a variety of injuries such as blood vessels, Co-Occurrence Matrix (GLCM) to categorize the
microaneurysms, secretions, and bleeding. normal and abnormal images. On normal
Changes in the shape and size of blood vessels images, the detection system has a success rate
can be a positive sign of DR, as well as the of 92%, while on aberrant images, it has a suc-
presence of various types of lesions that con- cess rate of 91%. Liew et al. [13] employed a sta-
tribute to the diagnosis of diabetes. As a result, tistical technique to demonstrate the
various studies on the autonomic nervous sys- relationship between retinal vascular indicators
tem fall into two categories [46]. and the importance of both qualitative and

Applications of Artificial Intelligence in Medical Imaging


9.2 Literature survey on diabetic retinopathy detection 245
quantitative evaluation of retinal vasculature. 72% that extracts six options. In general, the nor-
The suggested technology will require skilled mal approach for DR detection and classification
assistance in identifying blood vessels. Reza will be created, as shown in Fig. 9.2. Initially, the
et al. [14] used a marker-controlled primarily images are collected and preprocessed. Image
based watershed segmentation methodology phase action will be performed to segment the
that detects the exudates and points. Before essential half, image options square measure
that, average filtering and distinction improve- extracted manually, and classification will be mis-
ment were applied to remove artifacts. treated binary or multiclass classifiers singly.
Hoover and Goldbaum [15] suggested a
fuzzy-based voting system for detecting the loca-
tion of the optic disk, where numerous feature 9.2.2 Binary and multilevel classification
elements overlap. Li and Chutatape [16] In DR disease detection, good number of
employed an active shape model to extract the studies are created on many different classifi-
vascular components based on the optic disk cation techniques. In binary classification, the
location, which is a model-based technique. The DR illness has been classified into two catego-
central macular region was discovered using the ries solely. As per studies, the two categories
retrieved data. Gardener et al. [17] used artificial are often DR or No-DR. In multiclass classifica-
neural network to seek out completely different tion problem, the DR illness has been classified
options such as exudates, blood vessels, and into several categories as No-DR, Mild,
hemorrhages with 93.1%, 91.7%, and 73.8% of Moderate, Severe, Proliferate-DR.
accuracy. Yun et al. [18] planned associate auto-
matic DR arrangement that classifies delicate, 9.2.2.1 Binary classification
moderate, associated severe nonproliferative DR Quellec et al. [19] projected an automatic
and proliferative DR and achieved an accuracy of detection methodology in which three CNNs

Image
Segmentation

No-DR
Mild, Moderate,
Severe,
Proliferate-DR

FIGURE 9.2 Diabetic retinopathy detection and classification process using traditional approach.

Applications of Artificial Intelligence in Medical Imaging


246 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

(AlexNet and two different networks) were used normalized and resized, and to scale back overfit-
to observe microaneurysms, hemorrhages, soft ting, L2 regularization and dropout techniques
and laborious exudates from three completely were used. The system created a specificity of
different datasets, Kaggle, DiaretDB1, and E- 95%, accuracy of 75%, and a sensitivity of 30%.
ophtha (private). The images of complex body Gulshan et al. [23] planned a way wherever
part were resized, cropped, normalized, aug- 10 CNNs (pre-trained Inceptionv3) were trained
mented, and therefore the morphological filter to discover diabetic macular dropsy (DME) and
was applied within the preprocessing section. DR. Eyepacs-1 and Messidor-2 datasets were
The illness was classified into two categories, accustomed to check the CNN model. The data-
attributable and nonreferable DR, and made a set images were initially normalized, resized,
mythical creature price of 0.954 and 0.949 in and fed into the CNN model to classify the
Kaggle and E-ophtha, respectively. images into ascribable DME, moderate/worse
Jiang et al. [20] projected a model wherever DR, severe/worse DR, or totally hierarchical
three pretrained CNNs (Inceptionv3, ResNet152, DR. The model created a specificity of 93% in
and InceptionResNetv2) do not to classify the two of the datasets taken and sensitivity of
dataset as attributable DR or nonreferable DR. 97.5% and 96.1% in yepacs-1 and Messidor-2
Before CNN training, the images were resized, datasets, respectively.
enhanced, and improved, and so the models
were integrated with AdaBoost technique.
Further to update the network weights, Adam
9.2.3 Datasets
optimizer was used and therefore the system
achieved 88.21% accuracy and AUC of 0.946. a) Diabetic-Ratinopathy_Sample_Dataset_
Zago et al. [21] projected a technique wher- Binaryi - This dataset is sample data from
ever two CNNs (pretrained VGG16 and a the Diabetic Retinopathy Competition. It
CNN) were utilized to observe DR or non-DR takes an excessive amount of time in
images supported the red lesion patches likeli- preprocessing, so this dataset contains
hood. This model was trained on the resized images (90,128,264) for saving a
DIARETDB1 dataset, and it had been tested on while. It contains only 526 samples 1/2 of
the few datasets: IDRiD, Messidor, Messidor-2, the samples has DR, and half have not.
DDR, DIARETDB0, and Kaggle. The model Metadata of array are given in Binary
achieved the good results on the Messidor data- Dataframe CSV, and images are on an
set with a sensitivity of 0.94 and AUC of 0.912. exhibition on the identical index.
b) Diabetic Retinopathy 224 3 224 Gaussian
Filteredii - The given image used to identify
9.2.2.2 Multilevel classification the diabetes virus. The first dataset can be
Pratt et al. [22] planned a way wherever a purchased at Discover Blind APTOS 2019.
CNN was used with 10 CNN layers, 8 max- This image has been reshaped to 224 3 224
pooling layers, and 3 totally connected layers, and pixels, making it easier to use with many
a softmax classifier was accustomed classify the pretrained advanced learning models. With
Kaggle dataset images into five categories accord- the dashboard, all images are stored in a
ing to the severity levels of DR. Throughout folder, depending on the intensity/degree
the preprocessing part, the images are color of diabetes. A CSV file is provided. There
i
https://www.kaggle.com/sohaibanwaar1203/prepossessed-arrays-of-binary-data.
ii
https://www.kaggle.com/sovitrath/diabetic-retinopathy-224x224-gaussian-filtered.

Applications of Artificial Intelligence in Medical Imaging


9.3 Deep learning methods for diabetic retinopathy detection 247
idea has led to efforts to improve other aspects
of CNN, such as offline gaming activities,
directors, limiting methods, and regularization
strategies. Another well-known area of
research in this area is the integration of state-
of-the-art methods for better methods, technol-
ogies, and algorithms. While several parts of
the world have supported CNN’s mission,
many readers use their critical thinking posi-
tion in neuroscience, neurobiology, and
research science to explore new ways of evolv-
FIGURE 9.3 Comparison between normal and DR- ing with living things. We tried to improve
infected retina. DR, Diabetic retinopathy. performance by combining and developing
new discoveries to help build the system.
are 5 different image folders: 0—DR None, DL is a form of engineering learning system
1—Easy, 2—Medium, 3—Intense, 4— that uses offline workflows for learning and pro-
lifearaDR. The database contains output. cess experience [30]. One of the computer-aided
The pkl file contains a ResNet34 model that diagnoses and detection methods is DL [31].
has been trained on datasets for 20 years Image processing, detection, search, and registra-
using the FastAI library. Sample images are tion are examples of DL applications in medical
given in Fig. 9.3. imaging research. Recently, DL has been used to
detect and classify DR. Many different methods
are combined to manage the characteristics of
the installation files [32]. There are numerous
9.3 Deep learning methods for diabetic DL-supported models such as restricted
retinopathy detection Boltzmann machines, CNNs, autoencoders, and
sparse coding [33]. When the amount of training
This study utilizes some deep learning (DL) data increases, the performance of those methods
techniques for the early detection of DR. This improves [34]. This is due to an increase in the
disease could be a real threat to the human learned features, as opposed to machine learning
species and be kept in check as soon as possi- methods. Furthermore, DL techniques eliminated
ble. With the advancement of computer vision the need for handcrafted feature extraction.
and CNNs, finding out about diabetes has
become a major topic of debate. In recent years,
emerging biological networks have emerged as
9.3.1 Deep neural networks
a major model of in-depth imaging, discovery,
and dissemination activities. This type of neu- Because this system uses a mathematical
ral network has been discussed for many years. “weight scale” to determine the possible output
[24,25]. Nonetheless, it gained popularity fol- of the input data, deep neural network (DNN)
lowing the advent of DL [2628] which were analysis is more likely to handle such transac-
powered by graphic processing units [29]. tions. The “weights” are adjusted by creating a
However, while deep CNNs are becoming network with data from well-known databases.
more and more popular today, strong evidence You can also test the web with new data to give
suggests that medium to medium-sized inter- the possibility that this data is included in a par-
ventions do not improve performance. This ticular output. Such computer programs have

Applications of Artificial Intelligence in Medical Imaging


248 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

been used in medicine to evaluate chest radio- trained DNN can be used in the diagnosis of
graphs [35] and images from histopathology [36]. diabetic patients, the system’s predictive power
The use of a DNN in the screening of patients by to correctly detect retinopathy in fundus images
mammography provided a better prediction of was investigated. While using this method, 10
the detection of malignancy than inexperienced layers of neural networks can be used by the
radiographers [37]. Moreover, DNNs are often number of neurons varying from 8 to 256 in a
used to medicate visual perception deficiencies manner mainly in 8 3 16 3 32 3 64 3 128 3
in ophthalmology [38] (Fig. 9.4). 256 3 128 3 64 3 32 3 16 3 8 neuron configura-
A neural network was trained in this study to tion. The accuracy was not impressive since the
recognize the features of diabetic fundus images, data was imbalanced concerning the number of
and its accuracy, F1 score, was tested. To see if a images in each class.
import tensorflow as tf
from tensorflow.keras.applications import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.losses import *
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.preprocessing.image import *
from tensorflow.keras.utils import *

dnn_model=Sequential()
dnn_model.add(Dense(8, input_dim=2, kernel_initializer = 'uniform', activation = 'relu'))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(16, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(32, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(64, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization()
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(128, kernel_initializer = 'uniform', activation = 'relu'))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(256, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(128, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(64, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(32, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(16, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(8, kernel_initializer = 'uniform', activation = 'relu' ))
# dnn_model.add(BatchNormalization())
# dnn_model.add(Dropout(0.2))
dnn_model.add(Dense(2,activation='softmax'))
dnn_model.summary()
9.3 Deep learning methods for diabetic retinopathy detection 249

FIGURE 9.4 Deep learning model for diabetic retinopathy detection and classification.

9.3.2 Convolutional neural networks level DR signal, gradually converting and inte-
A CNN is a process model that takes small grating it into higher-resolution DR functions.
pixels as input and routes them through a These properties are automatically combined
defined set of elements in the network. During to provide a great opportunity to map the
learning, the network itself generates a low- image as normal or abnormal [39].

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(264,264,3)))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))model.add(Dropout(0.25))
model.add(Dense(16, activation='relu'))model.add(Dropout(0.5))
model.add(Flatten())model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))model.add(Dense(5, activation='softmax'))

Applications of Artificial Intelligence in Medical Imaging


250 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

9.3.2.1 Layers in convolutional neural this manually designed method does not
network cover all DR signals in the image, which is a
In this experiment, the CNN-DRD model waste of time investigating common
developed has two to eight flexible layers, sche- problems. As a result, the DR solution is
matics, and a reading system. CNN is a well- limited. Unlike hand-drawn images, a DL
known DL model used by scientists to distribute technique known as CNN has significantly
natural images and has proven to be a successful increased DR values and reduced the time
method for medical images. For example, using constraint on the need to use more data to
monetary imaging, the CNN model plays an know financial characteristics. CNN’s
important role in the proper distribution of the traditional technique and functional design
Nonproliferative Diabetic Retinopathy (NPDR). It fall into the accessories category. The
also increases the efficiency, availability, and cost- manually designed method demonstrates the
effectiveness of your Diabetic Retinopathy (DR) process of moving an object using advanced
scoring system. The DR scoring system is opti- algorithms and expert techniques. CNN is a
mized for a variety of high-quality images and DL process that works and is created by the
different settings compared to traditional hand- human brain system. Learn key ideas from
crafted methods [39]. Considering the popularity your data and understand the design process
of CNN architectures for DR diagnosis, they have with minimal control. CNN is very motivated
significant limitations that are described below. to practice accurate predictions, even if there
are blockages, posts, and the true nature of
1. The current CNN model for detecting DR the object is not possible. In addition, the
focuses only on the DR distribution and does benefits of expanding data are used to learn
not detect the status of DR lesions in the how to make changes and improve mapping
background image. They work from start to capabilities through the CNN system.
finish. From end to end of the CNN means Scientists are currently adapting several
that the input image is imported directly into modifications to the CNN system to improve
the CNN and the output image provides DR their applications in physics [40].
intensity. Although the details of the DR
injury are important to hospital staff.
2. The latest CNN models require high-quality
information models for training, which is an 9.3.3 Transfer learning
expensive and time-consuming task. In In order to reach high accuracy in training a
contrast, CNN models, which can be deep CNN, a substantial amount of training data
learned from several examples, are the most is typically required, which can be costly to obtain.
effective and demanding practice in medical This problem is addressed by transfer learning,
education today in terms of the value of DR. which transfers knowledge learnt on a big dataset
3. CNN models that had been developed failed with a similar domain to the training dataset. A
to learn the complex behavior of DR lesions. common method for CNNs is to train them on a
Initially, the fundus image was split into big source dataset and then exploit their feature
discrete segments that were used to feed into extraction skills. Furthermore, fine-tuning the
the CNN. As a result, small lesions are resulting pre-trained model on a smaller target
difficult to detect due to the incomprehensible dataset with a comparable domain but a different
nature of the optic nerve of the eye. Studying goal has been shown to enhance task accuracy
these vulnerabilities is essential and necessary even further. DL techniques require many exam-
for a real DR classification system. Moreover, ples or a large set of data. The number of samples

Applications of Artificial Intelligence in Medical Imaging


9.4 Diabetic retinopathy detection using deep learning 251
in our dataset are fewer as compared to modern DenseNet121, InceptionResNetV2, and Mobile-
data standards, that is, the number of MRI images NetV2. Simonyan and Zisserman introduced the
for the classification function is fewer, so we can VGG network architecture in the 2014 article
go for transfer learning. Transfer learning is about “Very Deep Convolutional Networks for Large-
taking features learned on one issue, and using Scale Image Recognition.” This network is char-
them on a different, related issue. Export learning acterized by its simplicity, using only three uni-
is aimed at improving learning in the target task versal layers on top of each other in detail.
by incorporating information from the source task. Maximum pooling works up to large volumes.
The key facts to test before using transfer learning Compared to traditional sequential network
are that the source task should have been trained architectures such as AlexNet, Over Feat, and
on a larger dataset than the target task, and the VGG, ResNet is a type of “exotic architecture”
source and target tasks should be identical in based on microarchitecture modules (also known
nature. There are three common behaviors that as “network-in-network architectures”). The
can improve learning through a transfer. First is term microarchitecture refers to the collection of
the first achievable output in the target task that building blocks used to create a network. Was an
uses only the information sent, compared to the original study showing it can be given? There
first output of the ignorant person before further are 3 and 5 brackets in the same network mod-
learning takes place. Second, there is a relationship ule, and these filter outputs are stacked along the
between the time it takes to fully understand the dimensions of the channel before being sent to
target’s mission and the time it takes to under- the next network layer (Fig. 9.5).
stand it from scratch, even though the information
is being transmitted and third, the final level of
output which can be reached in the target role rel- 9.4 Diabetic retinopathy detection using
ative to the final level without transition. deep learning
Here we used different models such as
ResNet50, ResNet101, VGG16, VGG19, Incep- AI is a critical approach that enhances
tionV3, MobileNet, MobileNetV2, DenseNet169, human expertise in the health-care industry by

Normal

Diabetic
Retina Retinography
Image
Pre-trained Model

Fully Output
Connected Layer
Layer

FIGURE 9.5 Transfer learning model for diabetic retinopathy detection and classification.

Applications of Artificial Intelligence in Medical Imaging


252 9. Detection and classification of Diabetic Retinopathy Lesions using deep learning

9.4 Diabetic retinopathy detection using deep learning

AI is a critical approach that enhances human expertise in the health-care industry by training on complex medical data using algorithms that create advanced research data. AI and DL are natural applications in the field of ophthalmology because most of the data is image-based and the results rest on image recognition. AI applications in DR research have succeeded in providing recommendations and research presentations. The role of AI in the diagnosis of referable DR (RDR), defined as moderate nonproliferative DR (NPDR) or worse, with or without diabetic macular edema (DME), is very consistent, which offers real benefits. Over 90% of the studies use various AI algorithms [41]. AI algorithms require skilled professionals to obtain clear and concise images for use as input data, and ophthalmologists (graders) to provide ground-truth labels for the images. A recent study of eye imaging on smartphones with the EyeArt AI software showed a high efficiency of 95.8% for detecting DR of any severity and over 99% efficiency for detecting RDR and STDR [42].

9.4.1 Prediction and classification

CNNs are similar to neural networks, with a few exceptions. A CNN input has three dimensions: width, height, and depth. The connectivity is not complete, that is, the neurons in one layer are not connected to every neuron in the next layer. The final result is flattened into a single vector. The two main stages of a CNN are feature extraction and classification. Feature extraction is the process of deriving features through a series of convolution and pooling operations. The output of this feature extraction stage becomes the input for classification: on top of the extracted features, a fully connected layer acts as a classifier. Once the feature extraction is done, the classification of images starts.

This section will go through two datasets, one imbalanced and the other balanced. The imbalanced dataset consists of Binary.npz files that have 1000 images. We split them into 600 training, 200 validation, and 200 testing images. We predict only 2 Classes (0 and 1) using this dataset. The balanced dataset consists of 3662 224 × 224 Gaussian-filtered images. We split them into 2966 training images, 329 validation images, and 367 testing images. We predict both 2 Classes (0 and 1) and 5 Classes (0, 1, 2, 3, and 4) using this dataset.

9.4.1.1 For Dataset a) Diabetic-Retinopathy Sample_Dataset_Binary

In this experiment, we apply CNN with different layers and different neuron configurations. This approach did not give significant differences in the evaluation metrics.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Convert one-hot labels and model outputs back to class indices
y_test = np.argmax(y_test, axis=1)
pred = np.argmax(model.predict(x_test), axis=-1)
# plot_confusion_matrix is a plotting helper defined elsewhere in the notebook
cm = confusion_matrix(y_test, pred)
cm_plot = plot_confusion_matrix(cm, classes=['0', '1'])

We can observe that in some circumstances the evaluation metrics are not so good. This occurs because the dataset is severely imbalanced, resulting in relatively poor scores. To deal with this, we could use data augmentation or generative adversarial networks (GANs) to balance the dataset.

9.4.1.2 For Dataset b) Diabetic Retinopathy 224 × 224 Gaussian Filtered

This dataset is relatively balanced compared to the previous, imbalanced one. In this experiment, we have applied CNN with different layers as well as other DL models. Also, we predict both 2 Classes and 5 Classes. We get good results in the 2-Class setting since the number of images for Class 0 is almost equal to the number of images for Class 1.
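For reference, a minimal sketch of how such an .npz archive could be loaded and split 600/200/200; the file name and the array keys 'x' and 'y' are assumptions about the archive layout, not confirmed by the text:

import numpy as np
from sklearn.model_selection import train_test_split

data = np.load('diabetic_retinopathy_binary.npz')   # hypothetical file name
x, y = data['x'], data['y']                          # assumed array keys
# 600 train / 200 validation / 200 test out of 1000 images
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, train_size=600, stratify=y)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=200, stratify=y_tmp)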


9.4.2 Performance evaluation metrics

It is critical to estimate how precisely a classification model predicts the correct result when developing one. This estimation, however, is insufficient on its own because it can produce deceptive results in some cases, and it is at this point that additional metrics become an important factor in obtaining more meaningful estimations of the constructed model.

For classification models, accuracy is a critical metric. It is straightforward to understand and applies to binary and multiclass classification problems. Accuracy indicates the proportion of correct results in the total number of records tested. A classification model built on balanced datasets can be assessed well enough by accuracy alone. Precision is defined as the ratio of true positives to predicted positives. Another important metric is recall, which gives information on whether all potential positives are captured. Recall is the percentage of overall positive samples that were correctly predicted as positive; the recall is one if all positive samples are predicted to be positive. If an optimal combination of precision and recall is required, these two measures can be combined to form the F1 score. The F1 score is the harmonic mean of precision and recall, ranging from 0 to 1.

Many performance metrics are used to assess the classification performance of DL methods. Accuracy, recall, precision, and area under the ROC curve are some of the commonly used metrics in DL. The percentage of abnormal images classified as abnormal is referred to as sensitivity, and the percentage of normal images classified as normal is referred to as specificity [43]. AUC is the area under the curve formed by plotting sensitivity against 1 − specificity. The percentage of correctly classified images is referred to as accuracy. The equations for each measurement are listed below:

(1) Accuracy = (TP + TN)/(TP + TN + FP + FN)
(2) Precision = TP/(TP + FP)
(3) Recall = TP/(TP + FN)
(4) F1 score = (2 × Precision × Recall)/(Precision + Recall)
(5) AUC is the abbreviation for area under the curve. AUC is a more comprehensive measurement that takes into account both true negative and true positive outcomes. The higher the AUC, the better the model's performance.
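To make equations (1)-(4) concrete, a minimal sketch computing them directly from a 2 × 2 confusion matrix (the row/column convention is an assumption matching scikit-learn's confusion_matrix output):

import numpy as np

def basic_metrics(cm):
    # cm layout assumed: rows = true class, columns = predicted class
    tn, fp, fn, tp = np.asarray(cm).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

In practice scikit-learn provides the same quantities directly, as in the listing below.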

from sklearn import metrics
from sklearn.metrics import classification_report

# True labels come from the test dataframe; pred holds the model's predictions
y_test = list(test_df.Label)
print('Accuracy:', np.round(metrics.accuracy_score(y_test, pred), 5))
print('Precision:', np.round(metrics.precision_score(y_test, pred, average='weighted'), 5))
print('Recall:', np.round(metrics.recall_score(y_test, pred, average='weighted'), 5))
print('F1 Score:', np.round(metrics.f1_score(y_test, pred, average='weighted'), 5))
print('Cohen Kappa Score:', np.round(metrics.cohen_kappa_score(y_test, pred), 5))
print(classification_report(y_test, pred))

from sklearn.metrics import confusion_matrix
import seaborn as sns

# Row-normalized confusion matrix rendered as a heatmap
cf_matrix = confusion_matrix(y_test, pred, normalize='true')
plt.figure(figsize=(10, 6))
sns.heatmap(cf_matrix, annot=True, xticklabels=sorted(set(y_test)),
            yticklabels=sorted(set(y_test)))
plt.title('Normalized Confusion Matrix')
plt.show()


The number of disease images classified as disease is referred to as true positives (TP). A true negative (TN) is an outcome where the model correctly predicts the negative class. The number of normal images classified as disease is referred to as false positives (FP). The number of disease images classified as normal is referred to as false negatives (FN).

In practice, a perfect model would have precision and recall of 1, resulting in an F1 score of 1, that is, 100% accuracy, which is not possible in a practical classification task. As a result, the classifier that is created should have precision and recall as high as possible. In addition to the agreement observed in the confusion matrix, the Cohen Kappa score accounts for agreement that occurs by chance.

9.4.3 Experimental results

Different architectures were trained with pretrained weights. Apart from the popular architectures, we wished to see how the performance would change as CNNs with more layers were used. We conducted experiments on the data and came up with the following tables.

9.4.3.1 For Dataset a) Diabetic-Retinopathy Sample Dataset Binary

This section contains all the experimental results and observations for the abovementioned dataset (Table 9.1).

Since the given dataset is severely imbalanced, the Kappa score turned out to be negative or equal to 0. Due to the high number of layers with no residual skip connections between the layers, the information could not get propagated throughout the model, and thus it gave an abysmal performance. The softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (Table 9.2).
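Because the Kappa score figures prominently in the tables that follow, a minimal sketch of its definition, kappa = (p_o − p_e)/(1 − p_e), computed from a confusion matrix (scikit-learn's cohen_kappa_score gives the same value):

import numpy as np

def cohen_kappa(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                       # observed agreement (accuracy)
    p_e = (cm.sum(0) * cm.sum(1)).sum() / n**2   # chance agreement from marginals
    return (p_o - p_e) / (1 - p_e)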

TABLE 9.1 The comparison of convolutional neural network (CNN) layers using Diabetic Retinopathy Detection binary dataset.

CNN layers    Training accuracy   Validation accuracy   Test accuracy   F1 score   ROC area
2 Layer CNN   0.7726              0.7556                0.7             0.668      0.96
3 Layer CNN   0.7034              0.7556                0.785           0.69       0.9407
4 Layer CNN   0.705               0.7556                0.785           0.69       0.94
5 Layer CNN   0.7285              0.7556                0.785           0.69       0.9358
6 Layer CNN   0.7172              0.7556                0.785           0.69       0.9358
7 Layer CNN   0.70                0.76                  0.785           0.69       0.915
8 Layer CNN   0.69                0.7556                0.785           0.69       0.9133

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import (Activation, BatchNormalization, Dense,
                                     Dropout, Flatten)
from tensorflow.keras.models import Sequential

# ResNet50 backbone trained from scratch (weights=None) with a dense head
model = Sequential()
model.add(ResNet50(input_shape=(264,264,3), include_top=True, weights=None))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(16, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(64, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(128, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(128, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(64, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(16, kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
# Creating an output layer
model.add(Dense(units=5, activation='softmax'))

# Reduce the learning rate when the validation loss stops improving
c3 = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.1,
    patience=2,
    mode="auto",
    min_delta=0.0001,
    cooldown=0,
    min_lr=0.001)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy','AUC'])
# Note: the callback must be passed to fit() to take effect
history = model.fit(x_train, y_train, epochs=20, batch_size=16,
                    validation_split=0.2, callbacks=[c3])

For the models mentioned earlier, the softmax activation function must be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1). We can see that the Kappa score is zero, and negative in some cases. This happens because the dataset is highly unbalanced, and as a result the scores are relatively poor. To counter this, we can balance the dataset using data augmentation or a GAN.
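As the data-augmentation route, a minimal Keras sketch is shown below before the GAN listing; the directory path and parameter values are illustrative placeholders:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random geometric perturbations used to enlarge the minority class
augmenter = ImageDataGenerator(rotation_range=20,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True)
batches = augmenter.flow_from_directory('data/minority_class',  # hypothetical path
                                        target_size=(224, 224),
                                        batch_size=16,
                                        class_mode='categorical')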


import torch.nn as nn

# DCGAN-style generator used to synthesize retinal images for balancing the
# dataset. nz (latent size), ngf (generator feature maps), nc (image channels),
# ngpu, device, and weights_init are assumed to be defined earlier.
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 4, 3, 1, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, ngf, 3, 1, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # final upsampling; state size. (nc) x 256 x 256
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)

# Create the generator and apply the custom weight initialization
netG = Generator(ngpu).to(device)
netG.apply(weights_init)
# Print the model
print(netG)

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 256 x 256
            nn.Conv2d(nc, ndf, 4, 4, 1, bias=False),
            nn.BatchNorm2d(ndf),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, 4, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 4, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # collapse to a single real/fake probability
            nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),
            nn.Sigmoid())

    def forward(self, input):
        return self.main(input)

# Create the Discriminator and apply the custom weight initialization
netD = Discriminator(ngpu).to(device)
netD.apply(weights_init)
print(netD)
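The weights_init function applied above is not shown in this listing; a standard DCGAN-style initializer, which is presumably what is intended, looks like this:

import torch.nn as nn

def weights_init(m):
    # DCGAN convention: N(0, 0.02) for conv layers, N(1, 0.02) for batch norms
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)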

TABLE 9.2 The comparison of other deep learning models using Diabetic Retinopathy Detection binary dataset.

Models              Training accuracy   Validation accuracy   Test accuracy   F1 score   ROC area
ResNet50            0.689               0.7556                0.785           0.69       0.9255
VGG16               0.73                0.7556                0.79            0.69       0.9326
VGG19               0.731               0.7556                0.78            0.69       0.9302
Inception_v3        0.78                0.75                  0.78            0.687      0.93
MobileNet           0.707               0.7556                0.79            0.69       0.9322
DenseNet169         0.694               0.7556                0.785           0.69       0.9402
DenseNet121         0.722               0.7556                0.79            0.69       0.93
InceptionResNetV2   0.7                 0.7556                0.785           0.69       0.9324
MobileNetV2         0.672               0.7556                0.785           0.69       0.9286
ResNet101           0.77                0.7556                0.79            0.69       0.9405

9.4.3.2 For Dataset b) Diabetic Retinopathy 224 × 224 Gaussian Filtered

This section contains all the experimental results and observations for the dataset mentioned earlier. We can infer two different possibilities from this dataset: we can use it to predict both 2 Classes and 5 Classes in diabetic retinopathy disease (Table 9.3). The dataset, as mentioned above, is balanced, with around 1700 images in both classes 0 and 1. The Cohen Kappa score turned out to be greater than 0.8 in all the experiments, which is a good sign, showing an excellent strength of agreement. The softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1) (Table 9.4).

import tensorflow as tf
import tensorflow_addons
from tensorflow.keras import layers

# Eight convolution/pooling blocks followed by a small dense head
model = tf.keras.Sequential([
    layers.Conv2D(16, (3,3), padding="same", input_shape=(224,224,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(32, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(64, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(64, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(128, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(128, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(256, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.BatchNormalization(),
    layers.Conv2D(256, (3,3), padding="same", activation='relu'),
    layers.MaxPooling2D(pool_size=(1,1)),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.15),
    layers.Dense(2, activation='softmax')])

model.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-5),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['acc', 'AUC',
                       tensorflow_addons.metrics.F1Score(num_classes=2, average='weighted'),
                       # num_classes corrected to 2 to match the two-class output layer
                       tensorflow_addons.metrics.CohenKappa(num_classes=2)])
history = model.fit(train_batches,
                    epochs=12,
                    validation_data=val_batches)

TABLE 9.3 The comparison of convolutional neural network (CNN) layers using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 2 Class classification.

CNN layers    Training accuracy   Validation accuracy   Test accuracy   F1 score   Kappa score   ROC area
2 Layer CNN   0.9844              0.9236                0.9527          0.9527     0.9055        0.9874
3 Layer CNN   0.9676              0.96                  0.9272          0.9273     0.8546        0.9664
4 Layer CNN   0.9602              0.9327                0.9181          0.9182     0.8363        0.9763
5 Layer CNN   0.9465              0.9218                0.9127          0.9127     0.8255        0.9554
6 Layer CNN   0.9559              0.9364                0.9072          0.9073     0.8145        0.9682
7 Layer CNN   0.98                0.92                  0.9272          0.9273     0.8546        0.9716
8 Layer CNN   0.9859              0.9309                0.93            0.929      0.8581        0.9724

TABLE 9.4 The comparison of deep learning models using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 2 Class classification.

Models              Training accuracy   Validation accuracy   Test accuracy   F1 score   Kappa score   ROC area
ResNet50            0.9257              0.92                  0.9183          0.9182     0.84          0.9613
VGG16               0.9556              0.9666                0.962           0.9618     0.923         0.9898
VGG19               0.9495              0.9483                0.9427          0.9428     0.8851        0.985
Inception_v3        0.9708              0.9544                0.9619          0.9618     0.9235        0.996
MobileNet           0.9795              0.9574                0.9564          0.9564     0.9128        0.9978
DenseNet169         0.9672              0.9848                0.9646          0.9646     0.9292        0.9921
DenseNet121         0.9652              0.9666                0.9782          0.9782     0.9564        0.9934
InceptionResNetV2   0.96                0.9544                0.951           0.9508     0.9015        0.9909
MobileNetV2         0.9707              0.9514                0.9427          0.9428     0.885         0.9932
ResNet101           0.8907              0.8693                0.8855          0.8857     0.77          0.9506

For the models mentioned above, the softmax activation function has to be added to the output to create normalized arrays that determine whether the retinal image is healthy or diseased (0 or 1) (Tables 9.5 and 9.6).


# create_gen() is a helper defined elsewhere that builds the image generators
train_generator, test_generator, train_images, val_images, test_images = create_gen()

# Load the pretrained model as a frozen feature extractor
pretrained_model = tf.keras.applications.ResNet50(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg')
pretrained_model.trainable = False

# Attach a small fully connected head for the 2-class prediction
inputs = pretrained_model.input
x = tf.keras.layers.Dense(128, activation='relu')(pretrained_model.output)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', 'AUC'])
history = model.fit(
    train_images,
    validation_data=val_images,
    batch_size=32,
    epochs=10,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True)
    ])

TABLE 9.5 The comparison of different convolutional neural network (CNN) layers using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 5 Class classification.

CNN layers    Training accuracy   Validation accuracy   Test accuracy   F1 score   Kappa score   ROC area
2 Layer CNN   0.9059              0.9025                0.9003          0.69       0.58          0.9346
3 Layer CNN   0.8953              0.8927                0.8916          0.6444     0.5           0.9152
4 Layer CNN   0.9179              0.9055                0.901           0.71       0.59          0.94
5 Layer CNN   0.8884              0.8909                0.8843          0.6142     0.47          0.88
6 Layer CNN   0.8646              0.8793                0.8687          0.62       0.41          0.85
7 Layer CNN   0.85                0.85                  0.84            0.55       0.33          0.87
8 Layer CNN   0.8869              0.8775                0.8822          0.66       0.47          0.9212

TABLE 9.6 The comparison of deep learning models using Diabetic Retinopathy Detection 224 × 224 Gaussian images for 5 Class classification.

Models              Training accuracy   Validation accuracy   Test accuracy   F1 score   Kappa score   ROC area
ResNet50            0.7272              0.71                  0.71            0.632      0.522         0.92
VGG16               0.7459              0.7324                0.7575          0.704      0.613         0.9364
VGG19               0.7342              0.772                 0.725           0.676      0.5542        0.941
Inception_v3        0.8358              0.73                  0.7438          0.7049     0.6           0.9753
MobileNet           0.8472              0.7872                0.7929          0.7755     0.67          0.9823
DenseNet169         0.8188              0.7781                0.7847          0.754      0.665         0.9714
DenseNet121         0.8235              0.7751                0.8092          0.80226    0.71          0.9697
InceptionResNetV2   0.7827              0.7781                0.8065          0.7674     0.685         0.957
MobileNetV2         0.8102              0.7568                0.7874          0.7693     0.6556        0.9668
ResNet101           0.6766              0.7325                0.703           0.6416     0.5268        0.9023

9.5 Discussion

The current study aims to detect DR in retinal images by employing a deep CNN approach. Unlike some other authors who use a CNN to produce a label for each of an image's pixels, our model deals with patches and can therefore localize possible regions of future lesions, creating a strong method for red retinal lesion detection and leading to further improvements in DR detection.

Our primary objective was to simplify the framework while enhancing its efficiency. To that end, we developed an effective framework for finding training patches, ensuring that complicated examples obtain extra attention during the training process. Furthermore, because our goal was to achieve a rough localization of the regions rather than precise segmentation, we undersampled the images, resulting in a significant reduction in processing time, which is critical in a real-world application scenario. All of the studies mentioned in this chapter used DL techniques to build a DR screening system. With the increasing number of diabetics, the need for a better eye examination system has become a major problem. Using DL to detect and classify DR solves the feature selection problem of classical ML. On the other hand, training requires a lot of data, and many researchers have used data augmentation to increase the number of images and controls during training.

A few of the drawbacks of using deep learning in the clinical area are the sizes of the datasets required to train the DL systems, as DL calls for a huge volume of data. The results of DL systems are heavily determined by the amount of training data as well as its quality and class balance. As a result, current public dataset sizes need to be increased, while huge datasets, such as the public Kaggle dataset, need to be refined to remove mislabeled and low-quality data.


The studies described here differ in how they use DL technology. They vary in the number of studies that design their own CNN system versus those that prefer to use existing architectures such as VGG, ResNet, and MobileNet, and in how they adapt to smaller datasets. Building a brand-new CNN model from scratch takes a lot of effort and time; it is much simpler to apply transfer learning, which speeds up the design and development of new systems. On the other hand, it should be noted that the consistency of a system with its own CNN model is greater than that of systems built on existing architectures. Researchers must focus on this issue and need to do more research to weigh both options. An improved DR detection system that can detect different types of lesions and stages of DR results in an improved monitoring system for DR patients, which avoids the risk of vision loss. The five DR levels need to be identified correctly, and the gaps closed in a system that can detect DR damage; this can be seen as a current challenge for researchers for future research. One approach is to use a map giving the DR level of each pixel to determine the classification; we used a simple method that calculates the label for the entire image. The advantage is that it does not need to be retrained: in particular, our method separates lesion detection (learned by the CNN) from DR grading based on the CNN detections. As a result, this system can be used on any live database without any additional changes.

9.6 Conclusion

DR is one of the complications of diabetes and is a cause of blindness. An effective and automatic screening of its incidence is of clinical significance. Early detection allows faster treatment, which is important because early detection can prevent disability. Automatic diagnosis of DR from fundus images can help clinicians diagnose effectively, which can improve the quality of diagnosis. This chapter presents a framework to identify the diabetes-related disease, to study how to classify images of those with the disease, and to learn a function that extracts features directly from the images. Image preprocessing is standardized, and augmentation is applied during sample selection and model training. The maximum test accuracy was 97.82%, obtained using the DenseNet121 architecture for two-class prediction on the 224 × 224 Gaussian-filtered image dataset, which improved the accuracy of diabetic image classification. Also, the number of images required for training is comparatively small, which is very important given the cost of collecting labeled data.

The automated DR detection system reduces the time required to perform diagnostic work, saves ophthalmologists work and costs, and results in timely patient care. Automatic DR detection plays an important role in early DR detection. The DR level is determined by the types of lesions that appear on the fundus. This chapter presents modern automated systems for diabetic retinopathy diagnosis and classification using deep learning techniques. Most researchers use CNNs for classification, detecting DR directly from the images. This chapter also describes the main methods which could be used to classify and diagnose DR using DL.

References

[1] C. Agurto, et al., Multiscale AM-FM methods for diabetic retinopathy lesion detection, IEEE Trans. Med. Imaging 29 (2) (2010) 502-512.
[2] R. Acharya, Y.K.E. Ng, J.S. Suri, Image Modeling of the Human Eye, Artech House, 2008.
[3] Early Treatment Diabetic Retinopathy Study Research Group, Grading diabetic retinopathy from stereoscopic color fundus photographs, an extension of the modified Airlie House classification: ETDRS report number 10, Ophthalmology 98 (5) (1991) 786-806.
[4] P. Liskowski, K. Krawiec, Segmenting retinal blood vessels with deep neural networks, IEEE Trans. Med. Imaging 35 (11) (2016) 2369-2380.
[5] J. Odstrcilik, et al., Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database, IET Image Processing 7 (4) (2013) 373-383.

[6] J. Jan, et al., Retinal image analysis aimed at blood vessel tree segmentation and early detection of neural-layer deterioration, Comput. Med. Imaging Graph. 36 (6) (2012) 431-441.
[7] M. Chandrashekar, An approach for the detection of vascular abnormalities in diabetic retinopathy, Int. J. Data Min. Tech. App. 02 (2013) 246-250.
[8] J. Kaur, H.P. Sinha, Automated localization of optic disc and macula from fundus images, Int. J. Adv. Res. 2 (4) (2012) 242-249.
[9] H.F. Jaafar, A.K. Nandi, W. Al-Nuaimy, Detection of exudates from digital fundus images using a region based segmentation technique, in: 19th European Signal Processing Conference, Barcelona, Spain, 2011.
[10] X. Jiang, D. Mojon, Adaptive local thresholding by verification based multi-threshold probing with application to vessel detection in retinal images, IEEE Trans. Pattern Anal. Mach. Intell. 25 (1) (2003) 131-137.
[11] C.I. Sanchez, M. Garcia, A. Mayo, M. Lopez, R. Hornero, Retinal image analysis based on mixture models to detect hard exudates, Med. Image Anal. 13 (2009) 650-658.
[12] J. Goh, L. Tang, G. Saleh, L. Al Turk, Y. Fu, A. Browne, Filtering normal retinal images for diabetic retinopathy screening using multiple classifiers, in: International Conference on Information Technology and Applications in Biomedicine, 2009, pp. 1-4.
[13] G. Liew, T.Y. Wong, P. Mitchell, J.J. Wang, Retinal vascular imaging: a new tool in microvascular disease research, Circ.: Cardiovasc. Imaging (2008) 156-161.
[14] A.W. Reza, C. Eswaran, K. Dimyati, Diagnosis of diabetic retinopathy: automatic extraction of optic disc and exudates from retinal images using marker-controlled watershed transformation, J. Med. Systems 35 (2011) 1491-1501.
[15] A. Hoover, M. Goldbaum, Locating the optic nerve in a retinal image using the fuzzy convergence of the blood vessels, IEEE Trans. Med. Imaging 22 (8) (2003) 951-958.
[16] H. Li, O. Chutatape, Automated feature extraction in color retinal images by a model based approach, IEEE Trans. Biomed. Eng. 51 (2) (2004) 246-254.
[17] G. Gardener, D. Keating, T. Williamson, A. Elliott, Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool, Br. J. Ophthalmol. (1996).
[18] L.W. Yun, U.R. Acharya, Y.V. Venkatesh, C. Chee, L.C. Min, E.Y.K. Ng, Identification of different stages of diabetic retinopathy using retinal optical images, Inf. Sci. (Ny) 178 (2008) 106-121.
[19] G. Quellec, K. Charriere, Y. Boudi, B. Cochener, M. Lamard, Deep image mining for diabetic retinopathy screening, Med. Image Anal. 39 (2017) 178-193.
[20] H. Jiang, K. Yang, M. Gao, D. Zhang, H. Ma, W. Qian, An interpretable ensemble deep learning model for diabetic retinopathy disease classification, in: 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019, pp. 2045-2048.
[21] G.T. Zago, R.V. Andreao, I.B. Dorizz, E.O. Teatini Salles, Diabetic retinopathy detection using red lesion localization and convolutional neural networks, Comput. Biol. Med. (2020).
[22] H. Pratt, F. Coenen, D.M. Broadbent, S.P. Harding, Y. Zheng, Convolutional neural networks for diabetic retinopathy, Procedia Comput. Sci. 90 (2016) 200-205.
[23] V. Gulshan, L. Peng, M. Coram, et al., Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, JAMA 316 (22) (2016) 2402-2410.
[24] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, et al., Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (4) (1989) 541-551. Available from: https://doi.org/10.1162/neco.1989.1.4.541.
[25] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, et al., Handwritten digit recognition with a back-propagation network, Adv. Neural Inf. Process. Syst. 2 (1990) 396-404. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.5076.
[26] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527-1554. Available from: https://doi.org/10.1162/neco.2006.18.7.1527.
[27] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504-507. Available from: https://doi.org/10.1126/science.1127647.
[28] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in: Adv. Neural Inf. Process. Syst. (2007) 153-160.
[29] L. Deng, D. Yu, Deep Learning: Methods and Applications, Now Publishers Inc, 2013. arXiv:1309.1501. Available from: https://doi.org/10.1561/2000000039.
[30] D. Li, A tutorial survey of architectures, algorithms, and applications for deep learning, APSIPA Trans. Signal Inf. Process. 3 (2) (2014) 1-29.
[31] V. Vasilakos, A. Tang, Y. Yao, Neural networks for computer-aided diagnosis in medicine: a review, Neurocomputing 216 (2016) 700-708.
[32] X.W. Chen, X. Lin, Big data deep learning: challenges and perspectives, IEEE Access 2 (2014) 514-525.
[33] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual understanding: a review, Neurocomputing 187 (2016) 27-48.


[34] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Signal Process. 7 (3-4) (2014) 197-387.
[35] S.C. Lo, M.T. Freedman, J.S. Lin, S.K. Mun, Automatic lung nodule detection using profile matching and back-propagation neural network techniques, J. Digit. Imaging 6 (1993) 48-54.
[36] M.L. Astion, P. Wilding, The application of back-propagation neural networks to problems in pathology and laboratory medicine, Arch. Path. Lab. Med. 116 (1992) 995-1001.
[37] Y. Wu, M.L. Giger, K. Doi, C.J. Vyborny, R.A. Schmidt, C.E. Metz, Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology 187 (1993) 81-87.
[38] S.E. Spenceley, D.B. Henson, D.R. Bull, Visual field analysis using artificial neural networks, Ophthal. Physiol. Opt. 14 (1994) 239-248.
[39] Q. Abbas, M.E.A. Ibrahim, M.A. Jaffar, Video scene analysis: an overview and challenges on deep learning algorithms, Multimed. Tools Appl. 77 (16) (2018) 20415-20453. Available from: https://doi.org/10.1007/s11042-017-5438-7.
[40] H. Greenspan, B. van Ginneken, R.M. Summers, Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique, IEEE Trans. Med. Imaging 35 (5) (2016) 1153-1159. Available from: https://doi.org/10.1109/tmi.2016.2553401.
[41] R. Raman, S. Srinivasan, S. Virmani, S. Sivaprasad, C. Rao, R. Rajalakshmi, Fundus photograph-based deep learning algorithms in detecting diabetic retinopathy, Eye 33 (2019) 97-109.
[42] R. Rajalakshmi, R. Subashini, R.M. Anjana, V. Mohan, Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence, Eye 32 (2018) 1138-1144.
[43] W. Zhang, et al., Automated identification and grading system of diabetic retinopathy using deep neural networks, Knowl. Base Syst. 175 (2019) 12-25.

Further reading

W.L. Alyoubi, W.M. Shalash, M.F. Abulkhair, Diabetic retinopathy detection through deep learning techniques: a review, Inform. Med. Unlocked 20 (2020) 100377.



C H A P T E R

10

Automated detection of colon cancer using deep learning

Aayush Rajput1 and Abdulhamit Subasi2,3

1Indian Institute of Technology, Kharagpur, West Bengal, India 2Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland 3Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

10.1 Introduction 265
10.2 Literature review 267
10.3 Artificial intelligence for colon cancer detection 268
  10.3.1 Artificial neural networks 269
  10.3.2 Deep learning 269
  10.3.3 Convolutional neural networks 270
10.4 Disease detection using artificial intelligence 271
  10.4.1 Feature extraction using deep learning 271
  10.4.2 Dimension reduction 272
  10.4.3 Prediction and classification 272
  10.4.4 Experimental data 274
  10.4.5 Performance evaluation measures 274
  10.4.6 Experimental results 274
10.5 Discussion 280
10.6 Conclusion 280
References 280

10.1 Introduction

Colon cancer is a type of cancer that affects and starts from the large intestine end of the digestive tract. In the starting phase of colon cancer, clumps of cells begin to grow on the inner side of the large intestine. As these clumps of cells, called polyps, grow and multiply further, they become cancerous. In the initial phase, they are not cancerous, and the small polyps can be removed easily at that stage to avoid the threat of cancer. The average age of men and women diagnosed with colon cancer is 68 and 72, respectively [1]. Although it can occur in young persons also, most people having colon cancer are older than 50.


The probability of development of colon cancer is slightly higher in men than women: one man among 23 and one woman in every 25 have colon cancer in the United States [2]. Colorectal cancer is the term used for colon and rectal cancer collectively. Colorectal cancer is the third most common type of cancer in the United States, excluding skin cancers, with 104,270 cases of colon cancer and 45,230 cases of rectal cancer in 2021 [2]. People with a family history of colon cancer have a higher risk of developing it, especially when any close family member below the age of 60 is diagnosed with colon cancer. Other situations can also increase the risk of developing colon cancer, such as low physical activity, being overweight, smoking, a history of any other type of cancer such as ovarian cancer or uterine cancer, and having adenomas [1].

Sometimes it is possible that a person having colon cancer does not show any symptoms in the initial stage of development of the cancer [3]. The symptoms of colon cancer include a frequent change in bowel habits, diarrhea, the feeling that the bowel does not empty, constipation, thinner stools than usual, tiredness or fatigue, anemia, and blood in the stool [3]. A person should immediately consult a doctor if any of the symptoms last for a long time, regardless of age, because colon cancer can develop at any age. The most common type of colon cancer is adenocarcinoma [4]. Benign colon cancer is the initial stage of colon cancer, which can be treated easily and is not life-threatening. Metastasis is when cancer develops in one part of the body and spreads to other parts [5]. Doctors use many types of tests for the diagnosis of colon cancer, and tests are also done for checking metastasis. The test used by the doctor depends on various factors and the symptoms seen in the patient. First, a physical examination of the patient is done; then tests such as biopsy, colonoscopy, computed tomography (CT), biomarker testing, blood tests, and magnetic resonance imaging (MRI) are used. In a biopsy to detect colon cancer, a small tissue sample from the affected region of the large intestine is removed from the patient's body and examined under a microscope by a pathologist [6]. The biopsy is capable of making a definite diagnosis of colon cancer. Colonoscopy is a method in which a colonoscopist checks inside the colon of the patient's body. During the colonoscopy, if cancer is detected, then the tumor has to be removed surgically from the site for a complete diagnosis. In CT scans, X-rays are used to produce a 3D image of the human body, and sometimes a dye is also used for more precise results [7]. The dye can be given to the patient by injecting it directly into the veins, or it can be taken as a pill. From the resulting image of the CT scan, abnormalities can be detected. In colon cancer, internal bleeding in the large intestine occurs and the person becomes anemic, so a blood test is done to check the blood count; a lower blood count shows the presence of internal bleeding and hence colon cancer. Blood tests are also done to measure the carcinoembryonic antigen (CEA) level in the blood; a higher value of CEA indicates the spread of cancer to other parts of the body. MRI works similarly to the CT scan; the only difference is that magnetic fields are used to make the 3D image of the body, and the size of the tumor can be detected. The treatment of colon cancer has physical, emotional, social, and financial effects on the patient, and these effects can vary from person to person for the same treatment for the same type of cancer. Various things need to be done to cope with the side effects of the treatment. After the detection of a cancer, the patient can have sadness, anxiety, or anger; taking help from a counselor can relieve the patient of the emotional side effects of the treatment. The treatment of colon cancer is costly, due to which sometimes patients cannot take the complete treatment and put their lives at risk; any financial problem should be discussed with the supportive care team.

Family and friends play an essential role in dealing with the side effects of colon cancer treatment; they also help in providing emotional support to the patient.

This study aims to use the power of artificial intelligence and deep learning (DL) to detect the type of colon cancer (benign or adenocarcinoma) from tissue images like those obtained in a biopsy. Using DL, the time can be reduced, and the speed of detecting cancer from tissue images can be increased to a large extent. The manual process of examining a biopsy takes a lot of a skilled doctor's time and can sometimes give wrong results, which can have a very negative effect on the patient, as the treatment has many side effects. DL will help doctors to determine colon cancer faster and accurately.

10.2 Literature review

DL is gaining the attention of many researchers in computer vision, where objects are detected within images. As convolutional neural networks (CNNs) are excellent at detecting objects in images, DL is beneficial in many situations in medical science. Various studies have been done to detect diseases such as cancer from images.

Kim et al. [8] used semantic gland segmentation from morphology images to detect colon cancer. They detected cancer-causing polyps from the images. They proposed a method in which features extracted from diagnostic tests are given as input to the DL network with the semantic segmentation algorithm. They used different architectures for detecting the cancer-causing polyps: CNN, SegNet, U-Net, and FCN were used, and the U-Net architecture gave the highest intersection over union score of 92%. Various metrics evaluate the predicted images against the actual label, particularly the dice index, sensitivity, and Hausdorff distance. CNN, SegNet, and FCN gave a 74.80% sensitivity score, 86.36% dice score, and 62.7% mean intersection over union score.

Song et al. [9] used 411 slides from the Chinese People's Liberation Army General Hospital, of which 232 slides were of colorectal adenomas and 179 of normal mucosa. They used a deep learning model based on DeepLab v2 with ResNet-34, with the addition of a skip-layer fusion approach in which the higher level features extracted by the model are combined with the lower level features. The evaluation metrics on which the model's performance was measured were accuracy, sensitivity, and specificity. They compared their model's results with other DL architectures: ResNet50, DenseNet, InceptionV3, U-Net, and DeepLab v3. These models gave accuracy scores of 89.8%, 87.8%, 90.3%, 77.7%, and 88.3%, while the improved DeepLab v2 got an accuracy score of 90.4%. The models were tested on 194 cases which were not used for training them.

Ribeiro et al. [10] trained different models to classify images as nonneoplastic or neoplastic. The database used in the study was an endoscopic image database with eight different imaging modalities acquired by an HD endoscope with an image size of 256 × 256 pixels. They first trained a CNN model from scratch and got an average accuracy of 79%, which is not very good. Then the hyperparameters of the model were tuned, such as changing the size and number of filters; five different models with different hyperparameters were used, and the best result from tuning the hyperparameters was an 89% accuracy score. In the next experiment, they trained the CNN for a particular group of databases using leave-one-out cross-validation with different strides; the result showed that there is not much difference when increasing the stride for an image. In further experiments, transfer learning was used with different data augmentations, and the results showed that features extracted from CNN and VGG-16 provided better feature descriptors of colonic polyps.


Walker et al. [11] used 307 colorectal cancer-related digital slides from St. Paul's Hospital. Of these slides, 85 were from normal colorectal tissue and 222 were colorectal cancer; 275 slides were used for training and the remaining for testing. The pretrained InceptionV3 architecture was used with an input size of 299 × 299 pixels. The training was done for 20 epochs with a batch size of 92 and a learning rate of 3 × 10^(-4). The overall classification accuracy and receiver operating characteristic (ROC) score were 95.1% and 99%, respectively. Image segmentation was also done; the prediction performance across all the slides on the independent dataset had mean accuracy, specificity, sensitivity, and dice scores of 87.8%, 90.0%, 85.2%, and 87.2%.

Hornbrook et al. [12] used machine learning (ML) tools for the early detection of colorectal cancer using gender, age, and complete blood count data. The data consist of 900 colorectal cancer cases and 9108 no-cancer cases, taken from the Kaiser Permanente Northwest Region's Tumour Registry. The performance of the model was evaluated using the specificity, area under the ROC curve (AUC), and odds ratio. The model gave an area under the curve score of 80% with 99% specificity. They used the ColonFlag model for the detection of undiagnosed colorectal cancer. The study showed that ColonFlag identifies individuals at 10 × higher risk of undiagnosed colon cancer and is more accurate at identifying right-sided colorectal cancers.

Ponzio et al. [13] proposed a DL technique using CNNs to distinguish adenocarcinoma from healthy tissue and benign lesions. The data used was taken from a public repository of H&E stained whole-slide images available on the website of the University of Leeds Virtual Pathology Project. The total data consist of 13,500 patches, of which 9000 were used for training and the remaining 4500 for testing. They also used a pretrained VGG16 model, and principal component analysis (PCA) was used as the feature reduction technique. A fully trained CNN model, a pretrained VGG16 model with a support vector machine (SVM) classifier, and a fine-tuned pretrained model fixing the weights of the lower level blocks were used. The three techniques gave accuracy scores of 90.37%, 96.46%, and 96.82%.

Li et al. [14] used DL methods to classify colorectal cancer lymph node metastasis images. Different ML techniques were used for the task, and their results were compared. Data used in the study was taken from the Harbin Medical University Cancer Hospital; the total data consist of 1646 positive and 1718 negative samples. Features used for training the models were extracted by three techniques: the gray-level histogram, textural features from the gray-level cooccurrence matrix, and the scale-invariant feature transform. The pretrained CNN model AlexNet was also used. The CNN models LeNet and AlexNet were used with full training, and the classical ML models AdaBoost, decision tree, KNN, logistic regression, multilayer perceptron, Naive Bayes (NB), stochastic gradient descent (SGD), and SVM were used. The evaluation metrics used for comparing the models' performance are accuracy, AUC, sensitivity, specificity, positive predictive value, and negative predictive value. The pretrained AlexNet model outperformed the other models, giving accuracy, AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) scores of 75.83%, 79.41%, 80.04%, 79.97%, 79.92%, and 80.09%, respectively.

10.3 Artificial intelligence for colon cancer detection

DL is a very useful tool in medical science that can have a significant effect in many ways, like giving accurate and fast predictions. Normally, detection of colon cancer from tissue slides takes a skilled doctor a lot of time, which can be avoided using DL. The important techniques used in this study are explained in the following sections.


10.3.1 Artificial neural networks

Artificial neural network (ANN) is a brain neuron-inspired model used for processing the input to give a prediction based on the input. Some neural networks are based on radial basis functions. An ANN can also learn to detect the features in an image. An ANN consists of one input and one output layer with hidden layers in between them. While training an ANN, information is fed in the forward and backward directions. In forward propagation, the values of each node of the previous layer are multiplied by their respective weights; then the sum of these products is passed to another function known as the activation function, and the output of this function is the value of the node of the next layer. This process is repeated for every node of the next layer. In an ANN, every node of a layer is connected to every node of the next layer, and this connection from one node to a node of the next layer has a value called its weight. In backpropagation, complex mathematical calculations are done which change the weights to make more accurate predictions [15]. The backpropagation method was derived in 1960 by Seppo Linnainmaa; however, Rumelhart et al. [16] showed how it could be used to train ANN models. Each layer also has a bias node, which helps the model better fit the training data. An activation function is associated with each layer of the ANN; it also helps the model fit the data better. Activation functions keep the input value for a node in the layers after the input layer within a fixed range. They also make the model capable of fitting nonlinear data. An activation function should be computationally inexpensive; otherwise, it will increase the training time and the memory needed to train the model. It should also be symmetric about zero, differentiable, and free of the vanishing gradient problem. Some of the famous activation functions are sigmoid, softmax, tanh, and ReLU. Sigmoid is computationally more expensive, so it is avoided in most cases. It is defined as sig(x) = 1/(1 + e^(-x)) [17]. It has a range of 0-1, is not zero-centered, and causes a vanishing gradient problem. Softmax is defined as S(x)_i = e^(y_i) / Σ_j e^(y_j), with the sum running over j = 1..n. It is similar to sigmoid and is used in the output layer of a multiclass classification model. It has a range from 0 to 1. Tanh is defined as tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)); it is zero-centered but has other problems such as the vanishing gradient and is computationally expensive. ReLU is the most widely used activation function among these; it is defined as f(x) = max(0, x) [18]. It is computationally inexpensive and does not have the vanishing gradient problem. It is mostly used in CNN models. The deeper an ANN is, the more features it can detect from the data, which can give better results. Still, sometimes it can lead to overfitting the training data, so ANNs with different numbers of hidden layers should be trained, and the model giving the best results on the validation data should be selected.
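A minimal NumPy sketch of one forward-propagation step with such activations (all sizes and values are arbitrary illustrations):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

# One dense layer: weighted sum of the previous layer's nodes plus a bias,
# passed through an activation to produce the next layer's node values
x = np.array([0.2, 0.7, 0.1])   # input node values
W = np.random.rand(4, 3)        # weights: 3 input nodes -> 4 hidden nodes
b = np.zeros(4)                 # bias for each hidden node
hidden = relu(W @ x + b)
print(hidden)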


models trained for weeks by the people having a small portion of the layer is used for getting
best resources can be used directly by any the value of one neuron of the next layer. In
other person for a different problem. These CNN, no feature engineering is done to get
pretrained models give very good results in a good results CNN takes care of the features
much shorter time because they already have itself while using classical ML; a lot of work
the best weights and do not need to be chan- has to be done to get the important features.
ged [20]. The initially hidden layers of CNN are respon-
The results using transfer learning are gen- sible for detecting the low-level features in the
erally better than training a DL model from image, and high-level features are detected
scratch. For the computer vision tasks, CNNs using the farther hidden layers. The feature
outperform the other algorithms. In this study, detection in CNN is done by the filters associ-
CNN models are trained from scratch, pre- ated with each layer; a layer can have many fil-
trained models and pretrained models with ters [22]. A filter is a matrix that moves over a
classical ML algorithms are used, and their CNN layer, and the value of its weights has
results are compared. Besides computer vision, multiplied the value of the neurons to get an
DL is used in many other fields such as natural output value for the next layer’s neuron. This
language processing, visual recognition, rec- operation is called the convolution operation.
ommendation systems, self-driving cars, etc. The pooling layer is used for reducing the size
[21]. Today, we have enough data resources to of a layer to reduce the computation cost.
train these DL models that take more data than There are no weights associated with filters in
other ML models. DL has a direct effect on these layers. Max pooling and average pooling
people’s lives. DL in medical science has made are the two types of pooling mainly used [23].
it easier for people to get accurate and faster The maximum value from a portion of layer on
results. From getting recommendations on which filter is applied is taken in max pooling.
social media to robotics everywhere, DL is In average pooling takes the average of the
used. Many research are going on DL to make portion on which kernel is applied is taken.
it better. During the convolution operation, the pixel in
the middle will affect more neurons of the out-
put layer than the corners. It can lead to loss of
10.3.3 Convolutional neural networks information present in the corner pixels; to
CNN is a type of ANN which can take a 3D avoid this problem, padding is done. Padding
array as input and give the required results. is adding zeros to the convolution layer [24].
Every image is a 3D array with each pixel as This can keep the shape of the next layers the
the value so that images can be given as input same as the previous layer—this kind of pad-
in CNN. CNN can detect the features in an ding is called the same padding. Valid padding
image using the filters. During the training, the means no padding at all. A filter of CNN has
weights of these filters are changed to get a the property of stride; it defines the movement
minimum possible error in the true and pre- of a filter after doing one convolution opera-
dicted values. Like ANN, in CNN, there is an tion. If the stride of a filter is n, it means that
input layer, an output layer, and in between after doing one convolution operation, it will
input and output layers, there are hidden move n pixels to reach its new position. These
layers. In CNN, each neuron or unit is not con- all factors are to be given by the user designing
nected to every neuron of the next layers. Only the CNN.
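As a quick check of these padding and stride rules, the output width of a convolution follows out = (in + 2 × padding − kernel) // stride + 1; a minimal sketch with illustrative sizes:

# Output size of a convolution: out = (in + 2*padding - kernel) // stride + 1
def conv_output_size(in_size, kernel, stride=1, padding=0):
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(224, 3, stride=1, padding=1))  # 'same' padding keeps 224
print(conv_output_size(224, 3, stride=2, padding=0))  # 'valid' padding shrinks to 111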

from keras.layers import (BatchNormalization, Conv2D, Dense, Dropout,
                          Flatten, Input, MaxPooling2D)
from keras.models import Model

# Functional-style CNN: three conv/pool/dropout blocks on 128 x 128 RGB inputs
inp = Input(shape=(128, 128, 3))

model = BatchNormalization()(inp)
model = Conv2D(filters=64, kernel_size=(3, 3), padding='same',
               activation='relu')(model)
model = BatchNormalization()(model)
model = Conv2D(filters=128, kernel_size=(3, 3), padding='same',
               activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Conv2D(filters=128, kernel_size=(3, 3), padding='same',
               activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Conv2D(filters=64, kernel_size=(3, 3), padding='same',
               activation='relu')(model)
model = MaxPooling2D()(model)
model = Dropout(0.2)(model)
model = Flatten()(model)
# train_it is the training image generator; class_indices gives the class count
output = Dense(units=len(train_it.class_indices), activation='softmax')(model)

model = Model(inputs=inp, outputs=output)

10.4 Disease detection using artificial intelligence

10.4.1 Feature extraction using deep learning

Feature extraction is a technique of converting the original data to a more useful form of lower size. Before the development of DL, various techniques were used to extract important features from the original data, which was very time-consuming and required expertise; redundant and unnecessary features can decrease the performance of the model. But with the development of DL and CNNs, feature extraction is done automatically by the CNN. In a CNN, filters are used to extract features from a layer and pass them to the next layers. The filters of the beginning layers detect lower level features, while the filters of the ending layers extract the high-level features. The better the weights of the filters of the model, the better the results it will give. A large CNN model is computationally very expensive to train to the best weights on complex data. To avoid this computation cost, pretrained models are used for extracting the features from the data.

features from the data. The pretrained models already have good weights that can detect the important features in the data, so there is no need to change the weights of the pretrained model. In this study, the pretrained CNN architectures VGG16 [25], VGG19 [25], ResNet50 [26], ResNet101 [26], MobileNetV2 [27], MobileNet [27], InceptionV3 [28], InceptionResNetV2 [29], DenseNet169 [30], DenseNet121 [30], and Xception [31] are used as feature extractors; the features extracted by these models are flattened and used for training the classical ML models. The time taken to train models on the extracted features is much less than training a model from scratch, and the performance of the pretrained models is also very good.
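As one possible way to set this up (a sketch, assuming the 128 × 128 × 3 inputs from the previous section; the chapter does not list this step explicitly), a pretrained backbone can be loaded without its classification head and frozen:

from keras.applications import VGG16
from keras.layers import Flatten
from keras.models import Model

backbone = VGG16(weights='imagenet', include_top=False,
                 input_shape=(128, 128, 3))
backbone.trainable = False  # keep the pretrained weights fixed
base_model = Model(inputs=backbone.input,
                   outputs=Flatten()(backbone.output))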

from sklearn.model_selection import train_test_split

def get_features(base_model, train, validate):
    # Extract deep features with the frozen pretrained model; the image
    # iterators must be created with shuffle=False so that the predictions
    # stay aligned with the labels in .classes.
    X_train = base_model.predict(train)
    y_train = train.classes
    X_val = base_model.predict(validate)
    y_val = validate.classes
    # Split the validation features in half to obtain a held-out test set.
    X_val, X_test, y_val, y_test = train_test_split(
        X_val, y_val, test_size=0.5, shuffle=True)
    return (X_train, X_val, X_test, y_train, y_val, y_test)
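Hypothetical usage, assuming train_it and val_it are the image iterators defined earlier (created with shuffle=False):

X_train, X_val, X_test, y_train, y_val, y_test = get_features(
    base_model, train_it, val_it)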

10.4.2 Dimension reduction

Dimension reduction reduces the size of the data without losing much of the important information in it. In ML, various algorithms are used for reducing the size of data: PCA [32], linear discriminant analysis (LDA) [33], neural autoencoders [34], and t-distributed stochastic neighbor embedding (t-SNE) [35] are some dimension reduction algorithms. In this study, PCA is used for reducing the dimension of the features extracted by the pretrained models. In PCA, the data is projected onto a hyperplane chosen so that the variance of the projected data is as close as possible to that of the original data, and the projected values become the new representation of the data. As PCA is a distance-based algorithm, the data should be normalized before performing PCA. After applying PCA, the data can be converted back to the original dimensions, but it will not be identical to the original data. PCA is the oldest and most widely used dimension reduction algorithm. In LDA, linear combinations of the input columns are calculated in a way that separates the features of a particular class from the other classes of the data. Neural autoencoders convert the data to a lower size by removing the noise and redundancy in it. t-SNE is a more recent, nonlinear dimension reduction technique that maps the data to a lower dimensional space in which the similarity of each data point to its neighbors is maintained; it is also used for visualizing data.

from sklearn.decomposition import PCA

# Project the extracted deep features onto the first 7000 principal
# components, then reuse the same projection for the validation set.
pca = PCA(n_components=7000)
X_train = pca.fit_transform(X_train)
X_val = pca.transform(X_val)

model.fit(X_train, y_train)
val_pred = model.predict(X_val)
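Since PCA is distance based, the features are typically standardized first; a small sketch (the scaler choice is an assumption, as the chapter does not show this step):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on training features only
X_val = scaler.transform(X_val)          # reuse the training statistics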

10.4.3 Prediction and classification

Every ML task can be classified into two types: one is prediction and the other is classification. In the prediction task, continuous-valued functions are used to assign a value to an instance of data; in contrast, classification is the task of assigning a particular instance of data to a class label. In prediction, only one neuron is present in the output layer of the CNN, and the activation function used is linear or ReLU. The activation function used for predicting the class in a multiclass classification is the softmax activation function. It predicts the probability of a sample belonging to a particular class, and the sum of the values predicted by the softmax layer is equal to one. It can also be applied to a binary classification task, and the task here is binary classification. The number of units in the last (output) layer is equal to the number of classes in the data, so for n classes the output given by the model is an array of n values. Classical ML algorithms such as KNN [36], SVM [37], Random Forest [38], AdaBoost [39], and XGBoost [40] can also be used for multiclass classification.
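A small numerical illustration of the softmax behavior described above (illustrative code, separate from the chapter's pipeline):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- the probabilities always sum to one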

from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

def get_models():
    # Fully connected network trained on the extracted deep features.
    ANN = Sequential()
    ANN.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
    ANN.add(BatchNormalization())
    ANN.add(Dropout(0.2))
    ANN.add(Dense(64, activation='relu'))
    ANN.add(Dense(32, activation='relu'))
    ANN.add(Dense(16, activation='relu'))
    ANN.add(Dense(8, activation='relu'))
    ANN.add(Dense(len(train_it.class_indices), activation='softmax'))
    ANN.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                metrics=['accuracy'])

    # Classical ML classifiers with the settings used in this study.
    KNN = KNeighborsClassifier()
    SVM = SVC(kernel='linear')
    RF = RandomForestClassifier(n_estimators=50)
    ADB = AdaBoostClassifier()
    XGB = XGBClassifier(n_estimators=50, use_label_encoder=False)
    return (ANN, KNN, SVM, RF, ADB, XGB)

def fit_model(model, X_train, y_train):
    model.fit(X_train, y_train)
    return model
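A minimal sketch of how these helpers might be combined; the variable names and training settings are assumptions for illustration:

from sklearn.metrics import accuracy_score

for name, clf in zip(('ANN', 'KNN', 'SVM', 'RF', 'ADB', 'XGB'), get_models()):
    if name == 'ANN':
        clf.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
        y_pred = clf.predict(X_val).argmax(axis=1)  # most probable class
    else:
        clf = fit_model(clf, X_train, y_train)
        y_pred = clf.predict(X_val)
    print(name, accuracy_score(y_val, y_pred))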

10.4.4 Experimental data

The data used in this study is taken from images generated from an original sample of Health Insurance Portability and Accountability Act (HIPAA) compliant and validated sources, consisting of 500 total images of colon tissue augmented to 10,000 images using the augmenter package [41]. There are two classes present in the data, colon adenocarcinoma and benign colon tissue, each with 5000 images. This dataset is available publicly on Kaggle as colon cancer histopathological images [42]. Each image is in JPEG file format with a size of 768 × 768 pixels; the images were reshaped to 128 × 128 pixels to reduce the computational cost and training time.

10.4.5 Performance evaluation measures

For measuring a model's performance on the data, various metrics are used, as accuracy scores alone can sometimes be misleading: high accuracy scores do not always imply that the model is accurate, since on imbalanced data a model can predict a single class and still obtain a high accuracy score. To avoid this situation, the training accuracy score, validation accuracy score, test accuracy score, F1 score, Cohen kappa score, ROC AUC score, recall, and precision score are calculated, and the best model is selected based on these scores. 70% of the data is used for training, and 15% of the data is used for each of validation and testing.
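These scores can be computed with scikit-learn; a sketch, where y_test, y_pred, and the predicted probabilities y_score are assumed to come from the evaluation step:

from sklearn.metrics import (accuracy_score, f1_score, cohen_kappa_score,
                             roc_auc_score, recall_score, precision_score)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')
kappa = cohen_kappa_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)  # probabilities of the positive class
rec = recall_score(y_test, y_pred, average='weighted')
prec = precision_score(y_test, y_pred, average='weighted')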
10.4.6 Experimental results

First, CNNs with different numbers of layers are trained from scratch, and then the pretrained CNN architectures are used for the task. The weights of the pretrained models are fixed, and a fully connected layer is attached as the last layer of the pretrained CNN; it takes the flattened output of the CNN as its input and ends with two neurons with a softmax activation function for predicting the class to which an instance belongs (Fig. 10.1). The results of the CNN models with different numbers of layers are given in Table 10.1.

The results obtained by training whole CNNs from scratch are not very good; they are unable to classify the images correctly and give essentially random results. This is quite expected, because the images are complex and simple CNNs do not have the capacity to model them.

Table 10.2 clearly shows that the performance of the pretrained models is far better than that of the custom CNN models trained from scratch. This is because the data is very complex, and a large network with good weights is needed to classify the image classes correctly. In Table 10.2, ResNet50 gives the best results, followed by VGG16, ResNet101, and VGG19, with F1 scores of 99.8%, 99.33%, 99.26%, and 99.06%, respectively.

FIGURE 10.1 The general framework for histopathological image classification using a CNN model. CNN, Convolutional neural network.

TABLE 10.1 Results of different convolutional neural network (CNN) models.
Classifier Training accuracy Validation accuracy Test accuracy F1 score KAPPA ROC score

CNN 2 Layer 0.67343 0.67533 0.64267 0.60878 0.28869 0.78553


CNN 3 Layer 0.5 0.502 0.498 0.33111 0 0.5
CNN 4 Layer 0.5 0.498 0.502 0.33556 0 0.5
CNN 5 Layer 0.5 0.496 0.50333 0.33749 −0.00133 0.49868
CNN 6 Layer 0.91386 0.90667 0.91133 0.91115 0.82262 0.97803
CNN 7 Layer 0.50 0.50 0.50267 0.3363 0 0.5

CNN 8 Layer 0.5 0.504 0.496 0.3289 0 0.65413

TABLE 10.2 Results of different transfer learning models.


Classifier Training accuracy Validation accuracy Test accuracy F1 score KAPPA ROC score

ResNet50 0.99929 0.998 0.998 0.998 0.996 0.99933


VGG16 0.99957 0.99067 0.99333 0.99333 0.98666 0.99656
VGG19 0.99757 0.988 0.99067 0.99067 0.98131 0.99516
Inception_v3 0.67986 0.70267 0.66867 0.63419 0.35282 0.69598
MobileNet 0.98671 0.934 0.932 0.93183 0.86407 0.9883
DenseNet169 0.86757 0.85067 0.876 0.87402 0.75183 0.96002

DenseNet121 0.98157 0.966 0.96467 0.96467 0.92936 0.99618


InceptionResNetV2 0.62929 0.62267 0.64667 0.64667 0.29329 0.66052
MobileNetV2 0.88443 0.86667 0.85933 0.85691 0.71828 0.96966
ResNet101 1 0.992 0.99267 0.99267 0.98532 0.99923

Then the pretrained models are used with classical ML algorithms. In this method, the pretrained models are used as feature extractors, and the output is flattened and fed to the classical ML models as input (Fig. 10.2).

Table 10.3 shows that VGG16 gives very good results on the dataset. The best result is given by VGG16 as feature extractor followed by SVM, with a test accuracy of 99.73%, and the worst result is given by the KNN, with a test accuracy of 84.8%.

VGG19 achieved results very similar to those of the VGG16 model. The best result obtained by VGG19 is with the SVM model, with a test accuracy of 99.66%, and the worst result is obtained with the KNN model, with a test accuracy of 88.26% (Table 10.4).

The results obtained by using ResNet50 are very accurate; with every ML model the test accuracy is at least about 96%. The best result with ResNet50 is given by the SVM, with a test accuracy of 99.86%, and the worst is given by KNN, with a test accuracy of 95.93%, which is similar to the previous case (Table 10.5).


FIGURE 10.2 The general framework for histopathological image classification using deep feature extraction.

TABLE 10.3 Performance of VGG16 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.99943 0.99533 0.99667 0.99667 0.99333 0.99667 0.99667


KNN 0.896 0.848 0.848 0.8469 0.69517 0.848 0.87796
SVM 1 0.99467 0.99733 0.99733 0.99467 0.99733 0.99733
Random Forest 1 0.988 0.986 0.986 0.97199 0.986 0.98607
AdaBoost 0.99286 0.97933 0.984 0.984 0.968 0.984 0.984
XGBoost 1 0.99333 0.98733 0.98733 0.97466 0.98733 0.98737

TABLE 10.4 Performance of VGG19 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.99957 0.99333 0.99267 0.99267 0.98533 0.99267 0.99274


KNN 0.904 0.894 0.88267 0.88105 0.76482 0.88267 0.90271

SVM 1 0.99467 0.99667 0.99667 0.99333 0.99667 0.99667


Random Forest 1 0.98267 0.97333 0.97332 0.94665 0.97333 0.97373
AdaBoost 0.98857 0.97867 0.97733 0.97733 0.95466 0.97733 0.97733
XGBoost 1 0.99 0.97933 0.97933 0.95865 0.97933 0.97958

TABLE 10.5 Performance of ResNet50 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.99929 0.99667 0.99533 0.99533 0.99067 0.99533 0.99536


KNN 0.979 0.96867 0.95933 0.95933 0.91867 0.95933 0.95952
SVM 1 0.998 0.99867 0.99867 0.99733 0.99867 0.99867
Random Forest 1 0.98267 0.976 0.976 0.95199 0.976 0.97603
AdaBoost 0.99686 0.98267 0.97867 0.97867 0.95733 0.97867 0.97867
XGBoost 1 0.99267 0.98533 0.98533 0.97066 0.98533 0.98536

TABLE 10.6 Performance of ResNet101 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.99957 0.99067 0.99067 0.99067 0.98133 0.99067 0.9907


KNN 0.977 0.95533 0.95067 0.95066 0.90135 0.95067 0.95107
SVM 1 0.99267 0.997 0.994 0.988 0.994 0.994

Random Forest 1 0.97133 0.97133 0.97133 0.94265 0.97133 0.97147


AdaBoost 0.99057 0.96867 0.968 0.968 0.936 0.968 0.96801
XGBoost 1 0.984 0.98733 0.98733 0.97466 0.98733 0.98733

TABLE 10.7 Performance of MobileNetV2 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.98929 0.93667 0.944 0.94397 0.88795 0.944 0.94452


KNN 0.929 0.87533 0.89267 0.8925 0.78517 0.89267 0.89457
SVM 1 0.92267 0.92267 0.92267 0.84532 0.92267 0.92267
Random Forest 0.99986 0.87267 0.87267 0.87265 0.7454 0.87267 0.87319
AdaBoost 0.892 0.84067 0.87333 0.87334 0.74667 0.87333 0.87339
XGBoost 1 0.91 0.90933 0.90933 0.81866 0.90933 0.90935

ResNet101 achieved results similar to the ResNet50 model with a deeper network. Here, the best test accuracy is given by the SVM, with a value of 99.7%, and the worst test accuracy is given by the KNN model, with a value of 95.06% (Table 10.6).

The performance of the MobileNetV2 model is not as good as VGG or ResNet. The best accuracy given by this model is with the ANN, with a test accuracy value of 94.4%, and the worst is given with AdaBoost, with a value of 87.33%, which is quite low (Table 10.7).

MobileNet is the earlier version of the MobileNetV2 model and is similar to it; therefore, the results are also similar. This model's best test accuracy score is 95.9% with the ANN model, and the worst accuracy


TABLE 10.8 Performance of MobileNet deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.99357 0.95467 0.95933 0.95933 0.91865 0.95933 0.95939


KNN 0.911 0.86 0.858 0.85668 0.71651 0.858 0.87334
SVM 1 0.948 0.95667 0.95667 0.91334 0.95667 0.95677
Random Forest 1 0.87533 0.874 0.87386 0.74816 0.874 0.87626
AdaBoost 0.913 0.866 0.886 0.886 0.77202 0.886 0.88618
XGBoost 1 0.91933 0.92333 0.92333 0.84669 0.92333 0.92369

TABLE 10.9 Performance of InceptionV3 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.90143 0.882 0.88667 0.88648 0.77316 0.88667 0.88861


KNN 0.91 0.852 0.86533 0.86515 0.73048 0.86533 0.86676
SVM 1 0.91067 0.88867 0.88867 0.77733 0.88869 0.88869
Random Forest 1 0.86667 0.86667 0.86664 0.73341 0.86667 0.8673
AdaBoost 0.85271 0.826 0.82333 0.82334 0.64666 0.82333 0.82336

XGBoost 0.90143 0.882 0.88667 0.88648 0.77316 0.88667 0.88861

TABLE 10.10 Performance of InceptionResNetV2 deep features extraction with different machine learning models.

Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.59686 0.602 0.584 0.51663 0.17377 0.584 0.70005


KNN 0.791 0.674 0.69667 0.69563 0.39267 0.69667 0.69868

SVM 0.82086 0.76867 0.76 0.76 0.51995 0.76 0.76


Random Forest 0.99986 0.734 0.74 0.73999 0.48007 0.74 0.7402
AdaBoost 0.74829 0.70467 0.69733 0.69731 0.39455 0.69733 0.69733
XGBoost 0.99129 0.75267 0.72467 0.72467 0.44929 0.72467 0.72467

score is with AdaBoost, with a test accuracy score of 88.6%, which is similar to MobileNetV2 (Table 10.8).

InceptionV3 was developed by Google and is an upgraded version of InceptionV1. It is not well suited for the colon dataset, as the results are not very good: the best test accuracy given by InceptionV3 is with SVM, with a value of 88.86%, and the worst is AdaBoost, with a value of 82.33% (Table 10.9).

InceptionResNetV2 is inspired by the Inception and ResNet architectures. It has 164 layers and

can classify up to 1000 different classes on a single dataset. But the results of this model are not good: the highest test accuracy is reached with the SVM, with a value of 76%, while the worst value is given by KNN, with a test accuracy value of 69.66%; both are very low (Table 10.10).

The results obtained with DenseNet169 are impressive. This model is able to classify 99.13% of the test images correctly with SVM. The worst test accuracy is given with AdaBoost, with a value of 93.66% (Table 10.11).

DenseNet121 is a simpler version of DenseNet169, and the results obtained by this model are worse than those of DenseNet169. The best accuracy is given with the SVM model, with a value of 98.86%, and the worst accuracy is given with AdaBoost, with a value of 92.46% (Table 10.12).

TABLE 10.11 Performance of DenseNet169 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.93514 0.92267 0.92067 0.92014 0.84109 0.92067 0.93071

KNN 0.969 0.94733 0.96133 0.96133 0.92266 0.96133 0.96134

SVM 1 0.98933 0.99133 0.99133 0.98267 0.99133 0.99138

Random Forest 1 0.972 0.954 0.954 0.90802 0.954 0.95444

AdaBoost 0.97314 0.95133 0.93667 0.93667 0.87332 0.93667 0.93667

XGBoost 1 0.98067 0.974 0.974 0.948 0.974 0.97415

TABLE 10.12 Performance of DenseNet121 deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.98914 0.97267 0.97067 0.97066 0.94135 0.97067 0.97154

KNN 0.956 0.93667 0.93733 0.93731 0.87471 0.93733 0.93849

SVM 1 0.98867 0.98867 0.98867 0.97733 0.98867 0.98869

Random Forest 1 0.94533 0.93933 0.93933 0.87868 0.93933 0.93952

AdaBoost 0.94871 0.92933 0.92467 0.92467 0.84933 0.92467 0.92469

XGBoost 1 0.97133 0.97333 0.97333 0.94667 0.97333 0.9735

TABLE 10.13 Performance of Xception deep features extraction with different machine learning models.
Model Training accuracy Validation accuracy Test accuracy F1 score Kappa Recall Precision

ANN 0.87357 0.83933 0.83133 0.8287 0.66188 0.83133 0.85104

KNN 0.874 0.76133 0.76733 0.76671 0.53421 0.76733 0.76955

SVM 1 0.89 0.87067 0.87066 0.74138 0.87067 0.87096

Random Forest 1 0.81867 0.81733 0.81734 0.63466 0.81733 0.81737

AdaBoost 0.83786 0.79867 0.77333 0.77333 0.54671 0.77333 0.77347

XGBoost 1 0.85533 0.85333 0.85328 0.70667 0.85333 0.85421


The Xception model has 73 layers, fewer than the models used above, so its results are also worse than many of the previously used models. Here the best accuracy is given by the SVM model, with a test accuracy value of 87.06%, and the worst accuracy, 77.33%, is given by AdaBoost (Table 10.13).

10.5 Discussion

From the tables of the second method above, it can be seen that ANN, SVM, and XGBoost give the best results with a particular pretrained model. Here also, the ResNet and VGG models outperform the other pretrained models. The best result is given by ResNet50 with the SVM model, with a test accuracy score, F1 score, kappa score, recall score, and precision score of 99.867%, 99.867%, 99.733%, 99.869%, and 99.867%, respectively. While using the classical ML models, the features extracted from the pretrained models were reduced to lower dimensions; still, there are no significant differences in the results obtained by the two methods. Using pretrained models drastically increased the performance of the models.

10.6 Conclusion

Colon cancer is a very common disease that can lead to loss of life if not treated early, and it is very hard to treat in its last stages. All possible steps should be taken to avoid the development of colon cancer in the body, and immediate action should be taken if any of the cancer symptoms are detected. Detection of cancer through biopsy is one of the methods most trusted by doctors. Artificial intelligence can help the doctor reduce the time taken for the detection of cancer and help obtain accurate results, as incorrect results can have very bad consequences and sometimes even lead to loss of life. A false-positive result is also very bad for the patient, as the treatment involves many side effects and a huge cost.

References

[1] Cancer.Net, Colorectal cancer - risk factors and prevention, <https://www.cancer.net/cancer-types/colorectal-cancer/risk-factors-and-prevention>, Jun. 25, 2012 (accessed 16.05.21).
[2] Colorectal cancer statistics | How common is colorectal cancer? <https://www.cancer.org/cancer/colon-rectal-cancer/about/key-statistics.html> (accessed 16.05.21).
[3] Mayo Clinic, Colon cancer - symptoms and causes, <https://www.mayoclinic.org/diseases-conditions/colon-cancer/symptoms-causes/syc-20353669> (accessed 25.05.21).
[4] Stanford Health Care, Types of colorectal cancer, <https://stanfordhealthcare.org/medical-conditions/cancer/colorectal-cancer/types.html> (accessed 25.05.21).
[5] Memorial Sloan Kettering Cancer Center, Treatment for metastatic colon cancer, <https://www.mskcc.org/cancer-care/types/colon/treatment/metastases> (accessed 25.05.21).
[6] Testing for colorectal cancer | How is colorectal cancer diagnosed? <https://www.cancer.org/cancer/colon-rectal-cancer/detection-diagnosis-staging/how-diagnosed.html> (accessed 25.05.21).
[7] Cancer.Net, Colorectal cancer - diagnosis, <https://www.cancer.net/cancer-types/colorectal-cancer/diagnosis>, Jun. 25, 2012 (accessed 25.05.21).
[8] T.-H. Kim, D. Bhattacharyya, M. Mahanty, D. Midhunchakkaravarthy, Detection of colorectal cancer by deep learning: an extensive review, Int. J. Curr. Res. Rev. 12 (2020). https://doi.org/10.31782/IJCRR.2020.122234.
[9] Z. Song, et al., Automatic deep learning-based colorectal adenoma detection system and its similarities with pathologists, BMJ Open 10 (9) (2020) e036423.
[10] E. Ribeiro, A. Uhl, G. Wimmer, M. Häfner, Exploring deep learning and transfer learning for colonic polyp classification, Comput. Math. Methods Med. (2016) e6584725. https://doi.org/10.1155/2016/6584725.
[11] L. Xu, et al., Colorectal cancer detection based on deep learning, J. Pathol. Inform. 11 (2020).
[12] M.C. Hornbrook, et al., Early colorectal cancer detected by machine learning model using gender, age, and complete blood count data, Dig. Dis. Sci. 62 (10) (2017) 2719–2727.
[13] F. Ponzio, E. Macii, E. Ficarra, S. Di Cataldo, Colorectal cancer classification using deep convolutional networks, in: Proc. 11th International Joint Conference on Biomedical Engineering Systems and Technologies, 2018, 2, pp. 58–66.
[14] J. Li, P. Wang, Y. Zhou, H. Liang, K. Luan, Different machine learning and deep learning methods for classification of colorectal cancer lymph node metastasis images, Front. Bioeng. Biotechnol. 8 (2020) 1521.

[15] Scribd, Artificial neural network a study | PDF | Artificial neural network | Nervous system, <https://www.scribd.com/document/525467375/artificial-neural-network-a-study> (accessed 04.04.22).
[16] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
[17] M. Manavazhahan, A study of activation functions for neural networks, 2017.
[18] S. Sharma, Activation functions in neural networks, Medium, <https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6>, Jul. 04, 2021 (accessed 04.04.22).
[19] N. Ganatra, A. Patel, A comprehensive study of deep learning architectures, applications and tools, Int. J. Comput. Sci. Eng. 6 (12) (2018) 701–705.
[20] F. Zhuang, et al., A comprehensive survey on transfer learning, Proc. IEEE 109 (1) (2020) 43–76.
[21] MathWorks, What is deep learning? | How it works, techniques & applications, <https://in.mathworks.com/discovery/deep-learning.html> (accessed 04.04.22).
[22] S. Indolia, A.K. Goswami, S.P. Mishra, P. Asopa, Conceptual understanding of convolutional neural network - a deep learning approach, Procedia Comput. Sci. 132 (2018) 679–688.
[23] J. Brownlee, A gentle introduction to pooling layers for convolutional neural networks, Machine Learning Mastery, Apr. 21, 2019, <https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/> (accessed 04.04.22).
[24] Padding (Machine Learning), DeepAI, May 17, 2019, <https://deepai.org/machine-learning-glossary-and-terms/padding> (accessed 04.04.22).
[25] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv Prepr. ArXiv1409.1556, 2014.
[26] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[27] A.G. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, ArXiv1704.04861 Cs, [Online]. <http://arxiv.org/abs/1704.04861>, Apr. 2017 (accessed 14.05.21).
[28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[29] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proc. AAAI Conference on Artificial Intelligence, 2017, 31, 1.
[30] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[31] F. Chollet, Xception: deep learning with depthwise separable convolutions, in: Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[32] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 374 (2065) (2016) 20150202.
[33] A. Tharwat, T. Gaber, A. Ibrahim, A.E. Hassanien, Linear discriminant analysis: a detailed tutorial, AI Commun. 30 (2) (2017) 169–190.
[34] F.-N. Yuan, L. Zhang, J.T. Shi, X. Xia, G. Li, Theories and applications of auto-encoder neural networks: a literature survey, Chin. J. Comput. 42 (01) (2019) 203–230.
[35] G. Hinton, S.T. Roweis, Stochastic neighbor embedding, in: NIPS, 2002, 15, pp. 833–840.
[36] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, KNN model-based approach in classification, in: OTM Confederated International Conferences On the Move to Meaningful Internet Systems, 2003, pp. 986–996.
[37] S.R. Gunn, Support vector machines for classification and regression, ISIS Tech. Rep. 14 (1) (1998) 5–16.
[38] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[39] R.E. Schapire, Explaining AdaBoost, Empirical Inference, Springer, 2013, pp. 37–52.
[40] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, in: Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.
[41] A.A. Borkowski, M.M. Bui, L.B. Thomas, C.P. Wilson, L.A. DeLand, S.M. Mastorides, Lung and colon cancer histopathological image dataset (LC25000), ArXiv Prepr. ArXiv1912.12142, 2019.
[42] Kaggle, Lung and colon cancer histopathological images, <https://kaggle.com/andrewmvd/lung-and-colon-cancer-histopathological-images> (accessed 14.05.21).

C H A P T E R

11
Brain hemorrhage detection using computed
tomography images and deep learning
Abdullah Elen¹, Aykut Diker¹ and Abdulhamit Subasi²,³
¹Department of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Balikesir, Turkey; ²Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; ³Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

11.1 Introduction
11.2 Literature survey in brain hemorrhage detection
11.3 Deep learning methods
    11.3.1 ResNet-18
    11.3.2 EfficientNet-B0
    11.3.3 VGG-16
    11.3.4 DarkNet-19
11.4 Experimental results
    11.4.1 Dataset
11.5 Discussions
11.6 Conclusion
References

11.1 Introduction

Intracranial hemorrhage (ICH) is a fatal lesion that occurs in the brain, and it has a high rate of mortality, as reported in Ref. [1]. ICH is a type of stroke caused by a burst artery in the brain; when the artery bursts and bleeds, it can harm the surrounding brain tissues, and the continuous bleeding kills brain cells [2]. ICH is known to be clinically hazardous since it carries a high risk of causing brain injuries that may lead to paralysis and even death. Since ICH is fatal, timely diagnosis of ICH is significant for taking appropriate actions such as surgical intervention [3]. Depending on where it is located in the brain, ICH can be divided into five distinct categories: intraventricular (IVH), intraparenchymal (IPH),

DOI: https://doi.org/10.1016/B978-0-443-18450-5.00014-1 © 2023 Elsevier Inc. All rights reserved.

subarachnoid (SAH), epidural (EDH), and subdural (SDH) [4]. Studies conducted on populations indicate that the incidence of ICH is estimated to be about 10–30 per 100,000 people [5,6]. Furthermore, 3 months later, more than one-third of survivors have significant disabilities [7].

Computed tomography (CT) can be used to determine the source of a hemorrhage and its localization. CT uses consecutive 2D slices and stacks them to generate a 3D image as output [8]. The types of ICH can be diagnosed by an expert with the help of their properties in the CT images, such as lesion shape and size, and therefore manual diagnosis is a tedious procedure [9]. Hence, automated diagnosis and detection of hemorrhage has gained attention in the last decades [10–12]. Deep learning-based methods have shown good performance in medical image classification [12–14], medical image segmentation [15–17], and disease diagnosis [18–21]. The convolutional neural network (CNN) is employed for various classification tasks related to medical images [19–21]. CNNs are capable of extracting features from images and learning automatically; they are devised such that they can understand the image content [22] and are optimized for images. Concepts such as weight sharing, pooling, and convolution are used for maintaining the essential operations of CNNs [23]. CNNs can extract meaningful features and classify them further for the diagnosis of hemorrhage from CT scan images. A sample normal brain CT image and a hemorrhage CT image are given in Fig. 11.1. Since CNNs are popular in the field of computer vision, researchers have proposed more advanced deep neural networks to obtain more accuracy. Deep neural network models such as residual networks (ResNet) [24], AlexNet [25], VGGNet [26], and other models have been proposed, which are usually large and specialized networks relative to classical CNNs. Each network has its own pros and cons, as mentioned in its original study. Variants or competitors of these deep neural network models are usually employed in transfer learning [27–29]. In transfer learning, the model is trained on a dataset, and the weights of the trained model are saved for further usage; then, most of the layers are frozen during the training phase, and only a few layers are trained. These networks are accepted as state-of-the-art methods in the literature and preferred by most researchers in the fields of computer vision and medical imaging.

FIGURE 11.1 Normal and hemorrhage image samples.

Starting from this point, in this chapter, some of the popular deep learning models are employed for hemorrhage detection using brain CT images. Moreover, the brain hemorrhage CT image dataset is exploited for hemorrhage detection.

The rest of this chapter is organized as follows: some of the methods proposed for brain hemorrhage detection are reviewed and presented in Section 11.2. The deep learning models and methods employed in this chapter are explained in Section 11.3. The obtained experimental results are provided in Section 11.4. Discussions and drawn conclusions are included in Section 11.5 and Section 11.6, respectively.

11.2 Literature survey in brain hemorrhage detection

In recent decades, many studies have been conducted for ICH detection from brain CT scan images. In this section, some of these studies are reviewed in an elaborative manner, and the methods proposed in the literature that use various CT image datasets are examined.

In Ref. [30], the authors observed the performance of different frameworks. They used AlexNet, a modified AlexNet-support vector machine (AlexNet-SVM), and AlexNet-SVM with principal component analysis (PCA), separately, for classifying brain CT images. Using the AlexNet-SVM model, the highest accuracy rates of 99.86% and 99.60% are obtained on the original images and the preprocessed images, respectively. Hence, the AlexNet-SVM framework outperformed the AlexNet-SVM with PCA framework and AlexNet. In Ref. [31], researchers employed summation images of the brain CT images in order to detect hemorrhage. Six distinct subdivisions are created according to the height of the images, and CT images belonging to the same subdivision are summed. The authors used various artificial neural network (ANN) architectures for classifying these images and gradually increased the size of the input image layer and the number of hidden layers. The best accuracy rate, 83.30%, is obtained for the subdivision 41%–50%. In Ref. [32], a deep learning-based system is proposed. The proposed system uses multiple ImageNet [33] pretrained CNNs, some preprocessing techniques, an atlas creation module, and a prediction-based selection module. The preprocessing provides high performance for small and imbalanced datasets, and the proposed system reached an area under the curve (AUC) score of 99.30%. In Ref. [34], a deep learning-based method is presented for achieving accurate ICH detection and classification. The authors employed 2D slices of 3D CT scans by processing DICOM files. In the first step, the method exploits a CNN to extract features from the slices and classify them. Then, in the second step, feature vectors generated from the slices are supplied to the first sequence model for providing spatially coherent results. In the final stage, the classification results of the CNN and the first sequence model are combined through a second sequence model, which is employed for adaptive model averaging; therefore, the final decision is made by the second sequence model. In Ref. [35], brain hemorrhage classification based on a neural network (BHCNet) is proposed. The authors employed data augmentation and combined this augmentation method with the CNN model. They also evaluated the performance of CNN + long short-term memory and CNN + gated recurrent unit. Their method reached an accuracy rate of 100.00% on the imbalanced data they employed for benchmarking. In Ref. [36], a joint model is proposed that combines a CNN with a recurrent neural network (RNN). The CNN-RNN model is employed for both five-type and two-type classification: the two-type classification is simply made for detecting the presence of hemorrhage, and the five-type classification is made for ICH subtype detection. Their models, named Sub-Lab and


Sub-Lab, reached accuracy, sensitivity, and specificity scores greater than or equal to 0.98. In Ref. [37], the authors proposed a CNN model for the detection of ICH and reached an AUC score of 0.846. Their ICH detection method is also speed efficient, processing images in 2.3 seconds on average. In Ref. [38], the authors presented a new classification technique based on minimalist machine learning (MML) [39]. They also employed a feature selection algorithm, namely dMeans. Their best model reached an accuracy rate of 86.50%. In Ref. [40], the authors proposed a computer-aided diagnosis (CAD) method for the detection of SAH (a subtype of ICH). Their method applies techniques such as morphological operations, noise reduction, and segmentation to the images. The CAD system outputs a final image that indicates the hemorrhage together with the classification result. They reached a classification accuracy of 88.89%. In Ref. [41], the authors proposed an artificial intelligence-based tool for ICH detection. The tool makes a prediction about the presence of ICH by localizing it using a heat map. An atlas is created from the training images by obtaining class activation maps and visual evidence from the later layers of the CNN model. They reached an AUC score of 0.994.

11.3 Deep learning methods

In this chapter, a CNN-based model is proposed to classify brain CT images into two classes in total, hemorrhage and normal. The block diagram of the model is given in Fig. 11.2.

11.3.1 ResNet-18

Residual Network, or ResNet in short, is a form of neural network first presented by He et al. in 2015 in their study "Deep Residual Learning for Image Recognition" [24]. Additional layers are added to deep neural networks to handle more complex problems, resulting in better accuracy and performance: the idea behind adding more layers is that they will learn increasingly complicated features. For example, in image recognition and classification, the first layer can learn to recognize edges, the second layer can learn to identify textures, and the third layer can learn to recognize objects. However, the standard CNN model has a maximum depth threshold. With the introduction of a new neural network building block, the residual block, the challenge of training very deep networks was relieved, as shown in Fig. 11.3.
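The residual block adds the block's input x to the transformed output F(x), so each block only has to learn a residual. A conceptual sketch in Keras (for illustration only; the experiments in this chapter use MATLAB's pretrained networks, and the input is assumed to already have `filters` channels):

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def residual_block(x, filters=64):
    shortcut = x  # identity connection
    y = Conv2D(filters, 3, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, 3, padding='same')(y)
    y = BatchNormalization()(y)
    y = Add()([y, shortcut])  # F(x) + x
    return Activation('relu')(y)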

FIGURE 11.2 Block diagram of the deep learning architecture.


FIGURE 11.3 ResNet's architecture.

11.3.2 EfficientNet-B0

The EfficientNet model was presented by Tan and Le from the Google research team. These researchers examined CNN model scaling and determined that balance in scaling the depth, width, and resolution of the network affects network performance. Based on this observation, they proposed a new scaling method that equally scales all dimensions of the network's depth, width, and resolution. EfficientNet consists of eight models, from B0 to B7; as the model grows, the number of parameters used and the success rate increase. According to the study, the basic model, EfficientNet-B0, is created first. The base model can then be scaled up with composite scaling from EfficientNet-B1 to EfficientNet-B7 to increase accuracy and model size; with each increase, the required processing power nearly doubles. In previous scaling studies, usually only one of the parameters among depth, width, and resolution is scaled. The depth of the network corresponds to the number of layers in the network, the width is related to the number of filters in a convolutional layer, and the resolution is the height and width of the input image. While it is possible to arbitrarily scale two or three parameters, arbitrary scaling requires laborious manual adjustment and still often does not achieve the desired efficiency and accuracy. The EfficientNet model not only provides higher accuracy, but also increases the efficiency of the models by reducing the number of parameters compared to cutting-edge models. Contrary to the traditional practice of scaling factors arbitrarily, in the EfficientNet model it is proposed to evenly scale the network width, depth, and resolution with a set of fixed scaling coefficients, a scheme called composite scaling. The proposed composite scaling method in the EfficientNet model is given in Fig. 11.4.
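In the original EfficientNet paper, this composite (compound) scaling is controlled by a single coefficient φ; the constants are fixed by a small grid search, and the constraint keeps total computation roughly doubling per unit of φ (background detail, the chapter itself does not state the formula):

$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha, \beta, \gamma \ge 1,$$

where d, w, and r scale the network depth, width, and input resolution, respectively.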

FIGURE 11.4 (A) Basic network example. (B–D) Traditional scaling that increases only one dimension of network width, depth, or resolution. (E) EfficientNet's composite scaling method.


structure of neural networks. It optimizes both the accuracy and the efficiency, measured on a floating-point operations per second basis. An improved inverted bottleneck convolution (MBConv) is used in this architecture. The researchers then scaled up this baseline network to create the EfficientNet family of deep learning models [42,43]. Its architecture is given in the diagram in Fig. 11.5.

11.3.3 VGG-16

VGG-16 is a 16-layered CNN architecture that employs a 3 × 3 filter, the smallest available filter size, for all convolution layers. RGB samples with a resolution of 224 × 224 pixels are input to the model. The image features are extracted using a stack of convolutional layers with a stride of one; the convolutional layer input is spatially padded after each convolution process, and the convolution output is passed through a nonlinear activation. Spatial pooling is handled via five pooling layers, using a 2 × 2 filter and stride 2 for max-pooling. Following the succession of convolutional and max-pooling layers, three fully connected layers are built, and the final layer is the softmax layer [44]. The architecture of VGG-16 is depicted in Fig. 11.6.
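The feature-map arithmetic implied by this design can be checked with a short sketch (illustrative Python, separate from the chapter's MATLAB experiments): each same-padded 3 × 3 convolution preserves the spatial size, and each 2 × 2, stride-2 max pool halves it.

size = 224
for stage in range(5):  # VGG-16 has five max-pooling stages
    size //= 2
    print('after pool', stage + 1, ':', size, 'x', size)
# prints 112, 56, 28, 14, 7 -- the 7 x 7 x 512 maps feed the FC layers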


11.3.4 DarkNet-19

A network can be made small and effective at the same time. DarkNet-19 incorporates many earlier ideas, such as the Darknet reference network, Network in Network, Inception, and batch normalization. Convolutional layers are used instead of fully connected layers in DarkNet-19, which is composed of 19 convolutional and 5 max-pooling layers.

FIGURE 11.5 EfficientNet-B0's architecture.

To decrease the number of parameters, it employs only 3 × 3 convolutional kernels together with numerous 1 × 1 convolutional kernels. DarkNet-19 is frequently used in YOLO algorithms such as YOLOv2 [45]. The architecture of DarkNet-19 is shown in Fig. 11.7.
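To see why the 1 × 1 kernels save parameters, consider a layer mapping 512 input channels to 256 output channels (illustrative numbers, not taken from the chapter):

$$3 \times 3 \times 512 \times 256 \approx 1.18 \times 10^{6} \ \text{weights} \qquad \text{versus} \qquad 1 \times 1 \times 512 \times 256 \approx 0.13 \times 10^{6},$$

a roughly ninefold reduction for the 1 × 1 convolution.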
11.4 Experimental results

The experiments were carried out on an Intel(R) Core(TM) i7 2.90 GHz CPU with 8 GB RAM and a 2 GB GeForce GT 730 graphics card. The environment for code development and experimental work was MATLAB (2020b).

To evaluate the proposed model, we have used the confusion matrix and commonly used performance parameters obtained from this matrix, such as accuracy (ACC), sensitivity (SEN), specificity (SPE), F-score, Matthews correlation coefficient (MCC), and the receiver operating characteristic (ROC). The confusion matrix consists of four components: true positive (TP), true negative (TN), false

FIGURE 11.6 VGG-16's architecture.

FIGURE 11.7 DarkNet-19's architecture.


positive (FP), and false negative (FN) [46]. The performance measures are given in Eqs. (11.2)–(11.6).

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11.2}$$

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{11.3}$$

$$\mathrm{SPE} = \frac{TN}{TN + FP} \tag{11.4}$$

$$F\text{-}score = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{11.5}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{11.6}$$
Hereby, TP and TN represent the number of correctly predicted positive and negative samples, whereas FP and FN correspond to the number of incorrectly predicted positive and negative samples. Additionally, the ROC has been considered to evaluate the model performance. The ROC is a 2D graph in which the true-positive rate is plotted against the false-positive rate. The training of the CNN models was realized in 50 epochs with a mini-batch size of 16, as can be seen from the MATLAB code given for each CNN model in Section 11.3.

11.4.1 Dataset

The Head CT Hemorrhage Image Dataset (HCTHID) contains a total of 200 CT images, 100 normal (healthy) and 100 hemorrhage [47]. The images in the dataset have different sizes and are all in PNG format. In this study, we used all the images, converting them to 32 × 32 pixels. Sample images of the HCTHID are shown in Fig. 11.8; the first row of the figure depicts

FIGURE 11.8 Sample CT images of (A) normal and (B) hemorrhage brain. CT, Computed tomography.

images of cases which do not have any problem, that is, normal, and the images of hemorrhages are depicted in the second row.

The performance of the deep learning models was investigated in different experiments for classification of the HCTHID. The confusion matrices, ROC curves, performance metrics tables of the CNN models, and classification results obtained by using the 10-fold cross-validation technique on the HCTHID are given in Figs. 11.9–11.12 and Tables 11.1–11.5.

The ACC, SEN, SPE, F-score, and MCC values were obtained by using the 10-fold cross validation of the DarkNet-19 CNN model. As seen in Fig. 11.9, the accuracy rate of the confusion matrix and the mean score of the ROC curve are 83.5% and 91.50%, respectively.

% Datastore for the image dataset.
dataStore = imageDatastore('Repository\BrainHemorrhage', ...
    'LabelSource', 'foldernames', 'IncludeSubfolders', true);
% Partition data for cross-validation.
cv = cvpartition(dataStore.Labels, 'Kfold', 10);
% Create cell arrays for the train and test data.
trainSet = cell(10, 1); testSet = cell(10, 1);
for i = 1 : 10 % Create subsets of the datastore.
    % Training cases for the current fold.
    trainSet{i} = subset(dataStore, find(cv.training(i)));
    % Test cases for the current fold.
    testSet{i} = subset(dataStore, find(cv.test(i)));
end
% Options for training the network: stochastic gradient descent with
% momentum, a maximum of 50 epochs, and an initial learning rate of 0.0001.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...
    'MaxEpochs', 50, ...
    'InitialLearnRate', 1e-4);
% Load the DarkNet-19 convolutional neural network.
deepNet = darknet19;
layers = layerGraph(deepNet.Layers);
for i = 1 : 10 % 10-fold cross validation.
    % Train the neural network on the current fold.
    [net, info] = trainNetwork(trainSet{i}, layers, options);
    % Evaluate the network by classifying the held-out test fold.
    [predictions, ~] = classify(net, testSet{i});
    % Test accuracy: the fraction of test labels that match the
    % predictions returned by classify.
    ACC = mean(predictions == testSet{i}.Labels);
    disp(['Test accuracy is ', num2str(ACC)]);
end

The ACC, SEN, SPE, F-score, and MCC values were obtained by using the 10-fold cross validation of the EfficientNet-B0 CNN model.


FIGURE 11.9 Test result of the DarkNet-19; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.

FIGURE 11.10 Test result of the EfficientNet-B0; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.


FIGURE 11.11 Test result of the ResNet-18; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.

FIGURE 11.12 Test result of the VGG-16; (A) confusion matrix and (B) ROC curve. ROC, Receiver operating
characteristic.


TABLE 11.1 Test result of the DarkNet-19.


Fold Accuracy Sensitivity Specificity Precision F-score MCC

Hemorrhage 1 0.650 0.400 0.900 0.800 0.533 0.346

2 0.850 0.800 0.900 0.889 0.842 0.704

3 0.800 0.800 0.800 0.800 0.800 0.600

4 0.950 0.900 1.000 1.000 0.947 0.905

5 0.800 0.800 0.800 0.800 0.800 0.600

6 1.000 1.000 1.000 1.000 1.000 1.000

7 0.850 0.900 0.800 0.818 0.857 0.704

8 0.900 0.800 1.000 1.000 0.889 0.817

9 1.000 1.000 1.000 1.000 1.000 1.000

10 0.550 0.800 0.300 0.533 0.640 0.116

Normal 1 0.650 0.900 0.400 0.600 0.720 0.346

2 0.850 0.900 0.800 0.818 0.857 0.704

3 0.800 0.800 0.800 0.800 0.800 0.600

4 0.950 1.000 0.900 0.909 0.952 0.905

5 0.800 0.800 0.800 0.800 0.800 0.600

6 1.000 1.000 1.000 1.000 1.000 1.000

7 0.850 0.800 0.900 0.889 0.842 0.704

8 0.900 1.000 0.800 0.833 0.909 0.817

9 1.000 1.000 1.000 1.000 1.000 1.000

10 0.550 0.300 0.800 0.600 0.400 0.116

TABLE 11.2 Test result of the EfficientNet-B0.


Fold Accuracy Sensitivity Specificity Precision F-score MCC

Hemorrhage 1 0.600 0.600 0.600 0.600 0.600 0.200

2 0.750 0.800 0.700 0.727 0.762 0.503

3 0.900 0.900 0.900 0.900 0.900 0.800

4 0.850 0.900 0.800 0.818 0.857 0.704

5 0.700 0.700 0.700 0.700 0.700 0.400

6 1.000 1.000 1.000 1.000 1.000 1.000

7 0.950 0.900 1.000 1.000 0.947 0.905

8 0.700 0.600 0.800 0.750 0.667 0.408

9 0.900 0.800 1.000 1.000 0.889 0.817

10 0.850 0.900 0.800 0.818 0.857 0.704


Normal 1 0.600 0.600 0.600 0.600 0.600 0.200

2 0.750 0.700 0.800 0.778 0.737 0.503

3 0.900 0.900 0.900 0.900 0.900 0.800

4 0.850 0.800 0.900 0.889 0.842 0.704

5 0.700 0.700 0.700 0.700 0.700 0.400

6 1.000 1.000 1.000 1.000 1.000 1.000

7 0.950 1.000 0.900 0.909 0.952 0.905

8 0.700 0.800 0.600 0.667 0.727 0.408

9 0.900 1.000 0.800 0.833 0.909 0.817

10 0.850 0.800 0.900 0.889 0.842 0.704

TABLE 11.3 Test result of the ResNet-18.


Fold Accuracy Sensitivity Specificity Precision F-score MCC

Hemorrhage 1 0.800 0.700 0.900 0.875 0.778 0.612

2 0.700 0.900 0.500 0.643 0.750 0.436

3 0.850 0.700 1.000 1.000 0.824 0.734

4 0.600 0.500 0.700 0.625 0.556 0.204

5 0.800 0.800 0.800 0.800 0.800 0.600

6 0.900 0.800 1.000 1.000 0.889 0.817

7 0.900 1.000 0.800 0.833 0.909 0.817

8 0.650 0.500 0.800 0.714 0.588 0.315

9 1.000 1.000 1.000 1.000 1.000 1.000

10 0.850 0.900 0.800 0.818 0.857 0.704

Normal 1 0.800 0.900 0.700 0.750 0.818 0.612

2 0.700 0.500 0.900 0.833 0.625 0.436

3 0.850 1.000 0.700 0.769 0.870 0.734

4 0.600 0.700 0.500 0.583 0.636 0.204

5 0.800 0.800 0.800 0.800 0.800 0.600

6 0.900 1.000 0.800 0.833 0.909 0.817

7 0.900 0.800 1.000 1.000 0.889 0.817

8 0.650 0.800 0.500 0.615 0.696 0.315

9 1.000 1.000 1.000 1.000 1.000 1.000

10 0.850 0.800 0.900 0.889 0.842 0.704


TABLE 11.4 Test result of the VGG-16.


Fold Accuracy Sensitivity Specificity Precision F-score MCC

Hemorrhage 1 0.850 0.700 1.000 1.000 0.824 0.734

2 0.950 0.900 1.000 1.000 0.947 0.905

3 0.500 0.000 1.000 – 0.000 0.000

4 0.850 0.900 0.800 0.818 0.857 0.704

5 0.900 0.900 0.900 0.900 0.900 0.800

6 1.000 1.000 1.000 1.000 1.000 1.000

7 1.000 1.000 1.000 1.000 1.000 1.000

8 0.800 0.600 1.000 1.000 0.750 0.655

9 0.500 0.000 1.000 – 0.000 0.000

10 0.850 1.000 0.700 0.769 0.870 0.734

Normal 1 0.850 1.000 0.700 0.769 0.870 0.734

2 0.950 1.000 0.900 0.909 0.952 0.905

3 0.500 1.000 0.000 0.500 0.667 0.000

4 0.850 0.800 0.900 0.889 0.842 0.704

5 0.900 0.900 0.900 0.900 0.900 0.800

6 1.000 1.000 1.000 1.000 1.000 1.000

7 1.000 1.000 1.000 1.000 1.000 1.000

8 0.800 1.000 0.600 0.714 0.833 0.655

9 0.500 1.000 0.000 0.500 0.667 0.000

10 0.850 0.700 1.000 1.000 0.824 0.734

TABLE 11.5 Mean test scores of deep learning methods.


Model Accuracy Sensitivity Specificity Precision F-score MCC

Hemorrhage DarkNet-19 0.835 0.820 0.850 0.864 0.831 0.679

EfficientNet-B0 0.820 0.810 0.830 0.831 0.818 0.644

ResNet-18 0.805 0.780 0.830 0.831 0.795 0.624

VGG-16 0.820 0.700 0.940 0.936 0.715 0.653

Normal DarkNet-19 0.835 0.850 0.820 0.825 0.828 0.679

EfficientNet-B0 0.820 0.830 0.810 0.816 0.821 0.644

ResNet-18 0.805 0.830 0.780 0.807 0.809 0.624

VGG-16 0.820 0.940 0.700 0.818 0.855 0.653

As seen in Fig. 11.10, the accuracy rate of the confusion matrix and the mean score of the ROC curve are 82.00% and 87.90%, respectively.

% Datastore for the image dataset.


dataStore = imageDatastore('Repository\BrainHemorrhage', ...
'LabelSource', 'foldernames', 'IncludeSubfolders', true);
% Partition data for cross-validation.
cv = cvpartition(dataStore.Labels, 'Kfold', 10);
% Create cell array for train and test data.
trainSet = cell(10, 1); testSet = cell(10, 1);
for i = 1 : 10 % Create subset of datastore.
% Train cases for current fold.
trainSet{i} = subset(dataStore, find(cv.training(i)));
% Test cases for current fold.
testSet{i} = subset(dataStore, find(cv.test(i)));
end
% Options for training deep learning neural networks.
% Set the options to the default settings for the stochastic gradient descent
% with momentum. Set the maximum number of epochs at 50, and start the training
% with an initial learning rate of 0.0001.
options = trainingOptions('sgdm', ...
'MiniBatchSize', 16, ...
'MaxEpochs', 50, ...
'InitialLearnRate', 1e-4);
% Load EfficientNet-B0 convolutional neural network.
deepNet = efficientnetb0;
layers = layerGraph(deepNet.Layers);
for i = 1 : 10 % 10-fold cross validation.
% Train neural network for deep learning.
[net, info] = trainNetwork(trainSet{i}, layers, options);
% Test the performance of the network by evaluating the prediction accuracy
% of the test data.
[predictions, ~] = classify(net, testSet{i});
% Evaluate test accuracy of the model.
% The accuracy is the ratio of the number of true labels in the test data
% matching the classifications from classify to the number of images in the test data.
ACC = mean(predictions == testSet{i}.Labels);
disp(['Test accuracy is ', num2str(ACC)]);
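% --- Sketch (not part of the chapter's original listing): the remaining
% per-fold scores reported in Tables 11.3 and 11.4 can be derived from the
% fold's confusion matrix, here treating the first class returned by
% confusionmat as the positive (hemorrhage) class.
cm = confusionmat(testSet{i}.Labels, predictions);
TP = cm(1,1); FN = cm(1,2); FP = cm(2,1); TN = cm(2,2);
SEN = TP / (TP + FN);               % sensitivity (recall)
SPE = TN / (TN + FP);               % specificity
PRE = TP / (TP + FP);               % precision
FSC = 2 * PRE * SEN / (PRE + SEN);  % F-score
MCC = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));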
end
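All three listings in this section train with stochastic gradient descent with momentum ('sgdm'). As a reminder (the update rule is not spelled out in the chapter), one standard formulation is

$$v_{t+1}=\gamma v_t-\alpha\,\nabla_\theta L(\theta_t),\qquad \theta_{t+1}=\theta_t+v_{t+1},$$

where $\alpha$ is the learning rate (here $10^{-4}$) and $\gamma$ the momentum coefficient; since no momentum is set in trainingOptions, MATLAB's default of 0.9 applies.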


The ACC, SEN, SPE, F-score, and MCC values of the ResNet-18 CNN model were obtained using 10-fold cross-validation. As seen in Fig. 11.11, the accuracy rate of the confusion matrix and the mean score of the ROC curve were, respectively, 80.50% and 89.50%.

% Datastore for the image dataset.


dataStore = imageDatastore('Repository\BrainHemorrhage', ...
'LabelSource', 'foldernames', 'IncludeSubfolders', true);
% Partition data for cross-validation.
cv = cvpartition(dataStore.Labels, 'Kfold', 10);
% Create cell array for train and test data.
trainSet = cell(10, 1); testSet = cell(10, 1);
for i = 1 : 10 % Create subset of datastore.
% Train cases for current fold.
trainSet{i} = subset(dataStore, find(cv.training(i)));
% Test cases for current fold.
testSet{i} = subset(dataStore, find(cv.test(i)));
end
% Options for training deep learning neural networks.
% Set the options to the default settings for the stochastic gradient descent
% with momentum. Set the maximum number of epochs at 50, and start the training
% with an initial learning rate of 0.0001.
options = trainingOptions('sgdm', ...
'MiniBatchSize', 16, ...
'MaxEpochs', 50, ...
'InitialLearnRate', 1e-4);
% Load ResNet-18 convolutional neural network.
deepNet = resnet18;
layers = layerGraph(deepNet.Layers);
for i = 1 : 10 % 10-fold cross validation.
% Train neural network for deep learning.
[net, info] = trainNetwork(trainSet{i}, layers, options);
% Test the performance of the network by evaluating the prediction accuracy
% of the test data.
[predictions, ~] = classify(net, testSet{i});
% Evaluate test accuracy of the model.
% The accuracy is the ratio of the number of true labels in the test data
% matching the classifications from classify to the number of images in the test data.
ACC = mean(predictions == testSet{i}.Labels);
disp(['Test accuracy is ', num2str(ACC)]);
end

The ACC, SEN, SPE, F-score, and MCC values of the VGG-16 CNN model were obtained using 10-fold cross-validation. As seen in Fig. 11.12, the accuracy rate of the confusion matrix and the mean score of the ROC curve were, respectively, 82.00% and 90.20%.

% Datastore for the image dataset.


dataStore = imageDatastore('Repository\BrainHemorrhage', ...
'LabelSource', 'foldernames', 'IncludeSubfolders', true);
% Partition data for cross-validation.
cv = cvpartition(dataStore.Labels, 'Kfold', 10);
% Create cell array for train and test data.
trainSet = cell(10, 1); testSet = cell(10, 1);
for i = 1 : 10 % Create subset of datastore.
% Train cases for current fold.
trainSet{i} = subset(dataStore, find(cv.training(i)));
% Test cases for current fold.
testSet{i} = subset(dataStore, find(cv.test(i)));
end
% Options for training deep learning neural networks.
% Set the options to the default settings for the stochastic gradient descent
% with momentum. Set the maximum number of epochs at 50, and start the training
% with an initial learning rate of 0.0001.
options = trainingOptions('sgdm', ...
'MiniBatchSize', 16, ...
'MaxEpochs', 50, ...
'InitialLearnRate', 1e-4);
% Load VGG-16 convolutional neural network.
deepNet = vgg16;
layers = layerGraph(deepNet.Layers);
for i = 1 : 10 % 10-fold cross validation.
% Train neural network for deep learning.
[net, info] = trainNetwork(trainSet{i}, layers, options);
% Test the performance of the network by evaluating the prediction accuracy
% of the test data.
[predictions, ~] = classify(net, testSet{i});
% Evaluate test accuracy of the model.
% The accuracy is the ratio of the number of true labels in the test data matching
% the classifications from classify to the number of images in the test data.
ACC = mean(predictions == testSet{i}.Labels);
disp(['Test accuracy is ', num2str(ACC)]);
end

All scores of the CNN classifier architectures are reported in Table 11.5. According to Table 11.5, the best hemorrhage classification accuracy, 83.50%, was obtained with the DarkNet-19 CNN model.

11.5 Discussions

ICH is one of the most serious risks to human healthcare. Head trauma, excessive blood pressure, and intracranial tumors are all possible causes of brain hemorrhage. Deep learning models can never replace doctors and radiologists, but with image processing analysis they can make a big impact. Computer-aided, and especially deep learning-based, medical image methods have increased in recent years. In this chapter, four deep CNN approaches are considered for hemorrhage classification. Several machine learning models, CNN models, and our method are compared in Table 11.6 according to performance criteria consisting of accuracy, sensitivity, specificity, and F-score.

TABLE 11.6 Classification performance of the proposed method and comparison with other studies.

References                Methods                                              Image dataset                                 Accuracy
Dawud et al. (2019) [48]  Deep learning and machine learning (AlexNet + SVM)   CT image dataset                              93%
Shahangian et al. [49]    K-NN and MLP                                         CT scan image dataset                         93.3%
Lee [50]                  ResNet-50                                            CT scan image dataset with 674,258 samples    F1 score 88%
Lee et al. [31]           Deep learning algorithm for ANN                      250 cases with 9085 CT image samples          91.7%
Chilamkurthy et al. [51]  Deep learning                                        CT scan images from 4304 samples              AUC 94.19%
This study                ResNet-18, EfficientNet-B0, DarkNet-19, VGG-16       CT image dataset with 200 samples             ACC 83.50%, SEN 82%, SPE 85%, F-score 83.20%

In Ref. [31] ICH classification and its subtype classification were made. The aim of this study was to evaluate if the method could be used to identify ICH and classify its subtypes without requiring a CNN. CT images were split into 10 subdivisions based on the intracranial height. For the classification of ICH into subtypes, the accuracy for subarachnoid hemorrhage was notably high at 91.7%.

Ref. [48] addressed the problem of detecting cerebral hemorrhage in the early stages of hemorrhage, which is a challenging task for radiologists. AlexNet, which is a popular CNN architecture, is used to solve this problem. Besides, the modified AlexNet is supported by the SVM classifier. With the AlexNet + SVM classifier structure used in the study, a CT image classification accuracy rate of 93% was obtained.
11.6 Conclusion

In this chapter, a pretrained CNN model that can distinguish between hemorrhage and normal on brain CT images has emerged. The dataset used consists of two classes. The classification accuracy of the DarkNet-19 CNN model was superior compared to the other CNN models. The proposed model was tested on the Brain CT Image database. As a result of the best classification, values of ACC 83.50%, SEN 82%, SPE 85%, F-score 83.20%, and MCC 65% were obtained with the DarkNet-19 CNN model. Considering the classification accuracies of the study, the relatively low accuracy rate can be regarded as a disadvantage. The reason for this is that the dataset used has less data than the sources cited in the discussion section. Nevertheless, it can be said that promising results were obtained.

References

[1] C.J. van Asch, M.J. Luitse, G.J. Rinkel, I. van der Tweel, A. Algra, C.J. Klijn, Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis, Lancet Neurol. 9 (2) (2010). Available from: https://doi.org/10.1016/S1474-4422(09)70340-0.

[2] H. Chen, S. Khan, B. Kou, S. Nazir, W. Liu, A. Hussain, A smart machine learning model for the detection of brain hemorrhage diagnosis based Internet of Things in smart cities, Complexity 2020 (2020). Available from: https://doi.org/10.1155/2020/3047869.
[3] W. Kuo, C. Häne, P. Mukherjee, J. Malik, E.L. Yuh, Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning, Proc. Natl. Acad. Sci. U. S. A. 116 (45) (2019). Available from: https://doi.org/10.1073/pnas.1908021116.
[4] S. Santhoshkumar, V. Varadarajan, S. Gavaskar, J. Jegathesh Amalraj, A. Sumathi, Machine learning model for intracranial hemorrhage diagnosis and classification, Electron. 10 (21) (2021). Available from: https://doi.org/10.3390/electronics10212574.
[5] D.L. Labovitz, A. Halim, B. Boden-Albala, W.A. Hauser, R.L. Sacco, The incidence of deep and lobar intracerebral hemorrhage in whites, blacks, and Hispanics, Neurology 65 (4) (2005) 518–522. Available from: https://doi.org/10.1212/01.wnl.0000172915.71933.00.
[6] J. Liu, et al., Prediction of hematoma expansion in spontaneous intracerebral hemorrhage using support vector machine, EBioMedicine 43 (2019) 454–459. Available from: https://doi.org/10.1016/j.ebiom.2019.04.040.
[7] L.R. Øie, et al., Functional outcome and survival following spontaneous intracerebral hemorrhage: a retrospective population-based study, Brain Behav. 8 (10) (2018) e01113. Available from: https://doi.org/10.1002/brb3.1113.
[8] A. Sage, P. Badura, Intracranial hemorrhage detection in head CT using double-branch convolutional neural network, support vector machine, and random forest, Appl. Sci. 10 (21) (2020) 1–13. Available from: https://doi.org/10.3390/app10217577.
[9] T. Lewick, M. Kumar, R. Hong, W. Wu, Intracranial hemorrhage detection in CT scans using deep learning, in: Proc. - 2020 IEEE 6th Int. Conf. Big Data Comput. Serv. Applications, BigDataService 2020, 2020, pp. 169–172. Available from: https://doi.org/10.1109/BigDataService49289.2020.00033.
[10] M.A. Al-masni, W.-R. Kim, E.Y. Kim, Y. Noh, D.-H. Kim, Automated detection of cerebral microbleeds in MR images: a two-stage deep learning approach, NeuroImage Clin. 28 (2020) 102464. Available from: https://doi.org/10.1016/j.nicl.2020.102464.
[11] M. Grewal, M.M. Srivastava, P. Kumar, S. Varadarajan, RADnet: radiologist level accuracy using deep learning for hemorrhage detection in CT scans, in: Proc. - Int. Symposium Biomed. Imaging, 2018, vol. 2018-April, pp. 281–284. Available from: https://doi.org/10.1109/ISBI.2018.8363574.
[12] C.S.S. Anupama, M. Sivaram, E.L. Lydia, D. Gupta, K. Shankar, Synergic deep learning model-based automated detection and classification of brain intracranial hemorrhage images in wearable networks, Pers. Ubiquitous Comput. 26 (1) (2022) 1–10. Available from: https://doi.org/10.1007/s00779-020-01492-2.
[13] K. Kowsari, et al., HMIC: Hierarchical Medical Image Classification, a deep learning approach, Information 11 (6) (2020) 318. Available from: https://doi.org/10.3390/info11060318.
[14] L. Cai, J. Gao, D. Zhao, A review of the application of deep learning in medical image classification and segmentation, Ann. Transl. Med. 8 (11) (2020) 713. Available from: https://doi.org/10.21037/atm.2020.02.44.
[15] X. Liu, L. Song, S. Liu, Y. Zhang, A review of deep-learning-based medical image segmentation methods, Sustain 13 (3) (2021) 1–29. Available from: https://doi.org/10.3390/su13031224.
[16] F. Renard, S. Guedria, N. De, et al., Variability and reproducibility in deep learning for medical image segmentation, Sci. Rep. 10 (1) (2020) 13724. Available from: https://doi.org/10.1038/s41598-020-69920-0.
[17] M.H. Hesamian, W. Jia, X. He, P. Kennedy, Deep learning techniques for medical image segmentation: achievements and challenges, J. Digit. Imaging 32 (4) (2019) 582–596. Available from: https://doi.org/10.1007/s10278-019-00227-x.
[18] V. Shah, R. Keniya, A. Shridharani, M. Punjabi, J. Shah, N. Mehendale, Diagnosis of COVID-19 using CT scan images and deep learning techniques, Emerg. Radiol. 28 (3) (2021) 497–505. Available from: https://doi.org/10.1007/s10140-020-01886-y.
[19] H. Li, Y. Pan, J. Zhao, L. Zhang, Skin disease diagnosis with deep learning: A review, Neurocomputing 464 (2021) 364–393. Available from: https://doi.org/10.1016/j.neucom.2021.08.096.
[20] V. Thanikachalam, S. Shanthi, K. Kalirajan, S. Abdel-Khalek, M. Omri, L.M. Ladhar, Intelligent deep learning based disease diagnosis using biomedical tongue images, Comput. Mater. Contin. 70 (3) (2022) 5667–5681. Available from: https://doi.org/10.32604/cmc.2022.020965.
[21] P. Khan, et al., Machine learning and deep learning approaches for brain disease diagnosis: principles and recent advances, IEEE Access. 9 (2021) 37622–37655. Available from: https://doi.org/10.1109/ACCESS.2021.3062484.
[22] N. Alsharman, I. Jawarneh, GoogleNet CNN neural network towards chest CT-coronavirus medical image classification, J. Comput. Sci. 16 (5) (2020) 620–625. Available from: https://doi.org/10.3844/JCSSP.2020.620.625.
[23] O.K. Oyedotun, K. Dimililer, Pattern recognition: invariance learning in convolutional auto encoder network, Int. J. Image, Graph. Signal. Process. 8 (3) (2016) 19–27. Available from: https://doi.org/10.5815/ijigsp.2016.03.03.
[24] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Computer Soc. Conf. Computer Vis. Pattern Recognit., 2016, vol. 2016-Decem, pp. 770–778. Available from: https://doi.org/10.1109/CVPR.2016.90.
[25] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017). Available from: https://doi.org/10.1145/3065386.
[26] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015.
[27] L. Wen, X. Li, X. Li, L. Gao, A new transfer learning based on VGG-19 network for fault diagnosis, in: Proc. 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design, CSCWD 2019, 2019, pp. 205–209. Available from: https://doi.org/10.1109/CSCWD.2019.8791884.
[28] M.A. Fayemiwo, et al., Modeling a deep transfer learning framework for the classification of COVID-19 radiology dataset, PeerJ Comput. Sci. 7 (2021) e614. Available from: https://doi.org/10.7717/peerj-cs.614.
[29] S. Lu, Z. Lu, Y.-D. Zhang, Pathological brain detection based on AlexNet and transfer learning, J. Comput. Sci. 30 (2019) 41–47. Available from: https://doi.org/10.1016/j.jocs.2018.11.008.
[30] P. Kumaravel, S. Mohan, J. Arivudaiyanambi, N. Shajil, H.N. Venkatakrishnan, A simplified framework for the detection of intracranial hemorrhage in CT brain images using deep learning, Curr. Med. Imaging Former. Curr. Med. Imaging Rev. 17 (10) (2021) 1226–1236. Available from: https://doi.org/10.2174/1573405617666210218100641.
[31] J.Y. Lee, J.S. Kim, T.Y. Kim, Y.S. Kim, Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm, Sci. Rep. 10 (1) (2020). Available from: https://doi.org/10.1038/s41598-020-77441-z.
[32] H. Lee, et al., An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets, Nat. Biomed. Eng. 3 (3) (2019) 173–182. Available from: https://doi.org/10.1038/s41551-018-0324-9.
[33] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, Li Fei-Fei, ImageNet: A large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. Available from: https://doi.org/10.1109/cvpr.2009.5206848.
[34] X. Wang, et al., A deep learning algorithm for automatic detection and classification of acute intracranial hemorrhages in head CT scans, NeuroImage Clin. 32 (2021) 102785. Available from: https://doi.org/10.1016/j.nicl.2021.102785.
[35] M.F. Mushtaq, et al., BHCNet: Neural Network-Based Brain Hemorrhage Classification Using Head CT Scan, IEEE Access. 9 (2021) 113901–113916. Available from: https://doi.org/10.1109/ACCESS.2021.3102740.
[36] H. Ye, et al., Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network, Eur. Radiol. 29 (11) (2019) 6191–6201. Available from: https://doi.org/10.1007/s00330-019-06163-2.
[37] M.R. Arbabshirani, et al., Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration, NPJ Digit. Med. 1 (1) (2018) 9. Available from: https://doi.org/10.1038/s41746-017-0015-z.
[38] J.-L. Solorio-Ramírez, M. Saldana-Perez, M.D. Lytras, M.-A. Moreno-Ibarra, C. Yáñez-Márquez, Brain Hemorrhage Classification in CT scan images using minimalist machine learning, Diagnostics 11 (8) (2021) 1449. Available from: https://doi.org/10.3390/diagnostics11081449.
[39] C. Yanez-Marquez, Toward the bleaching of the black boxes: minimalist machine learning, IT Prof. 22 (4) (2020) 51–56. Available from: https://doi.org/10.1109/MITP.2020.2994188.
[40] E.L. Yuh, A.D. Gean, G.T. Manley, A.L. Callen, M. Wintermark, Computer-aided assessment of head computed tomography (CT) studies in patients with suspected traumatic brain injury, J. Neurotrauma 25 (10) (2008) 1163–1172. Available from: https://doi.org/10.1089/neu.2008.0590.
[41] S. Yune, H. Lee, S. Do, D. Ting, Case-based learning based on artificial intelligence radiology atlas: Example of intracranial hemorrhage and urinary stone detection, J. Gen. Intern. Med. 33 (Supplement 1) (2018).
[42] K. Ali, Z. Shaikh, A. Khan, A. Laghari, Multiclass skin cancer classification using EfficientNets: a first step towards preventing skin cancer, Neurosci. Inform. 2 (4) (2022).
[43] V. Kumar, Implementing EfficientNet: a powerful convolutional neural network. https://analyticsindiamag.com/implementing-efficientnet-a-powerful-convolutional-neural-network/, June 19, 2020 (accessed 10.02.22).
[44] P. Saha, M.S. Sadi, O.F.M.R.R. Aranya, S. Jahan, F.-A. Islam, COV-VGX: an automated COVID-19 detection system using X-ray images and transfer learning, Inform. Med. Unlocked 26 (2021) 100741.

[45] A. Benali Amjoud, M. Amrouch, Convolutional neural networks backbones for object detection, Lecture Notes Computer Sci. (2020) 282–289.
[46] A. Diker, Sıtma Hastalığının Sınıflandırılmasında Evrişimsel Sinir Ağlarının Performanslarının Karşılaştırılması, BEÜ Fen Bilim. Derg. 9 (4) (2020) 1825–1835.
[47] Kaggle, Head CT Hemorrhage Image Dataset, 2022. https://www.kaggle.com/mrdvolk/head-ct-hemorrhage-detection-with-keras (accessed 07.01.2022).
[48] A.M. Dawud, K. Yurtkan, H. Oztoprak, Application of deep learning in neuroradiology: Brain Haemorrhage Classification Using Transfer Learning, Computational Intell. Neurosci. 2019 (2019) 1–12.
[49] B. Shahangian, H. Pourghassem, Automatic brain hemorrhage segmentation and classification in CT Scan Images, in: 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP), 2013.
[50] H.J. Lee, Intracranial Hemorrhage Classification using CNN. https://cs230.stanford.edu/projects_fall_2019/reports/26248009.pdf, Autumn 2019 (accessed 01.02.22).
[51] S. Chilamkurthy, R. Ghosh, S. Tanamala, M. Biviji, N.G. Campeau, V.K. Venugopal, et al., Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study, Lancet 392 (10162) (2018) 2388–2396.

C H A P T E R

12

Artificial intelligence-based retinal disease classification using optical coherence tomography images

Sohan Patnaik1 and Abdulhamit Subasi2,3

1 Department of Mechanical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

12.1 Introduction
12.2 Related work
12.3 Dataset
12.4 Implementation details
    12.4.1 Convolutional neural network-based classification
    12.4.2 Transfer learning-based classification
    12.4.3 Deep feature extraction and machine learning
12.5 Results and discussions
12.6 Discussion
12.7 Conclusion
References

12.1 Introduction

Starting from the work of Adolf Fercher and associates on low/partial coherence or white-light interferometry for in vivo ocular eye measurements [1,2] in Vienna during the 1980s, imaging of natural tissue, particularly of the eye, was examined in parallel by various research groups around the world. Initially, a two-dimensional in vivo portrayal of a human eye fundus along a horizontal meridian, based on white-light interferometric depth scans, was presented at the ICO-15 SAT meeting in 1990 [3].

Further, in 1990, Tanno et al. [4,5], at Yamagata University, created a technology referred to as heterodyne reflectance tomography, and in 1991 Huang et al., at the Massachusetts Institute of Technology, effectively coined the term "optical coherence tomography." From that point of time, OCT with micrometer resolution and cross-sectional imaging capacities has become an extraordinary biomedical tissue-imaging procedure that has continuously gained new specialized abilities, starting from early electronic signal detection, via utilization of broadband lasers and linear pixel arrays, to ultrafast tunable lasers, to expand its performance and sensitivity envelope. Ocular (or ophthalmic) OCT is used heavily by ophthalmologists and optometrists to obtain high-resolution images of the retina and anterior segment.

With regard to OCT's capability to show cross-sections of tissue layers with micrometer resolution, OCT provides a straightforward and effective method of assessing cellular organization, photoreceptor integrity [6-9], axonal thickness in glaucoma [10], macular degeneration [11], diabetic macular edema [12], multiple sclerosis, and other eye diseases or systemic pathologies which have ocular signs [13]. Additionally, ophthalmologists leverage OCT to assess the vascular health of the retina via a technique called OCT angiography [14]. In ophthalmological surgery, especially retinal surgery, an OCT can be mounted on the microscope. Such a system is called an intraoperative OCT (iOCT) and provides support during the surgery with clinical benefits [15].

Diagnosing diseases from the retinal cross-sectional images obtained using OCT is still a challenge when the number of images is very high. Here comes the need for an intelligent agent with better computational skills than a normal human being. Deep learning has made remarkable progress in the field of image classification. Keeping that in mind, we propose a deep learning-based diagnosis of three types of retinal diseases: drusen, diabetic macular edema (DME), and choroidal neovascularization (CNV). For the basic framework, we use a convolutional architecture followed by either a machine learning-based classification model or a fully connected neural network with softmax activation, which gives the probability of each of the diseases present in the retina. It is important to keep in mind that the retinal image might not have a disease at all; our framework addresses that too and captures the case where there is no retinal disease.

12.2 Related work

The execution of autograding framework algorithms for optical coherence tomography (OCT) images has undergone a long turn of events. With the emergence of artificial intelligence, specialists have investigated the diagnostic instrument with various strategies. Prior algorithms started with an image segmentation model. Similar to the methodology that human experts use, the segmentation algorithms detected the edges of the features and made diagnoses by a binary classification algorithm [16]. As the convolutional neural network (CNN) came into the picture, it was gradually implemented in the classification model. One study endeavored to utilize a CNN to perceive the features and make classifications [17]. In recent years, some CNN models have been modified to achieve higher accuracy [18,19]. Since the start of the 21st century, OCT technology has been used more frequently in detecting the features of age-related macular degeneration (AMD) and diabetic macular edema (DME) [20-22]. With the increasing desire for OCT image autograding, many research communities have invested in this field to attempt to accomplish more accurate models. Improvement of automated image classification/grading frameworks started with an automated segmentation algorithm [16]. In 2014 Ehlers et al. [16] proposed an automated classification framework to perceive AMD and DME. They used image segmentation to detect the particular

features (RNFL and drusen) of AMD and DME and made them the identifiers of the classification. In 2017 Rogers et al. [23] introduced an automatic segmentation framework based on CNNs and graph search strategies (CNN-GS). They segmented the OCT images of nonexudative AMD patients to detect nine retinal layer units. The algorithm first generated likelihood maps by the CNN model. Then they used the CNN probability map to accomplish layer segmentation. This was a novel use of a CNN that customized the CNN and integrated it with a segmentation algorithm. Fang et al. [24] proposed surrogate-assisted retinal OCT image classification based on CNN for AMD and DME. In 2018 Kermany et al. [18] collected and processed a vast number of OCT images and built a CNN model dependent on the denoised images. The dataset they processed has been published and is publicly available, and we have used that dataset for our research.

12.3 Dataset

The dataset used for our research was prepared by Kermany et al. [25,26] in 2018. The dataset is organized into three folders (train, test, val) and contains subfolders for each image category (NORMAL, CNV, DME, DRUSEN). There are 84,495 OCT images (JPEG) in 4 categories. Images are labeled as (disease)-(randomized patient ID)-(image number by this patient) and split into four directories: CNV, DME, DRUSEN, and NORMAL.

Before the data was published by Kermany et al. [25], each image was taken through a tiered grading framework consisting of different layers of trained graders of increasing expertise for checking and verifying image labels. Each image brought into the dataset began with a label matching the most recent diagnosis of the patient. The first tier of graders consisted of undergraduate and medical students who had taken and passed an OCT interpretation course review. This first tier of graders conducted initial quality control and excluded OCT images containing severe artifacts or significant image resolution reductions. The second tier of graders consisted of four ophthalmologists who independently graded each image that had passed the first tier. The presence or absence of CNV (active or in the form of subretinal fibrosis), macular edema, drusen, and other pathologies visible on the OCT scan were recorded. Finally, a third tier of two trained professionals and retinal specialists, each with over 20 years of medical retina experience, verified the true labels for each image. To account for human error in grading, a validation subset of 993 scans was graded separately by two ophthalmologist graders, with disagreement in clinical labels arbitrated by a senior retinal specialist. Some images are shown in Fig. 12.1.

12.4 Implementation details

In this section, we elaborate the three approaches that we incorporated to accomplish the task of classifying whether an image contains one of the three diseases mentioned earlier or is normal. The codebase is written in Python, and TensorFlow was the library used to design the neural network architectures. For the machine learning-based classification, the sklearn package was used. All the neural network-based models were trained for 10 epochs, which took from 45 minutes to 1 hour 20 minutes depending upon the size of the model. The random seed for training purposes was set to 0.

12.4.1 Convolutional neural network-based classification

CNN stands for convolutional neural network. These are a class of neural networks that are extensively used to deal with tasks involving images. Let's say we have an image. In order to


capture certain interpretable information, we first need a feature representation of the image. This is accomplished by using the convolution operator between the image and some lower dimensional kernels. After several layers of convolution, we arrive at a small feature map that indeed captures the features of the image, such as edges, colors, spatial layout, etc. Moreover, a typical CNN architecture also incorporates certain pooling layers, such as max pooling and average pooling, which just reduce the dimension of the feature maps without any learnable parameters.

For our research, we experimented with 2- to 8-layered CNNs with some max-pooling layers embedded in between the convolutional layers. The best accuracy and F1 score on the test set were obtained using the 7-layered CNN, the architecture of which is shown in Fig. 12.2. The input image was mapped to 150 × 150 × 3, where 3 is the number of channels (RGB) and 150 × 150 is the spatial layout of the image. For the classification stage, the output feature map of the convolutional network was flattened and then passed to a fully connected layer in order to get a vector representation of the image. Finally, a softmax layer with four neurons was employed to obtain the probabilities of the four classes (three diseases and normal).

FIGURE 12.1 Retinal OCT images (panels: Normal, CNV, DME, Drusen). OCT, Optical coherence tomography.

FIGURE 12.2 CNN's architecture (OCT image → convolution → pooling → convolution → pooling → fully connected). CNN, Convolutional neural network.
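The listings in this chapter reference train_generator, validation_generator, and test_generator without defining them. The following is a minimal sketch of how they could be built with Keras' ImageDataGenerator; the directory layout follows the dataset description in Section 12.3, but the root folder name OCT2017 is a placeholder, and the batch sizes (100 training, 16 validation, 44 test) are assumptions inferred from the steps arithmetic used in the training and prediction calls below, not values stated in the chapter.

# Hypothetical data pipeline (not part of the original listings).
# Assumed layout from Section 12.3: train/, val/, and test/ folders,
# each containing NORMAL, CNV, DME, and DRUSEN subfolders.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

INPUT_SHAPE = (150, 150, 3)  # per the 150 x 150 x 3 input described above

datagen = ImageDataGenerator(rescale = 1.0 / 255)  # simple pixel scaling

train_generator = datagen.flow_from_directory(
    'OCT2017/train', target_size = INPUT_SHAPE[:2],
    batch_size = 100, class_mode = 'categorical')
validation_generator = datagen.flow_from_directory(
    'OCT2017/val', target_size = INPUT_SHAPE[:2],
    batch_size = 16, class_mode = 'categorical')
test_generator = datagen.flow_from_directory(
    'OCT2017/test', target_size = INPUT_SHAPE[:2],
    batch_size = 44, class_mode = 'categorical',
    shuffle = False)  # keep order so predictions align with .classes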



# Import the required libraries
import tensorflow as tf
import tensorflow_addons as tfa

# Create the model
model = tf.keras.models.Sequential([

tf.keras.layers.Conv2D(16, (3, 3), activation = 'relu',


input_shape = INPUT_SHAPE),
tf.keras.layers.MaxPooling2D(2, 2),

tf.keras.layers.Conv2D(16, (3, 3), activation = 'relu'),


tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu'),
tf.keras.layers.MaxPooling2D(2, 2),

tf.keras.layers.Conv2D(64, (3, 3), activation = 'relu'),


tf.keras.layers.Conv2D(64, (3, 3), activation = 'relu'),
tf.keras.layers.MaxPooling2D(2, 2),

tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu'),


tf.keras.layers.MaxPooling2D(2, 2),

tf.keras.layers.Conv2D(16, (3, 3), activation = 'relu'),


tf.keras.layers.MaxPooling2D(2, 2),

tf.keras.layers.Flatten(),

tf.keras.layers.Dense(4, activation = 'softmax')


])

# Define the metrics list


metrics_list = ['accuracy',
tf.keras.metrics.AUC(),
tfa.metrics.CohenKappa(num_classes = 4),
tfa.metrics.F1Score(num_classes = 4)]

# Compile the model


model.compile(loss = 'categorical_crossentropy', optimizer = 'adam',
metrics = metrics_list)
# Fit the model and store the history
history = model.fit_generator(
train_generator,
steps_per_epoch = 83484 // 100,
epochs = 10,
validation_data = validation_generator,
validation_steps = 32 // 16,
verbose = 1)

# Obtain the test predictions


y_pred = model.predict(test_generator)

12.4.2 Transfer learning-based classification

The basic premise of transfer learning is simple: take a model trained on a large dataset and transfer its knowledge to a smaller dataset. For object recognition with a CNN, we freeze the early convolutional layers of the network and only train the last few layers which make a prediction. The idea is that the convolutional layers extract general, low-level features that are applicable across images, such as edges, patterns, and gradients, while the later layers identify specific features within an image, such as eyes or wheels. Thus we can use a network trained on unrelated categories in a massive dataset (usually ImageNet) and apply it to our own problem, because there are universal, low-level features shared between images.

In our experiment, we used 10 pretrained CNN-based models which were trained on the ImageNet dataset. The models used were ResNet50, VGG16, VGG19, InceptionV3, MobileNet, DenseNet169, DenseNet121, InceptionResNetV2, MobileNetV2, and ResNet101. The feature maps obtained from these pretrained models were passed through some fully connected layers before the softmax layer. A pretrained VGG19 model is given below.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_addons as tfa
# Download VGG19 Model
vgg19 = tf.keras.applications.VGG19(
include_top = False,
weights = 'imagenet',
input_tensor = None,
input_shape = INPUT_SHAPE,
pooling = None,
classes = 1000
)

# Freeze the pretrained VGG19 convolutional base; only the newly added layers are trained


vgg19.trainable = False
# Create a Transfer Learning Model by Adding New Layers
model = tf.keras.models.Sequential([
vgg19,
tf.keras.layers.Conv2D(64, (3, 3), activation = 'relu'),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(100, activation = 'relu'),
tf.keras.layers.Dense(4, activation = 'softmax')
])

# Define the Metrics


metrics_list = ['accuracy',
tf.keras.metrics.AUC(),
tfa.metrics.CohenKappa(num_classes = 4),
tfa.metrics.F1Score(num_classes = 4)]

# Train the model


model.compile(loss = 'categorical_crossentropy', optimizer = 'adam',metrics=metrics_list)
history = model.fit_generator(
train_generator,
steps_per_epoch = 83484 // 100,
epochs = 10,
validation_data = validation_generator,
validation_steps = 32 // 16,
verbose = 1)
# Validate and test the model
model.predict(test_generator, steps = int(968/44))
model.evaluate(test_generator)
# Plot the training and validation performance curves
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))
plt.figure(figsize=(7,7))
plt.plot(epochs, acc, 'r', label = 'Training accuracy')
plt.plot(epochs, val_acc, 'b', label = 'Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure(figsize = (7,7))
plt.plot(epochs, loss, 'r', label = 'Training Loss')
plt.plot(epochs, val_loss, 'b', label = 'Validation Loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

#Print the Confusion Matrix and Classification Results


from sklearn.metrics import confusion_matrix, classification_report
import pandas as pd

Y_pred = model.predict(test_generator, steps = int(968/44))


y_pred = np.argmax(Y_pred, axis = 1)

cm = confusion_matrix(test_generator.classes, y_pred)
df_cm = pd.DataFrame(cm, list(test_generator.class_indices.keys()),
list(test_generator.class_indices.keys()))

fig, ax = plt.subplots(figsize = (10,8))


sns.set(font_scale = 1.4) # for label size
sns.heatmap(df_cm, annot = True, annot_kws = {"size": 16}, cmap = plt.cm.Blues)
plt.title('Confusion Matrix\n')
plt.savefig('confusion_matrix.png', transparent = False, bbox_inches = 'tight', dpi = 400)
plt.show()

print('Classification Report\n')
target_names = list(test_generator.class_indices.keys())
print(classification_report(test_generator.classes, y_pred, target_names = target_names))

FIGURE 12.3 Deep feature extraction's architecture (OCT image → deep feature extraction → classification with ANN, K-NN, SVM, RF, AdaBoost, or XGBoost).

12.4.3 Deep feature extraction and machine learning

It is quite intuitive that the pretrained convolutional models indeed capture the low-level features in the image. Keeping that in mind, we first extracted the feature maps from the pretrained CNN models and flattened them to obtain a vector representation of the image. After this, some machine learning-based classification models were trained on the feature vectors obtained to get the correct class among the three diseases and normal retina. We used six machine learning models, namely, artificial neural network, K-nearest neighbors, support vector machines, random forest classifier, AdaBoost classifier, and XGBoost classifier. A basic picture of the feature extraction model, on top of which the machine learning-based classification model was employed, is shown in Fig. 12.3.

# Import the required libraries


import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from sklearn.ensemble import AdaBoostClassifier
from keras.applications.vgg16 import VGG16

#Create the Model


model = VGG16(weights = 'imagenet', include_top = False)

# Obtain the train and test features, flattening the 4D feature maps to
# vectors so they can be fed to a scikit-learn classifier
x_train = model.predict(train_generator)
x_train = x_train.reshape(x_train.shape[0], -1)
x_test = model.predict(test_generator)
x_test = x_test.reshape(x_test.shape[0], -1)

# Label vectors from the generators (assumes the generators were created
# with shuffle = False so that feature order matches label order)
y_train = train_generator.classes
y_test = test_generator.classes

# Fit the classifier


# Default parameters, change to obtain required accuracy
classifier = AdaBoostClassifier()
classifier = classifier.fit(x_train, y_train)
y_preds = classifier.predict(x_test)
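A brief evaluation step (an assumed addition, not part of the original listing) showing how the accuracy, F1, and kappa scores reported in the tables below could be computed with sklearn; y_test is the label vector extracted from the test generator above.

# Hypothetical evaluation of the fitted classifier with sklearn metrics
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score

print('Test accuracy:', accuracy_score(y_test, y_preds))
print('Weighted F1:', f1_score(y_test, y_preds, average = 'weighted'))
print('Cohen kappa:', cohen_kappa_score(y_test, y_preds))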

12.5 Results and discussions

The results on the test set for classifying the retinal diseases using the simple CNN-based architecture are shown in Table 12.1. It can be clearly

seen that the accuracy for the 7-layered CNN is equal to 0.9948 and so is the F1 score. After proper hyperparameter tuning, we arrived at the best CNN-based architecture. The insight behind this impressively high accuracy is the ability of CNNs to capture the intricate features in the image with high precision. Moreover, proper hyperparameter tuning to decide the size of the filters, the strides, and the number of max-pooling layers to be included in the architecture is also important to achieve promising results. The accuracy plot for the train, validation, and test sets is shown in Fig. 12.4.

From the plot, we can see that the training accuracy decreases as we increase the number of convolutional layers but the test accuracy increases. The reason for this can be attributed to the fact that the more convolutional layers there are, the more high-level information is captured by the model and the less the overfitting. Owing to this fact, the accuracy on the test set is also increasing. It is also a matter of fact that as the number of layers increases, we may need to train the model for a greater number of epochs, because the number of parameters also increases, so more training iterations would be needed.

TABLE 12.1 Results for convolutional neural network (CNN) models.

Classifier  Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   ROC area
2 Layer     0.9824             0.9062               0.9298         0.9309    0.9105  1.0
3 Layer     0.949              1                    0.9638         0.9639    0.9518  0.9973
4 Layer     0.9487             0.9688               0.9742         0.9743    0.9656  0.999
5 Layer     0.9543             1                    0.9897         0.9897    0.9862  0.9999
6 Layer     0.9393             1                    0.9804         0.9804    0.9738  0.9995
7 Layer     0.95               1.00                 0.9948         0.9948    1       0.9931
8 Layer     0.9529             1                    0.9897         0.9897    0.9862  0.9999
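As a reminder (the formula is not given in the chapter), the Cohen's kappa reported in these tables follows the standard definition

$$\kappa=\frac{p_o-p_e}{1-p_e},$$

where $p_o$ is the observed agreement between predictions and true labels and $p_e$ is the agreement expected by chance.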

FIGURE 12.4 Train, validation, and test accuracy versus the number of convolutional layers (2-8).


The results for the test set on pretrained image recognition models, which have been trained previously on the ImageNet dataset, are shown in Table 12.2. Here, we can see that the maximum accuracy and F1 score, that is, 0.9628, is obtained for VGG19. The good results can be attributed to the fact that pretraining helps the models to understand and capture certain low-level features, such as edges, gradients of color, shapes, etc.

From Fig. 12.5, we can observe that almost all the pretrained image recognition models

TABLE 12.2 Results for pretrained models.

Classifier         Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   ROC area
ResNet50           0.7551             0.6875               0.75           0.7299    0.6667  0.9338
VGG16              0.9346             0.875                0.939          0.9388    0.9187  0.9944
VGG19              0.9197             0.9688               0.9628         0.9628    0.9504  0.9962
InceptionV3        0.915              0.9375               0.8988         0.8999    0.865   0.9854
MobileNet          0.971              0.875                0.9473         0.9478    0.9298  0.9931
DenseNet169        0.9386             0.9375               0.9421         0.9423    0.9229  0.9935
DenseNet121        0.929              0.9062               0.9525         0.9527    0.9366  0.9942
InceptionResNetV2  0.9159             0.9375               0.9329         0.932     0.9105  0.9937
MobileNetV2        0.9471             0.9375               0.9576         0.9574    0.9435  0.9956
ResNet101          0.7514             0.75                 0.6849         0.6541    0.5799  0.9133

FIGURE 12.5 Accuracy of pretrained models (train, validation, and test accuracy per model).

perform to the same extent except for both variants of the ResNet model. As the pretrained models have a huge number of parameters, training for more epochs might fetch better results.

Now, the results for the feature extraction and machine learning model-based classification are shown in Tables 12.3-12.13. The different feature extractors in this case are VGG16, VGG19, ResNet50, ResNet101, MobileNetV2, MobileNet, InceptionV3, InceptionResNetV2, DenseNet169, DenseNet121, and Xception. The typical machine learning-based classification models are artificial neural networks, K-nearest

TABLE 12.3 Results for features extracted from VGG16.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.8528             0.7812               0.8023         0.8104    0.8327  0.8524  0.7724
KNN            0.8324             0.75                 0.7926         0.8034    0.8524  0.8254  0.7826
SVM            0.8879             0.8437               0.8726         0.8636    0.9025  0.8445  0.8836
Random Forest  0.9524             0.9375               0.9334         0.9539    0.9542  0.9722  0.9364
AdaBoost       0.9437             0.9375               0.9216         0.9133    0.9345  0.9417  0.8867
XGBoost        0.8765             0.8437               0.8236         0.8565    0.8621  0.7912  0.8336

TABLE 12.4 Results for features extracted from VGG19.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.8645             0.8125               0.8224         0.8175    0.8346  0.8005  0.8354
KNN            0.8445             0.7812               0.8356         0.8344    0.8412  0.8065  0.8644
SVM            0.8769             0.875                0.8456         0.8414    0.8521  0.8612  0.8225
Random Forest  0.9412             0.9375               0.9399         0.9353    0.9465  0.9264  0.9444
AdaBoost       0.9566             0.96875              0.9216         0.9207    0.9366  0.9625  0.8824
XGBoost        0.9026             0.9375               0.8845         0.8777    0.8934  0.8916  0.8643

TABLE 12.5 Results for features extracted from ResNet50.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.7845             0.7812               0.7656         0.7511    0.7622  0.7411  0.7614
KNN            0.8065             0.8125               0.7923         0.7894    0.8041  0.7922  0.7867
SVM            0.8366             0.8125               0.8244         0.8238    0.8491  0.8014  0.8475
Random Forest  0.8627             0.875                0.8465         0.8371    0.8538  0.8477  0.8269
AdaBoost       0.8689             0.875                0.8544         0.8541    0.8512  0.8421  0.8666
XGBoost        0.8412             0.8437               0.8214         0.818     0.8265  0.8355  0.8014


TABLE 12.6 Results for features extracted from ResNet101.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.8016             0.7812               0.7841         0.7799    0.7955  0.8005  0.7605
KNN            0.8245             0.8125               0.7944         0.7885    0.8143  0.8142  0.7644
SVM            0.8411             0.875                0.8321         0.8312    0.8314  0.8332  0.8294
Random Forest  0.8665             0.8465               0.8511         0.8514    0.8444  0.8741  0.83
AdaBoost       0.8545             0.8465               0.8512         0.8522    0.8693  0.8765  0.8294
XGBoost        0.8744             0.875                0.8497         0.8539    0.85    0.8742  0.8346

TABLE 12.7 Results for features extracted from MobileNetV2.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.9014             0.875                0.8821         0.8822    0.9147  0.8641  0.9011
KNN            0.8944             0.875                0.8912         0.8944    0.8825  0.8814  0.9078
SVM            0.9246             0.9375               0.9102         0.9034    0.9176  0.8925  0.9146
Random Forest  0.9564             0.9375               0.9234         0.9204    0.9264  0.8947  0.9478
AdaBoost       0.9677             0.9687               0.9541         0.9499    0.9645  0.9388  0.9614
XGBoost        0.9412             0.9375               0.9368         0.9339    0.9489  0.9266  0.9414

TABLE 12.8 Results for features extracted from MobileNet.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.8528             0.7812               0.8023         0.8104    0.8327  0.8524  0.7724
KNN            0.8065             0.8125               0.7923         0.7894    0.8041  0.7922  0.7867
SVM            0.8769             0.875                0.8456         0.8414    0.8521  0.8612  0.8225
Random Forest  0.8627             0.875                0.8465         0.8371    0.8538  0.8477  0.8269
AdaBoost       0.9026             0.9375               0.8845         0.8777    0.8934  0.8916  0.8643
XGBoost        0.8879             0.8437               0.8726         0.8636    0.9025  0.8445  0.8836

TABLE 12.9 Results for features extracted from InceptionV3.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.9144             0.9375               0.8971         0.8965    0.8964  0.9058  0.8874
KNN            0.8926             0.75                 0.8533         0.8523    0.8935  0.8743  0.8315
SVM            0.9137             0.9375               0.8546         0.8597    0.8744  0.8744  0.8456
Random Forest  0.9564             0.9375               0.8947         0.9025    0.9348  0.9137  0.8916
AdaBoost       0.9677             0.9687               0.9244         0.9254    0.9513  0.9369  0.9142
XGBoost        0.9456             0.9375               0.9173         0.9137    0.9577  0.9315  0.8966

TABLE 12.10 Results for features extracted from InceptionResNetV2.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.8665             0.8465               0.8511         0.8514    0.8444  0.8741  0.83
KNN            0.8689             0.875                0.8544         0.8541    0.8512  0.8421  0.8666
SVM            0.8769             0.875                0.8456         0.8414    0.8521  0.8612  0.8225
Random Forest  0.9246             0.9375               0.9102         0.9034    0.9176  0.8925  0.9146
AdaBoost       0.9026             0.9375               0.8845         0.8777    0.8934  0.8916  0.8643
XGBoost        0.9026             0.9375               0.8845         0.8777    0.8934  0.8916  0.8643

TABLE 12.11 Results for features extracted from DenseNet169.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.9014             0.875                0.8821         0.8822    0.9147  0.8641  0.9011
KNN            0.8769             0.875                0.8456         0.8566    0.8521  0.8612  0.8225
SVM            0.8944             0.875                0.8912         0.8944    0.8825  0.8814  0.9078
Random Forest  0.9437             0.9375               0.9216         0.9133    0.9345  0.9417  0.8867
AdaBoost       0.9564             0.9375               0.9234         0.9204    0.9264  0.8947  0.9478
XGBoost        0.9026             0.9375               0.8845         0.8777    0.8934  0.8916  0.8643

neighbors, support vector machines, random forest classifier, AdaBoost classifier, and XGBoost classifier. From the tables, we can conclude that the 7-layered CNN architecture proposed by us is the best model when employed on the test set.


TABLE 12.12 Results for features extracted from DenseNet121.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.9014             0.875                0.8821         0.8822    0.9147  0.8641  0.9011
KNN            0.8944             0.875                0.8912         0.8944    0.8825  0.8814  0.9078
SVM            0.9246             0.9375               0.9102         0.9034    0.9176  0.8925  0.9146
Random Forest  0.9524             0.9375               0.9334         0.9539    0.9542  0.9722  0.9364
AdaBoost       0.9437             0.9375               0.9216         0.9133    0.9345  0.9417  0.8867
XGBoost        0.8765             0.8437               0.8236         0.8118    0.8621  0.7912  0.8336

TABLE 12.13 Results for features extracted from XceptionNet.

Model          Training accuracy  Validation accuracy  Test accuracy  F1 score  Kappa   Recall  Precision
ANN            0.9146             0.875                0.8821         0.8839    0.8885  0.9045  0.8644
KNN            0.8963             0.875                0.8912         0.9006    0.8736  0.9073  0.8941
SVM            0.9244             0.9375               0.9102         0.9137    0.9002  0.9452  0.8844
Random Forest  0.9752             0.9687               0.9234         0.9302    0.9146  0.9241  0.9364
AdaBoost       0.9644             0.9687               0.9541         0.9531    0.9436  0.9647  0.9418
XGBoost        0.9348             0.9375               0.9368         0.9364    0.9389  0.9536  0.92

12.6 Discussion

An electronic medical record (EMR) is a system composed of the patient's medical treatment files, such as records of text, symbols, charts, images, and data slices. The data in the EMR system is obtained, sorted, and analyzed by medical staff through outpatient care, physical examination, auxiliary examination, diagnoses, treatment, nursing, and other medical activities. It provides the most practical and abundant data for health management, medical diagnoses, treatment, and scientific research. EMRs have significantly improved medical quality, management level, and academic ability. Besides, it is a cost-saving approach, not only for paper and folders but also for labor and storage space.

This autograding OCT image classification system is a brief diagnostic tool that can be used in an EMR system. In the treatment of AMD and DME, patients take OCT images frequently to monitor the changes in their disease. The integration of the autograding OCT image classification system and the EMR system can greatly improve the efficiency of retinopathy treatments. This system could provide a better quality of healthcare. With better access to test results and automatic diagnoses, the time that doctors spend on recognizing the test result can be considerably reduced, and patients could be aware of their test results and treatment method.

12.7 Conclusion

In this chapter, we proposed robust architectures to detect retinal diseases such as drusen, DME, and CNV from normal retina. Our work obtained an astoundingly high accuracy of 0.9948 and an F1 score of 0.9948 on the test set as

prepared by Kermany et al. [26]. As CNNs are gaining importance in medical imaging and diagnosis tasks, we also provided an intuitive understanding of why this is so. In the future, we would like to work on zero-shot learning and semisupervised learning, where we will only be given normal retinal images and the model would try to identify whether an image is defective or not.

References

[1] A.F. Fercher, E. Roth, G.J. Mueller (Ed.), Ophthalmic laser interferometry, in: Proc. SPIE, Optical Instrumentation for Biomedical Laser Applications, 15 September 1986, 658: 48–51.
[2] A.F. Fercher, K. Mengedoht, W. Werner, Eye-length measurement by interferometry with partially coherent light, Opt. Letters 13 (3) (1988) 186–8. March.
[3] A.F. Fercher, Ophthalmic interferometry, in: G. von Bally, S. Khanna (Eds.), Proc. International Conference on Optics in Life Sciences, Garmisch-Partenkirchen, Germany, 12–16 August 1990, pp. 221–228.
[4] N. Tanno, T. Ichikawa, A. Saeki, Lightwave reflection measurement, Japanese Patent # 2010042, 1990 (Japanese language).
[5] S. Chiba, N. Tanno, Backscattering optical heterodyne tomography, in: 14th Laser Sensing Symposium (in Japanese), 1991.
[6] J. Sherman, D. Epshtein, The ABCs of OCT, Review of Optometry, 2012.
[7] J. Sherman, Photoreceptor integrity line joins the nerve fiber layer as key to clinical diagnosis, Optometry 80 (6) (2009) 277–8. June.
[8] M.A. Bonini Filho, A.J. Witkin, Outer retinal layers as predictors of vision loss, Review of Ophthalmology, 2015.
[9] N. Cuenca, I. Ortuño-Lizarán, I. Pinilla, Cellular characterization of OCT and outer retinal bands using specific immunohistochemistry markers and clinical implications, Ophthalmology (2018). March.
[10] D.S. Grewal, A.P. Tanna, Diagnosis of glaucoma and detection of glaucoma progression using spectral domain optical coherence tomography, Curr. Opin. Ophthalmol. 24 (2) (2013) 150–61. March.
[11] P.A. Keane, P.J. Patel, S. Liakopoulos, F.M. Heussen, S.R. Sadda, A. Tufail, Evaluation of age-related macular degeneration with optical coherence tomography, Surv. Ophthalmol. 57 (5) (2012) 389–414. September.
[12] G. Virgili, F. Menchini, G. Casazza, R. Hogg, R.R. Das, X. Wang, et al., Optical coherence tomography (OCT) for detection of macular oedema in patients with diabetic retinopathy, Cochrane Database Syst. Rev. (2015). January.
[13] J. Dörr, K.D. Wernecke, M. Bock, G. Gaede, J.T. Wuerfel, C.F. Pfueller, et al., Association of Retinal and Macular Damage with Brain Atrophy in Multiple Sclerosis, PLoS ONE 6 (4) (2011) e18132. April.
[14] T. Aik Kah, CuRRL syndrome: a case series, Acta Sci. Ophthalmol. 1 (3) (2018).
[15] A.H. Kashani, C.L. Chen, J.K. Gahm, F. Zheng, G.M. Richter, P.J. Rosenfeld, et al., Optical coherence tomography angiography: a comprehensive review of current methods and clinical applications, Prog. Retinal Eye Res. 60 (2017). September.
[16] J.P. Ehlers, Y.K. Tao, S.K. Srivastava, The value of intraoperative OCT imaging in vitreoretinal surgery, Curr. Opin. Ophthalmol. 25 (2014). May.
[17] J. Sugmk, S. Kiattisin, A. Leelasantitham, Automated classification between age-related macular degeneration and Diabetic macular edema in OCT image using image segmentation, in: The 7th 2014 Biomedical Engineering International Conference, 2014.
[18] D.S. Kermany, et al., Identifying medical diagnoses and treatable diseases by image-based deep learning, Cell 172 (5) (2018) 1122–1131.e9.
[19] J. Wang, et al., Deep learning for quality assessment of retinal OCT images, Biomed. Opt. Express 10 (12) (2019) 6057–6072.
[20] T. Tsuji, et al., Classification of optical coherence tomography images using a capsule network, BMC Ophthalmol. 20 (1) (2020) 114.
[21] S.W. Kang, C.Y. Park, D.-I. Ham, The correlation between fluorescein angiographic and optical coherence tomographic features in clinically significant diabetic macular edema, Am. J. Ophthalmol. 137 (2) (2004) 313–322.
[22] M.R. Hee, et al., Optical coherence tomography of age-related macular degeneration and choroidal neovascularization, Ophthalmology 103 (8) (1996) 1260–1270.
[23] A.H. Rogers, A. Martidis, P.B. Greenberg, C.A. Puliafito, Optical coherence tomography findings following photodynamic therapy of choroidal neovascularization, Am. J. Ophthalmol. 134 (4) (2002) 566–576.
[24] L. Fang, D. Cunefare, C. Wang, R.H. Guymer, S. Li, S. Farsiu, Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search, Biomed. Opt. Express 8 (5) (2017) 2732–2744.
[25] D.S. Kermany, M. Goldbaum, W. Cai, C.C. Valentim, H. Liang, S.L. Baxter, et al., Identifying medical diagnoses and treatable diseases by image-based deep learning, Cell 172 (5) (2018) 1122–1131.
[26] D. Kermany, K. Zhang, M. Goldbaum, Labeled optical coherence tomography and chest X-Ray images for classification, Mendeley Data v2 (2018).

C H A P T E R

13

Diagnosis of breast cancer from histopathological images with deep learning architectures

Emrah Hancer1 and Abdulhamit Subasi2,3

1 Department of Software Engineering, Mehmet Akif Ersoy University, Burdur, Turkey
2 Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland
3 Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

13.1 Introduction
13.2 Materials and methods
13.2.1 Dataset
13.2.2 Methods
13.3 Results and discussions
13.3.1 Experimental setup
13.3.2 Experimental results
13.4 Conclusion
References

13.1 Introduction

Being one of the most dangerous types of cancer, breast cancer has a significant impact on the death of women between 20 and 59 years old. According to the American Cancer Society, approximately 268,600 new cases of breast cancer were reported and 41,760 people died of the disease in 2019. Nevertheless, diagnosing breast cancer in its earlier stages can significantly increase the survival rate [1].

The diagnosis process first starts with clinical analysis through visual imaging technologies, such as ultrasound, mammography, or magnetic resonance imaging. A needle tissue biopsy is then applied to patients with a high probability of breast malignancy by using the hematoxylin and eosin (H&E) stain protocol. While hematoxylin has a dark purple or blue coloring effect on nucleus structures, eosin stains other structures into red, pink, and orange shades.


Histopathological breast images are considered at different magnifications to investigate cellular- and tissue-level variations. For instance, tissue patterns are investigated at 100x magnification, while the size and shape of nuclei structures are investigated at 400x magnification. After obtaining information from these features, pathologists can determine a tumor slide as benign or malignant. In case of malignancy, a further analysis is carried out to grade the tumor, and based on the grade a treatment is advised to the corresponding patient. Breast cancer may come in different types, and each type has its own microscopic features [2].

Pathologists examine morphological features of H&E stained tissue samples from related breast regions under a microscope to establish a definitive diagnosis. Any difference observed in any feature of the region of interest is regarded as abnormal, and a confirmation process is then carried out to verify it as a malignant tumor. Pathologists also need to grade the tumor to examine the degree of cancer in some cases [3]. Unfortunately, the visual analysis manually carried out by pathologists is an error-prone, tiresome, and subjective task, causing inevitable errors in decisions. To ease the workload on experts and/or pathologists and improve the efficiency of the diagnosis performance, researchers have focused on automating the diagnosis process through computer-aided diagnosis (CAD) systems in recent years. The basic steps of CAD systems are as follows: (1) preprocessing, (2) segmentation, (3) feature extraction, and (4) classification. CAD systems wrapped around traditional machine learning methods use specified classifiers over a set of handcrafted features obtained from histopathology images to predict the output labels. However, their classification performance is not competitive and is even far from what is needed. Moreover, extracting handcrafted features is a computationally intensive and complex process due to the requirement of extensive prior domain knowledge. Fortunately, recent advancements from the perspective of machine learning and image processing have resulted in the emergence of the deep learning discipline, which has widely attracted the interest of researchers from different fields. Deep learning architectures, especially convolutional neural networks (CNNs), can extract intrinsic features from raw image data without requiring much effort. Therefore CNNs have widely been adopted in CAD systems, resulting in a groundbreaking performance in biomedical applications, especially diagnostic pathology [4].

CNNs are similar to conventional neural networks in that they are built on neurons with learnable weights and biases. Each neuron takes some inputs and then performs a transformation process. The whole network still represents a differentiable function that transforms raw image data to a class score. Moreover, a loss function (e.g., softmax) is still apparent on the last (denoted as fully connected) layer, and all the learning procedures of conventional neural networks are still performed. So, what is the difference between CNNs and conventional neural networks? The inputs of CNNs are images, and so CNNs allow us to encode image characteristics into the architecture. The forward function can then be implemented more efficiently, thereby reducing the number of parameters in the network.

Thanks to the effectiveness of CNNs in CAD systems, the diagnosis of breast cancer from histopathological images has aroused the interest of researchers. Especially with the release of large publicly available datasets, studies developing CNN-based automated systems have rapidly increased. Spanhol et al. [5] used a variant of AlexNet [6] to form an automated breast cancer diagnosis method on histopathological images, created from a set of pixel patches generated by using random strategies and a sliding window. The obtained accuracy was between 81% and 89%.

In another work, Spanhol et al. [7] extracted deep features using a pretrained CNN architecture (BVLC CaffeNet) and then built a classification model on the extracted deep features using a logistic regression classifier. According to the results, it was possible to obtain promising results through deep features as well as by training a CNN variant from scratch. Bayramoglu et al. [8] introduced the following two CNN-based architectures: (1) a single-task CNN variant used to detect the malignancy, and (2) a multitask CNN variant designed to detect both the malignancy and the magnification degree. The accuracy obtained by these CNN-based architectures was between 80% and 84%. Wei et al. [9] proposed a CNN-based breast cancer detection methodology which applied improved data augmentation and transfer learning strategies. The proposed methodology reached nearly 97% accuracy. Pratiher and Chattoraj [10] introduced an end-to-end deep ensemble methodology based on manifold learning L-ISOMAP and a stacked sparse auto-encoder to detect breast cancer. Different from the previous studies, Nahid and Kong [11] not only used the raw image data as an input but also utilized handcrafted features and frequency-domain information to carry out the training process of a CNN architecture. The overall accuracy obtained by this study was between 94% and 97%. In another study, Nahid et al. [12] introduced a combined version of CNN and long short-term memory to extract meaningful features and then individually applied a softmax layer and support vector machines as classifiers. The reported accuracy was between 90% and 91%. Bardou et al. [13] introduced a comparative study of deep features and handcrafted features. According to the results, the classification of deep features was superior to that of handcrafted features. In summary, the aforementioned studies wrapped around CNNs showed a significant performance for breast cancer diagnosis from histopathological images. However, the training process of a CNN architecture requires high memory and computation resources. Moreover, its performance may deteriorate due to local stagnation and overfitting problems. To cover this issue, pretrained CNN-based architectures became an alternative to conventional CNNs trained from scratch. For instance, Shallu and Mehra [14] compared the performance of a fine-tuned VGG16 with that of a CNN trained from scratch through a logistic regression classifier. According to the results, VGG16 obtained better diagnosis performance. Xiang et al. [15] investigated the performance of a fine-tuned InceptionV3 to diagnose breast cancer from histopathological datasets. It can therefore be indicated that the studies on the diagnosis of breast cancer have not come to an end yet.

In this chapter, we aim to utilize CNN-based architectures to reflect which deep learning architecture performs well among a variety of deep learning architectures. To achieve this goal, we use CNNs trained from scratch and pretrained CNN-based architectures on the invasive ductal carcinoma (IDC) dataset. The aim of this chapter is to give a comprehensive understanding of deep CNN models to researchers/experts who are interested in breast cancer diagnosis, and to researchers who would like to have an insight into several methods in this field.

The remainder of the chapter is as follows. In Section 13.2, we present the dataset used in experiments and then briefly describe the methods with implementation codes. In Section 13.3, we present the results with discussions. Finally, we conclude the chapter with future trends.

13.2 Materials and methods

13.2.1 Dataset

We choose the IDC histopathological image dataset [16] to carry out experiments for breast cancer diagnosis. IDC is regarded as the most common type among all breast cancers. Pathologists typically grade whole mount samples on the basis of the IDC present in the region.


FIGURE 13.1 Microscopic patterns of invasive (A) lobular carcinoma and (B) ductal carcinoma.

As a result, identifying the exact regions of IDC inside a given mount slide is one of the common preprocessing steps for automatic aggressiveness grading. Fig. 13.1 shows histopathological image examples of malignant breast tumors.

The original dataset involves 162 whole mount slide images scanned at 40x. In total, 277,524 patches of size 50 x 50 (198,738 IDC negative and 78,786 IDC positive) were obtained from the dataset. To alleviate the class-imbalance problem, we choose 78,786 image samples from the category of IDC negative. Accordingly, the total number of image samples is 157,572. After addressing the class-imbalance problem, the splitting process is started. We split the data into 75% train and 25% test sets. Also, we separate 25% of the train set as the validation set. Finally, we convert the target variable of each set into a matrix of binary values to use with the loss function "categorical_crossentropy." The splitting procedure is shown below.
procedure is shown in the following section. tional layer. Additional convolutional layers or

# Train/Validation/Test Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.25, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train,
y_train, test_size=0.25, random_state=42)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_valid = to_categorical(y_valid)

13.2.2 Methods

In this section, we consider the methods we incorporated to accomplish the task of breast cancer diagnosis. We implement the corresponding methods by using Python and TensorFlow. The evaluation metrics are calculated through the scikit-learn package. The number of epochs is set to 10 for all deep learning architectures and the random seed is set to 41.

13.2.2.1 Convolutional neural network-based diagnosis method

CNNs are specialized types of neural network architecture, firstly designed for 2D image data. Besides images, CNNs can also work with 1D and 3D data. CNNs involve the following layers: the convolutional layer, the pooling layer, and the fully connected layer. The design of a CNN architecture starts with the convolutional layer. Additional convolutional layers or pooling layers can be added to the initial convolutional layer. The final layer is the fully connected layer. While convolutional layers learn local patterns from images by using 2D windows (also called filters), fully connected layers extract global patterns from the feature space of their input. Proportionally to the number of layers, the complexity of a CNN architecture increases; therefore it becomes possible to identify larger portions of the image. To be specific, while small local patterns (e.g., color and edge) are extracted from images in earlier layers, latter layers detect larger patterns of features (e.g., ear and eye).

The novelty of a CNN architecture is the convolutions within the convolutional layer, the main components of which are the input, a feature detector, and a feature map. Convolutions are carried out over a feature map. For a color image as an input, the feature map will have three dimensions: two of them are spatial axes (height and width) and the other one is the depth axis (also known as the channel axis). For an RGB image, the depth axis will be 3, since it has three channels: red, green, and blue. Another component of the convolutional layer, the feature detector (called a kernel or a filter), checks whether the feature is present in the image by moving across the receptive fields. The feature detector is a 2D array of weights representing the image part. Filter sizes can vary, but a traditional 3 x 3 matrix is typically the most common; this also determines how large a receptive field is. When a filter operation is applied to a region of an image, the dot product is calculated between the input and the filter. The calculated dot product is then used to build an output array. This process is repeated by shifting the filter by a stride until the filter has crossed the entire image. The final output array built by a number of dot products between the input and the filter is known as a feature map or a convolved feature. Each output value does not have a direct connection to each pixel of the input image; that is, it only connects to the corresponding region where the filter is carried out [17]. Accordingly, convolutional and pooling layers are referred to as partially connected layers.

Pooling layers, also called downsampling, reduce the number of parameters in the input by sweeping a filter over the entire input. Different from the convolutional layer, the filter applied in pooling layers does not have any weights. To be specific, the filter processes the receptive field values to generate the output array by applying an aggregation function. There are two main pooling types: max pooling and average pooling. While max pooling generates the output array by taking the maximum pixel value within the receptive field, average pooling generates the output array by calculating the average value within the receptive field. Max pooling is used more commonly than average pooling. After a convolution operator, a transformation (e.g., ReLU) is applied to the output array (feature map) to introduce nonlinearity to the model.

As convolutional and pooling layers are partially connected, the pixel values of an input image do not have a direct relationship with the class labels. To cover this issue, fully connected layers use the output array (feature map) obtained from the partially connected layers to carry out the classification task. While partially connected layers usually use ReLU as an activation function, fully connected layers generally use a softmax function to generate probabilities within the range of 0 and 1, representing the predicted class labels of the input image data.
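To make the filter arithmetic above concrete, the following minimal sketch (our own illustration, not part of the chapter's experiments) shows how a 3 x 3 convolution with stride 1 and a 2 x 2 max pooling change the shape of a 50 x 50 x 3 input patch:

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(50, 50, 3))
x = layers.Conv2D(32, (3,3), activation='relu')(inputs)  # valid padding: 50 - 3 + 1 = 48
x = layers.MaxPooling2D(pool_size=(2,2))(x)              # downsampling: 48 / 2 = 24
tf.keras.Model(inputs, x).summary()  # shapes: (None, 48, 48, 32) then (None, 24, 24, 32)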
Thanks to their success in a variety of fields, CNNs have also been applied for breast cancer diagnosis. How a CNN is implemented to diagnose breast cancer from histopathological images is shown in Fig. 13.2.

FIGURE 13.2 The general methodology of CNN architecture for breast cancer diagnosis from histopathological images. CNN, Convolutional neural network.

Firstly, the preprocessing stage is carried out to prepare the histopathological image dataset for the training stage. In this stage, we should first deal with data origin problems such as noise, artefacts, inconsistency, and class imbalance. Then, we split the preprocessed data into the train and test image sets. In the training stage, a CNN model is built on the train image set. Finally, the performance of the trained model is evaluated on the test image set to verify the diagnosis performance of the model on the unseen dataset. A sample CNN model with six layers is shown below.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

dropout_conv = 0.3  # dropout rate applied after each convolutional block

model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(50,50,3)))
model.add(Conv2D(32, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(dropout_conv))

model.add(Conv2D(64, (3,3), activation='relu'))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(dropout_conv))

model.add(Conv2D(128, (3,3), activation='relu'))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(dropout_conv))

model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dropout(dropout_conv))
model.add(Dense(2, activation="softmax"))

model.summary()
13.2.2.2 Pretrained convolutional neural network-based diagnosis method

A number of pretrained CNN architectures have been introduced, motivated by new datasets such as MNIST and CIFAR-10, and by competitions such as ImageNet. Some of these architectures are given as follows.

1. VGG [18], which is treated as one of the most successful CNN architectures, was first proposed to win the ImageNet competition in 2014. VGG does not have a complex structure; that is, VGG16 applies convolutional layers with 3 x 3 sized filters and a stride of 1, and uses same padding and max pooling with 2 x 2 sized filters and a stride of 2. The convolution and max pooling layers are arranged consistently over the whole architecture. At the end of the partially connected layers, two fully connected layers are leveraged. The number 16 in VGG16 represents that the architecture has 16 layers. A fixed-size RGB image is used to train the model, and the only preprocessing that takes place at the training stage is subtracting the mean RGB values computed for each pixel on the training set. Proportionally to the network depth, the training process of VGG slows down and the parameter size becomes quite large. VGG is easy to interpret and works properly for classical classification problems. However, the large number of parameters causes a high computational cost. A sample VGG16 model with 7 CNN layers is shown below.

# VGG16 Model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Activation,
                                     BatchNormalization)

base_model = tf.keras.applications.VGG16(input_shape=(50,50,3),
                                         include_top=False, weights="imagenet")
model = Sequential()
model.add(base_model)
model.add(Dropout(0.5))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(2048,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))


model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2,activation='softmax'))
model.summary()
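In practice, one common refinement (not applied in the listing above) is to freeze the pretrained convolutional base so that only the newly added dense layers are trained; a minimal sketch:

# Freeze the ImageNet weights of the convolutional base
base_model.trainable = False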

2. ResNet [19] is an exotic type of architecture that uses a network of microarchitecture modules, unlike conventional sequential pretrained architectures such as AlexNet and VGG. Microarchitecture represents the building blocks used to build the entire network. ResNet has a much deeper architecture than VGG but a smaller number of actual weighting parameters, because ResNet leverages global average pooling rather than fully connected layers. ResNet addresses vanishing gradients, accelerates the training speed, provides higher accuracy in classification problems, and detects redundant extracted features. On the other hand, ResNet has a more complex architecture. Moreover, skip connections between layers cause extra dimensionality. The ResNet50 model is generated as follows.

# Import ResNet from Keras
from keras.applications.resnet50 import ResNet50
base_model = ResNet50(input_shape=(50,50,3), include_top=False, weights="imagenet")
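The skip connection mentioned above can be sketched in a few lines of Keras code. The following is our own minimal illustration of a residual block (not ResNet's actual implementation); it assumes the input tensor x already has the same number of channels as filters:

from tensorflow.keras import layers

def residual_block(x, filters):
    # Convolutional path F(x)
    y = layers.Conv2D(filters, (3,3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3,3), padding='same')(y)
    # Skip connection: add the block input back onto the convolutional output
    y = layers.Add()([x, y])
    return layers.Activation('relu')(y)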

3. DenseNet [20] connects each layer to every other layer. For L layers, the total number of direct connections is L(L + 1)/2. Each layer takes as inputs the feature maps generated by the preceding layers. In other words, the input of a layer in a DenseNet architecture is the concatenation of the feature maps of previous layers. The architecture in DenseNet is divided into DenseBlocks, where the dimensionality of a feature map remains constant but the number of filters changes between them. Convolution and pooling are the first layers in DenseNet. Then, there is a dense block followed by a transition layer, and finally a dense block followed by a classification layer. DenseNet addresses vanishing gradients, enhances feature reuse, and reduces the parameter size. The DenseNet169 model is generated as follows.

# Import DenseNet from Keras
from keras.applications.densenet import DenseNet169
base_model = DenseNet169(input_shape=(50,50,3), include_top=False, weights="imagenet")
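The dense connectivity described above amounts to concatenating feature maps along the channel axis. A minimal sketch of one step of a dense block (our own illustration, not DenseNet's actual implementation; growth_rate is the number of new feature maps added per layer):

from tensorflow.keras import layers

def dense_block_step(inputs, growth_rate=32):
    # Produce growth_rate new feature maps from everything seen so far
    x = layers.BatchNormalization()(inputs)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(growth_rate, (3,3), padding='same')(x)
    # The next layer receives the concatenation of old and new feature maps
    return layers.Concatenate()([inputs, x])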

4. MobileNet [21] applies the same convolution as CNNs to filter images, but its method differs in that it performs a depthwise convolution followed by a pointwise convolution instead of the conventional convolution performed by standard CNNs. Accordingly, the efficiency of the CNN is increased, making it possible to integrate MobileNet into mobile systems. In other words, it is possible to obtain a better response in a short time due to this time efficiency. The MobileNet model is generated as follows.
quantify how good or bad the model

# Import MobileNet from Keras
from keras.applications.mobilenet import MobileNet
base_model = MobileNet(input_shape=(50,50,3), include_top=False, weights="imagenet")
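The depthwise and pointwise factorization mentioned above can be written directly in Keras. The following minimal sketch is our own illustration (not MobileNet's internal code): DepthwiseConv2D filters each input channel separately, and a 1 x 1 Conv2D then mixes the channels.

from tensorflow.keras import layers

def depthwise_separable_conv(x, filters):
    # Depthwise step: one 3x3 filter per input channel
    x = layers.DepthwiseConv2D((3,3), padding='same', activation='relu')(x)
    # Pointwise step: a 1x1 convolution combines the channels
    return layers.Conv2D(filters, (1,1), padding='same', activation='relu')(x)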

13.3 Results and discussions

How a pretrained CNN architecture is implemented to diagnose breast cancer from histopathological images is shown in Fig. 13.3. As described in Section 13.2.2.1, the preprocessing stage is carried out and then the training stage is performed on the train data. Finally, the performance of the pretrained model is verified on the test data.

13.3.1 Experimental setup

To make the established CNN-based model ready for training, the following three cases are considered as part of the compilation stage:

1. An optimizer: It is a function or an algorithm that is used to update the weights and learning rate of a deep learning model based on the loss function value. Accordingly, it helps to reduce the overall loss and increase the accuracy. There are a variety of optimizers, such as gradient descent, stochastic gradient descent, AdaGrad, root mean square propagation (RMSProp), and Adam. As a deep learning model generally consists of millions of parameters, choosing the best weights for it can be a daunting task; therefore it is necessary to choose a suitable optimizer for your application.
2. A loss function: The function is used to quantify how good or bad the model performs. Loss functions are mainly investigated in two groups: regression loss (e.g., mean squared error, mean squared logarithmic error, and mean absolute error) and classification loss (e.g., binary cross entropy, hinge, and categorical cross entropy). While regression loss is used with continuous target variables, classification loss is used with discrete target variables.
3. Metrics: Metrics are different from loss functions. While loss functions are used to update the model parameters through an optimizer, metrics are used to measure or visualize the performance of the model during training and testing. Metrics can be divided into the following groups: regression metrics (e.g., mean absolute error, mean squared error, and root mean squared error) and classification metrics (e.g., accuracy, confusion matrix, F1-score, and the receiver operating characteristic [ROC] curve).

FIGURE 13.3 The general methodology of pretrained architecture for breast cancer diagnosis.

In this study, the compilation stage is implemented as follows.
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import (ReduceLROnPlateau, ModelCheckpoint,
                                        EarlyStopping)

def f1_score(y_true, y_pred):  # taken from old Keras source code
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    f1_val = 2*(precision*recall)/(precision+recall+K.epsilon())
    return f1_val

METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='precision'),
    tf.keras.metrics.Recall(name='recall'),
    tf.keras.metrics.AUC(name='auc'),
    f1_score,
]

# Callbacks: learning-rate reduction, checkpointing, and early stopping
lrd = ReduceLROnPlateau(monitor='val_loss', patience=5, verbose=1,
                        factor=0.75, min_lr=1e-10)
mcp = ModelCheckpoint('model.h5')
es = EarlyStopping(verbose=1, patience=5)

model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=METRICS)
history = model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
                    verbose=1, epochs=20, callbacks=[lrd, mcp, es])

model.evaluate(X_test, y_test, verbose=1)



13.3.2 Experimental results

The results of a variety of CNN and pretrained architectures are presented in Table 13.1 in terms of accuracy, F1-score, and ROC. According to the results, CNNs with 2 and 3 layers cannot perform as well as CNN models having more layers. Moreover, it is observed that increasing the layer size from 4 to 9 in CNNs does not significantly enhance the diagnosis performance; that is, CNNs with between 4 and 9 layers have similar performance in detecting breast cancer. It can therefore be stated that CNNs established with more layers do not always guarantee better classification performance. When considering pretrained CNN architectures, it is observed from Table 13.1 that pretrained architectures can achieve similar or mostly better diagnosis performance than conventional CNNs due to their well-designed network models. However, the F1-score obtained by ResNet50, VGG19, ResNet101, and DenseNet169 is lower than 0.9; that is, they cannot perform well compared to the other pretrained architectures. It should also be noted that a combined VGG19 architecture with 7 layers obtains similar performance to the conventional VGG19 despite the additional modules. This means that a network architecture, even with a deeper structure, does not perform well if it is not well-prepared and well-designed. An F1-score of more than 0.9 is obtained by VGG16 and MobileNet, and these pretrained architectures are followed by MobileNetV2 and DenseNet121. It can therefore be suggested that deep learning architectures, especially pretrained architectures, can be successfully applied to diagnose breast cancer from histopathological images.

TABLE 13.1 Results of deep learning architectures.

Classifiers    Train accuracy  Val accuracy  Test accuracy  F1-score  ROC area
CNN 2 Layer    0.8455          0.85          0.846          0.8455    0.8
CNN 3 Layer    0.846           0.85          0.846          0.846     0.85
CNN 4 Layer    0.8741          0.881         0.8808         0.8741    0.9429
CNN 5 Layer    0.8737          0.8767        0.8779         0.8737    0.9726
CNN 6 Layer    0.88            0.88          0.88           0.8826    0.95
CNN 7 Layer    0.8757          0.8784        0.8785         0.8784    0.9433
CNN 8 Layer    0.87            0.86          0.8692         0.87      0.93
CNN 9 Layer    0.88            0.88          0.87           0.88      0.95
ResNet50       0.8731          0.8837        0.8792         0.8731    0.94
VGG16          0.934           0.9211        0.922          0.934     0.9786
VGG19          0.87            0.85          0.845          0.87      0.93
MobileNet      0.94            0.91          0.91           0.94      0.95
DenseNet169    0.89            0.86          0.85           0.89      0.93
DenseNet121    0.91            0.91          0.9            0.91      0.95
MobileNetV2    0.92            0.92          0.91           0.92      0.97
ResNet101      0.88            0.86          0.85           0.87      0.95


13.4 Conclusion

In this chapter, we leveraged deep learning architectures to carry out a comparative study on detecting breast cancer from histopathological images. According to the results, we obtained 92.2% test accuracy and a 0.934 F1-score with VGG16 on the IDC dataset. It can therefore be revealed that deep learning architectures, especially pretrained models, are really good alternatives for the breast cancer diagnosis process. We would also like to note that increasing the depth of an architecture does not guarantee better diagnosis performance if it is not well-designed and well-prepared. In the future, we would like to work on transfer learning to improve the effectiveness of the diagnosis process. To achieve this, we will first design a deep architecture by integrating well-designed modules with each other. Moreover, we will propose a diagnosis methodology based on a feature selection method to select the most appropriate features, from the feature set extracted by deep architectures, for classification.

References

[1] S. Boumaraf, et al., A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images, Biomed. Signal Process. Control 63 (2021) 102192.
[2] R. Rashmi, K. Prasad, C.B.K. Udupa, Breast histopathological image analysis using image processing techniques for diagnostic purposes: a methodological review, J. Med. Syst. 46 (1) (2021) 7.
[3] X. Zhou, et al., A comprehensive review for breast histopathology image analysis using classical and deep neural networks, IEEE Access 8 (2020) 90931–90956.
[4] F.A. Zeiser, et al., Breast cancer intelligent analysis of histopathological data: a systematic review, Appl. Soft Comput. 113 (2021) 107886.
[5] F.A. Spanhol, et al., Breast cancer histopathological image classification using convolutional neural networks, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016.
[6] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84–90.
[7] F.A. Spanhol, et al., Deep features for breast cancer histopathological image classification, in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017.
[8] N. Bayramoglu, J. Kannala, J. Heikkilä, Deep learning for magnification independent breast cancer histopathology image classification, in: 2016 23rd International Conference on Pattern Recognition (ICPR), 2016.
[9] B. Wei, et al., Deep learning model based breast cancer histopathological image classification, in: 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), 2017.
[10] S. Pratiher, S. Chattoraj, Manifold learning & stacked sparse autoencoder for robust breast cancer classification from histopathological images, arXiv:1806.06876, 2018.
[11] A.-A. Nahid, Y. Kong, Histopathological breast-image classification using local and frequency domains by convolutional neural network, Information 9 (1) (2018).
[12] A.-A. Nahid, M.A. Mehrabi, Y. Kong, Histopathological breast cancer image classification by deep neural network techniques guided by local clustering, BioMed Res. Int. 2018 (2018) 2362108.
[13] D. Bardou, K. Zhang, S.M. Ahmad, Classification of breast cancer based on histology images using convolutional neural networks, IEEE Access 6 (2018) 24680–24693.
[14] Shallu, R. Mehra, Breast cancer histology images classification: training from scratch or transfer learning? ICT Express 4 (4) (2018) 247–254.
[15] Z. Xiang, et al., Breast cancer diagnosis from histopathological image based on deep learning, in: 2019 Chinese Control and Decision Conference (CCDC), 2019.
[16] A. Janowczyk, A. Madabhushi, Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases, J. Pathol. Inform. 7 (1) (2016) 29, https://doi.org/10.4103/2153-3539.186902.
[17] IBM Cloud Education, Convolutional neural networks, https://www.ibm.com/cloud/learn/convolutional-neural-networks, 2020.
[18] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: 3rd International Conference on Learning Representations (ICLR 2015), 2015.
[19] K. He, et al., Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[20] G. Huang, et al., Densely connected convolutional networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017.
[21] A.G. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, CoRR abs/1704.04861 (2017).

C H A P T E R

14
Artificial intelligence based Alzheimer’s
disease detection using deep
feature extraction
Manav Nitin Kapadnis1, Abhijit Bhattacharyya2 and Abdulhamit Subasi3,4
1Department of Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; 2Department of Electronics and Communication Engineering, National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India; 3Institute of Biomedicine, Faculty of Medicine, University of Turku, Turku, Finland; 4Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia

O U T L I N E

14.1 Introduction
14.2 Background/literature review
14.3 Artificial intelligence models
14.3.1 Deep feature extraction techniques
14.3.2 Classification techniques
14.4 Alzheimer's disease detection using artificial intelligence
14.4.1 Experimental data
14.4.2 Performance evaluation measures
14.4.3 Experimental results
14.5 Discussion
14.6 Conclusion
References

14.1 Introduction

Alzheimer's disease (AD) is considered one of the progressive fatal neurodegenerative disorders that affect the elderly population and result in a decline of cognitive abilities. The latest statistical survey concludes that the majority of dementia patients (60%–80%) suffer from AD, and this number will increase as the population becomes older.

It is predicted that by the year 2050, worldwide, the total number of AD patients may reach up to 152 million [1] and 1 out of 85 persons will be affected by AD [2]. In contrast to other brain disorders, AD has a significantly high mortality rate, which is rising each year.

AD has multiple stages, such as the predementia, early, middle, and advanced stages. In the predementia stage, the common symptoms include mild cognitive impairment (MCI) with forgetfulness that mimics the natural aging process. In the early AD stage, the person suffers from impairment of executive functions, learning, and memory, often leading to language difficulty. In the middle stage, the patient experiences more speech difficulty, and reading and writing skills are largely impeded. In the advanced stage, AD patients exhibit apathy and fail to perform even simple tasks independently. Eventually, the patients become immobilized and death occurs [3,4].

The early diagnosis of this disease can reduce its surging pervasiveness and high mortality. To date, AD has no curable treatment, and only after early identification of the disorder can the disease progression be slowed with cognition-enhancing drugs, physical exercises, and proper lifestyle management. The diagnosis of AD is commonly carried out based on the patient's illness history and also using physiological and neurological features. The patient's medical history can be received from relatives and by assessing the patient's behavior [5].

In recent times, for the diagnosis of AD, clinicians generally rely on the analysis of the patient's brain images, as current imaging systems provide a wide range of information about the subject's health condition. Commonly used imaging techniques are computed tomography (CT), positron emission tomography-CT (PET-CT), and magnetic resonance imaging (MRI). A CT image can be used for diagnosing dementia by inspecting the sizes of different brain regions, including the frontal lobe, temporal lobe, and hippocampus. PET-CT provides information about different types of brain functions: (1) fluorodeoxyglucose (FDG) PET-CT is generally used for measuring glucose levels in the brain, (2) amyloid PET-CT can provide insight about the beta-amyloid protein level, and (3) tau PET-CT is utilized for the detection of tau (the protein responsible for the formation of neurofibrillary tangles in the nerve cells) [6,7]. For example, a higher level of beta-amyloid protein can be confirmed by a positive amyloid PET-CT scan, which can assist in confirming AD. Other conditions contributing to dementia, such as head trauma, stroke, and tumors, can also be ruled out using the aforementioned diagnostic methods [8]. The MRI scan measures the volume alteration at characteristic positions to analyze AD, which can provide up to 87% analytical accuracy [9,10].

More often, the appraisal is performed on temporoparietal cortical atrophy and mesial temporal lobe atrophy (i.e., entorhinal cortex and hippocampus). Direct estimation is quantified by measuring the volume loss of hippocampal or parahippocampal tissue, whereas indirect estimation is based on the magnification of the parahippocampal fissure.

Recently, the accessibility of advanced noninvasive imaging methods has gained significant attention from researchers developing reliable and precise tools for brain condition monitoring. In fact, many computer-assisted systems for AD diagnosis and severity analysis have been proposed and implemented in the literature [11–13]. In general, the primary steps involved in a computer-aided AD diagnosis system are as follows: (1) brain imagery collection using a recommended and standard imaging technique, (2) enhancement of the brain images with a suitable image enhancement tool, (3) extraction of automatic/handcrafted discriminatory features from the enhanced images, (4) selection of dominant features using feature selection techniques, (5) building a classification model for brain image categorization, and (6) validation of the built system using new test images.

In this work, we aim to build an AI-driven AD diagnosis method using brain MRI images. More specifically, we propose different transfer learning (TL) models [pretrained deep neural networks (DNNs)] for the automatic learning and extraction of discriminatory features for AD diagnosis. The privilege of using DNNs is that the features are optimally tuned for discriminating different classes of images. Convolutional neural networks (CNNs) are very effective in extracting spatial patterns from the input images. Finally, different machine learning (ML) models are explored to classify the deep features for AD diagnosis. A detailed comparative study of the performance of different deep feature extraction models is presented in order to choose the best combination of deep feature and ML model for AD detection.

14.2 Background/literature review

In the area of medical imaging, AD diagnosis plays a critical role in maintaining the cognitive capability of an individual. In fact, persons suffering from AD face serious problems that impact thinking, memory, and other living activities. Their intellectual activities are highly affected in the later stage of their lives due to the implacable destruction of the nerve cells over time [14]. In the United States alone, the number of persons suffering from AD has approximately reached five million, most of them aged between 80 and 90 years [15]. Early detection of AD is carried out using various therapies and advanced diagnostic techniques. MRI is a potent tool for the investigation of clinical signs of AD in the brain. However, MRI requires manual inspection within the workflow, and thus the acquisition procedure of MRI is sluggish [16].

Recently, several efficient AD diagnosis techniques have been proposed using image processing and ML techniques, which can become alternatives to the manual systems. Patil et al. [17] proposed an effective image processing system for AD detection using MRI images. They computed atrophy with wavelet, K-means, watershed, and other customized algorithms. Their obtained results can be useful for the early diagnosis of AD. Kaur and Kaur [18] employed different image enhancement techniques for AD detection. They used corrected red and green ingredients and found the best AD detection approach by computing sensitivity and specificity measures. Zhao et al. [19] used gradient echo plural contrast imaging (GEPCI) with MRI scans for the detection of AD. GEPCI enhances the resolution of the affected brain area in the MRI and can effectively identify the damaged brain tissues caused by AD.

Sankari and Adeli [20] used a probabilistic neural network (PNN) for AD diagnosis using MRI scans. In the first stage, they computed the total brain volume and atrophy rate. Then several features, namely, shape, correlation, and contrast, were extracted and given as input for classification. Their study reveals that the PNN performs better than KNN and the support vector machine (SVM) with respect to sensitivity, specificity, and accuracy. Plant et al. [21] proposed a novel MRI image-based pipeline for AD diagnosis. The classifiers, namely, Bayes statistics, SVM, and voting feature interval (VFI), were combined for AD pattern matching in the imagery. They obtained the MRI from thirty-two AD patients, which were used for feature extraction purposes. The authors achieved significant prediction accuracy after selecting the discriminatory features.

Zhang et al. [22] introduced a novel ML framework for MRI-based AD diagnosis with significant performance. In the first stage, they performed skull stripping to remove the extracted regions, and thereafter features were computed based on stationary wavelet entropy. The computed features were fed to a neural network with a single hidden layer, and the weights and biases were optimized via particle swarm optimization (PSO).


scans. Several features were extracted, includ- outperforming the existing methods [28]. In
ing wavelet transform-based SS features, wave- another work, Wang et al. [29] presented a 3D
let entropy, and wavelet orientation. The DF estimation-based method for discrimination
authors employed multilayer perceptron with of AD and healthy subjects. They extracted fea-
biogeography-based optimization algorithm tures using 3D DF method and selected statisti-
for the classification of the extracted features cally significant features using Bhattacharyya
that outperformed the existing methods. distance, Welch’s t-test (WTT), and Student’s t-
Zhang et al. [24] introduced an AD detection test. Finally, selected features were classified
framework using MRI scans based on under- using SVM and TSVM classifiers.
sampling method. Principal component analy- Zhang et al. [30] presented a computer-aided
sis along with singular value decomposition diagnostic (CAD) system for AD detection with
methods were used for feature computation MRI images. The authors used maximum inter-
and discriminatory feature selection. Further, class variance for selecting key slices of 3D MRI.
decision tree (DT) and SVM classifiers were Afterward, for each slice set eigenbrain was
utilized for achieving significant detection generated. Then, significant eigenbrains were
performance. selected using WTT and fed to SVM classifier
Zhang and Wang [25] computed displace- with different kernels. Further, the prediction
ment field (DF) in MRI for detecting abnormal- accuracy was notably improved using PSO algo-
ities present in normal brain for AD detection. rithm. Zhang et al. [31] introduced a CAD sys-
The discrete wavelet transform (DWT)- tem to detect AD from MRI scans. From each
based features were computed and feature MRI scan wavelet entropy and Hu moment
dimensionality was reduced using PCA. The invariant features were extracted. Then
final set of selected features were categorized extracted features were classified using com-
using three different classifiers, namely, SVM, putation of generalized eigenvalues with
twin SVM (TSVM), and generalized eigenvalue SVM. Hett et al. [32] presented multi-textural
proximal SVM. The authors concluded that DF (MTL) pipeline for feature computation from
is useful in AD diagnosis when MRI scans MRI scans. The AD structural information
were utilized. El-Dahshan et al. [26] described was estimated via MTL approach. Further,
an MRI-based AD classification method using adaptive fusion method was applied for fus-
a hybrid approach. Their framework includes ing texture grading features computed using
three basic steps that are as follows: feature 3D Gabor filter. Their work achieved signifi-
extraction, dimensionality reduction, and clas- cant performance improvement over existing
sification. They extracted DWT-based MRI fea- biomarker methods. Gao et al. [33] employed
tures and then feature reduction was carried deep learning for obtaining early-stage AD
by irrelevant points using PCA. Finally, two information and classification. They fused 2D
classification algorithms, namely, ANN and and 3D CNN that provided significant per-
KNN were employed for the classification of formance with a softmax layer.
normal and AD MRI scans. Wang et al. [27] Ayadi et al. [34] described a hybrid method
introduced a novel AD classification algorithm for extracting features from brain MRI scans
using the features of Zernike moment (ZM), and proposed a classification system. Initially,
followed by a linear regression (LR) classifier. DWT-based features were extracted from the
The ZM was used to extract features with test images, and further, Bag-of-Words method
lengths between 10 and 256 from each MRI adopted for grouping key image features.
image. The computed features were classified Finally, several ML techniques, such as ran-
LR that achieved an accuracy of 97.51%, dom forest (RF), AdaBoost, KNN, and SVM

Applications of Artificial Intelligence in Medical Imaging


14.3 Artificial intelligence models 337
were employed for classification. The recent the target task, and the source and target tasks
work of Acharya et al. [13] uses MRI scans and must be similar in nature before utilizing TL.
employs different quantitative techniques, There are three main approaches that might
including filtering, feature extraction, feature help students learn more effectively through
selection using Student’s t-test, and KNN clas- transfer. The first is the first feasible output in
sifier for the diagnosis of AD. The shearlet the target task using only the transferred infor-
transform-based feature computation method mation, as contrasted to an uninformed per-
achieved superior performance over the alter- son’s initial performance before any further
native methods. learning. Second, the time it takes to fully com-
prehend the objective mission, notwithstand-
ing the information provided, in comparison to
the time it takes to comprehend it from scratch.
14.3 Artificial intelligence models Third, the maximum output that can be
achieved in the target role in comparison to the
14.3.1 Deep feature extraction techniques maximum output that can be achieved without
Pretrained neural networks from freely transition.
available platforms such as ImageNet are also VGG16, VGG19, DenseNet169, DenseNet121,
available for usage in different applications ResNet101, InceptionV3, InceptionResNetV2,
and datasets off-the-shelf. Except for the fully MobileNet, and MobileNetV2 are some of the
linked layer, practically all of the pretrained models we used. The VGG network design was
weights in the CNN are used without change. introduced by Simonyan and Zisserman [37],
The fully connected layer’s weights are derived which is recognized by its simplicity, utilizing
from the dataset at hand. Because the labels of only three universal layers constructed on top of
the class in a certain context may differ from each other in increasing intricacy. Max pooling is
those of ImageNet, the last layer’s preparation a high-volume capacity method. ResNet is a
is critical. The weights in the preceding layer, form of “exotic architecture” built on micro-
on the other hand, are useful because they architecture modules, as opposed to traditional
learn different types of shapes in the images sequential network designs such as AlexNet,
and may be used for nearly any type of catego- VGG, and OverFeat. The term “micro-architec-
rization. Furthermore, the last layer’s activation ture” refers to the set of “building pieces” used
function can also be applied for clustering. It is to put the network together. The ResNet design,
worth noting that the utilization of pretrained initially proposed by He et al. [38], has shown
CNNs is so popular that training is hardly that extraordinary deep networks may be outfit-
begun from the scratch [35]. ted using conventional stochastic gradient
Deep learning methods need lots of exam- descent and residual modules. The purpose of
ples or a big data. We can use TL since the the inception module is to operate as a “multi-
number of samples in our dataset is lower than level feature extractor” by computing one, three,
modern data standards, that is, the number of and five brackets all inside the same network
MRI scans for the classification model is less. module. The filter outputs are then stacked along
Taking characteristics acquired on one topic the channel dimension before being transferred
and applying them to a new, related issue is to the following network layer. The initial imple-
known as TL [36]. By integrating knowledge mentation of this idea was Google LeNet, but
from the source task, export learning aims to subsequent implementations were simply known
improve learning in the target task. The source as Iteration vN, where N is the Google version
François Chollet, the Keras library's inventor and main maintainer, offered the name Xception. The Xception architecture is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.

One of the challenges that image analysts encounter is that labeled training data may not be suitable for a particular purpose. Consider the following scenario: you have a set of images from which, given a query image, similar images must be retrieved. Although retrieval applications do not use labels, semantic compatibility between features is critical. In other cases, you might want to classify a dataset using a specific collection of labels that are not available in large numbers and are not part of a larger resource such as ImageNet. This causes issues, since neural networks require a lot of training data to be built from scratch. However, the most important thing to remember about image data is that the features retrieved from one dataset are frequently reusable across data sources [35].

The following code snippet gives us a template for replicating our model architecture for feature extraction using a TL model:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Activation,
                                     BatchNormalization)
from tensorflow.keras.models import Model

base_model = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

x = base_model.output
x = Dropout(0.5)(x)
x = Flatten()(x)
x = BatchNormalization()(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)
x = Dense(1024, kernel_initializer='he_uniform')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)

predictions = Dense(4, activation='softmax')(x)

model_feat = Model(inputs=base_model.input, outputs=predictions)

train_features = model_feat.predict(x_train)
test_features = model_feat.predict(x_test)
In the above code snippet, in order to use a different pretrained CNN model, you just have to replace VGG16 with the model of your choice.
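For instance, a minimal sketch (our own illustration, following the same conventions as the snippet above) of swapping in ResNet101 as the base model:

from tensorflow.keras.applications import ResNet101

# Only the base model changes; the dense head built on top stays the same
base_model = ResNet101(include_top=False, weights='imagenet',
                       input_shape=(224, 224, 3))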
14.3.2 Classification techniques

14.3.2.1 Artificial neural network

An artificial neural network (ANN) with a single hidden layer has limited capability, but by adding more hidden layers, the ANN can learn more intricate data. This is the idea underlying DNNs, which start with the raw input and learn increasingly intricate features by integrating values from previous layers. The goal of a DNN is to learn the hierarchy of features with the least amount of human interaction possible [39–41]. Medical image analysis and computer-aided decision support systems have inspired a lot of interest in deep learning, and the detection of illness has been greatly improved by recent breakthroughs in AI. To examine various datasets, a variety of deep learning methods are applied. It is critical to understand the intrinsic brain mechanisms that may be exploited by various scanning technologies to diagnose AD. DNNs are AI algorithms inspired by the way the human brain functions and is structured. To solve artificial intelligence (AI) problems, a variety of DNN topologies were developed. The DNN's operation is divided into two phases. The first phase adjusts the network parameters during training using the training data, which is a proportion of the original data collected. Following training, test data is used to determine whether the model is properly trained to detect previously unseen instances [41–43].

The following code snippet gives in-depth detail about the model architecture used for the DNN:

model = Sequential()
model.add(Flatten(input_shape=(224,224,3)))
model.add(Dense(units=4, activation='relu'))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=4))
model.add(Activation('softmax'))

def f1_score(y_true, y_pred): #taken from old keras source code


true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
precision = true_positives / (predicted_positives + K.epsilon())
recall = true_positives / (possible_positives + K.epsilon())
f1_val = 2*(precision*recall)/(precision+recall+K.epsilon())
return f1_val

Applications of Artificial Intelligence in Medical Imaging


340 14. Artificial intelligence based Alzheimer’s disease detection using deep feature extraction

METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='precision'),
    tf.keras.metrics.Recall(name='recall'),
    tf.keras.metrics.AUC(name='auc'),
    f1_score,
]

def exponential_decay(lr0, s):
    # Multiplies the learning rate by 0.1 every s epochs
    # (e.g., lr0=0.01, s=5 gives 0.001 at epoch 5)
    def exponential_decay_fn(epoch):
        return lr0 * 0.1 ** (epoch / s)
    return exponential_decay_fn

exponential_decay_fn = exponential_decay(0.01, 5)

lr_scheduler = tf.keras.callbacks.LearningRateScheduler(exponential_decay_fn)

model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07,
        amsgrad=False, name='Adam'),
    loss='categorical_crossentropy',
    metrics=METRICS)

history = model.fit(train_features, y_1, epochs=20, batch_size=32,
                    validation_split=0.2)
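Once trained, the model can be scored on the held-out features with the same compiled metrics; a minimal sketch (y_test_1 is a hypothetical name for the one-hot encoded test labels, not used elsewhere in the chapter):

# Illustrative only: evaluate returns the loss followed by the METRICS above.
results = model.evaluate(test_features, y_test_1, batch_size=32)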

14.3.2.2 K-nearest neighbor
KNN, also known as k-nearest neighbor, is a supervised ML technique for classification and regression problems, and one of the easiest algorithms to learn. KNN is nonparametric: it does not assume a normal distribution or make any other assumptions about the underlying data. The k-nearest neighbor method is sometimes known as a lazy algorithm, since it does no work during the training phase and defers all computation to the testing phase. It is a distance-based method. A lot of studies in the past have worked on different modifications of this KNN algorithm for AD. Aruchamy et al. [44] modified the KNN algorithm along with methods such as image processing, feature extraction, and feature reduction, and achieved a significant increase in classification performance. Furthermore, Dinu and Ganesan [45] presented a feature selection utilizing the t-test technique for combined regression and classification using an instance-based KNN classifier for AD detection.
The following code snippet gives a detailed look at the model setup used for KNN:



from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split the extracted training features to get train and validation accuracy
X_train, X_val, y_train, y_val = train_test_split(train_features, y_train,
                                                  test_size=0.2,
                                                  stratify=y_train,
                                                  shuffle=True,
                                                  random_state=42)
X_test, y_test = test_features, y_test

model = KNeighborsClassifier()
model_trained = model.fit(X_train, y_train)
y_pred_train = model_trained.predict(X_train)
y_pred_val = model_trained.predict(X_val)
y_pred_test = model_trained.predict(X_test)
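scikit-learn defaults to k = 5 with the Euclidean (Minkowski, p = 2) distance; as a hedged illustration (values not tuned in the chapter), the main knobs can be set explicitly:

# Illustrative only: more neighbors plus distance weighting, so closer
# neighbors contribute more to the vote.
model = KNeighborsClassifier(n_neighbors=7, weights='distance')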

14.3.2.3 Support vector machine
The SVM is a supervised ML approach for classifying and predicting data; however, it is mainly used to tackle classification problems. Each data item is represented as a point in n-dimensional space (where n is the number of features), with the value of each feature being the coordinate along the corresponding dimension. Then, to complete the classification, we find the hyperplane that clearly divides the two categories. Rabeh et al. [46] implemented SVM for AD detection along with feature extraction from images, such as Gaussian blur and edge detection features, among others, and achieved impressive results for early Alzheimer's detection on MRI images. Alam et al. [47] suggested an innovative method for identifying AD from healthy controls by employing dual-tree complex wavelet transforms, principal coefficients from MRI transaxial slices, linear discriminant analysis, and a twin SVM (TSVM).
The following code snippet gives a detailed look at the model setup used for SVM:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Split the extracted training features to get train and validation accuracy
X_train, X_val, y_train, y_val = train_test_split(train_features, y_train,
                                                  test_size=0.2,
                                                  stratify=y_train,
                                                  shuffle=True,
                                                  random_state=42)
X_test, y_test = test_features, y_test

model = SVC()
model_trained = model.fit(X_train, y_train)
y_pred_train = model_trained.predict(X_train)
y_pred_val = model_trained.predict(X_val)
y_pred_test = model_trained.predict(X_test)
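SVC defaults to an RBF kernel; as a hedged illustration (values not from the chapter), the kernel and the regularization strength C can be chosen explicitly:

# Illustrative only: an explicit RBF kernel with a larger C fits the training
# data more tightly; SVC handles multiclass data via a one-vs-one scheme.
model = SVC(kernel='rbf', C=10.0, gamma='scale')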

14.3.2.4 Random forest
RF is a supervised ML technique for classification and regression problems. It builds DTs from several samples and employs the majority vote for classification and the average for regression. The capacity of the RF algorithm to handle datasets with both continuous and categorical variables, as in regression and classification, is one of its most valuable characteristics, and it outperforms other methods on classification challenges. Recent studies [48-50] implemented RF for AD detection using various image formats and achieved excellent results due to the robustness of RF. Ali et al. [51] developed a new variation of the RF algorithm for optimum feature extraction, which in turn yielded a new method for feature extraction along with an improved version of RF.
The following code snippet gives a detailed look at the model setup used for the RF classifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Split the extracted training features to get train and validation accuracy
X_train, X_val, y_train, y_val = train_test_split(train_features, y_train,
                                                  test_size=0.2,
                                                  stratify=y_train,
                                                  shuffle=True,
                                                  random_state=42)
X_test, y_test = test_features, y_test

model = RandomForestClassifier()
model_trained = model.fit(X_train, y_train)
y_pred_train = model_trained.predict(X_train)
y_pred_val = model_trained.predict(X_val)
y_pred_test = model_trained.predict(X_test)
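RandomForestClassifier grows 100 fully developed trees by default; a minimal sketch (illustrative values, not tuned in the chapter) of how the forest could be regularized:

# Illustrative only: capping tree depth and leaf size reduces variance,
# which is one way to address the overfitting reported in Section 14.4.3.
model = RandomForestClassifier(n_estimators=200, max_depth=10,
                               min_samples_leaf=5, random_state=42)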

14.3.2.5 AdaBoost
AdaBoost, also known as Adaptive Boosting, is a ML technique used as an ensemble approach. The most common method used with AdaBoost is DTs with one level, that is, DTs with only one split; these trees are also known as decision stumps. The method starts by giving all of the data points the same weight and then assigns a larger weight to improperly classified points, so that points with higher weights receive more emphasis in the following model. It continues to train models until the error becomes smaller. Recent studies such as Refs. [52-54] show the effectiveness of the AdaBoost approach in Alzheimer's disease classification.
The following code snippet gives a detailed look at the model setup used for the AdaBoost classifier:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Split the extracted training features to get train and validation accuracy
X_train, X_val, y_train, y_val = train_test_split(train_features, y_train,
                                                  test_size=0.2,
                                                  stratify=y_train,
                                                  shuffle=True,
                                                  random_state=42)
X_test, y_test = test_features, y_test

model = AdaBoostClassifier()
model_trained = model.fit(X_train, y_train)
y_pred_train = model_trained.predict(X_train)
y_pred_val = model_trained.predict(X_val)
y_pred_test = model_trained.predict(X_test)
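By default AdaBoostClassifier boosts depth-1 decision trees; a minimal sketch making the decision stumps explicit (illustrative settings, not from the chapter):

# Illustrative only: 200 decision stumps as weak learners. In scikit-learn
# versions before 1.2 the keyword is base_estimator rather than estimator.
from sklearn.tree import DecisionTreeClassifier

model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=200, learning_rate=0.5)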

14.3.2.6 XGBoost
XGBoost is an ensemble learning approach. It is not always possible to rely just on the results of a single ML model; ensemble learning is a method for systematically integrating the prediction skills of several learners, so that a single model incorporates the results of many models. The ensemble's foundation learners, or base models, may come from the same learning algorithm or from different learning algorithms. Ensemble learners are often used in two different ways: bagging and boosting. Although these two procedures can be applied to a wide range of statistical models, DTs are the most widely employed. Recent studies such as Refs. [55,56] show the effectiveness of the XGBoost approach in Alzheimer's disease classification.
The following code snippet gives a detailed look at the model setup used for the XGBoost classifier:


from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Split the extracted training features to get train and validation accuracy
X_train, X_val, y_train, y_val = train_test_split(train_features, y_train,
                                                  test_size=0.2,
                                                  stratify=y_train,
                                                  shuffle=True,
                                                  random_state=42)
X_test, y_test = test_features, y_test

model = XGBClassifier()
model_trained = model.fit(X_train, y_train)
y_pred_train = model_trained.predict(X_train)
y_pred_val = model_trained.predict(X_val)
y_pred_test = model_trained.predict(X_test)
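Boosted trees can easily overfit a small feature set; a minimal sketch (illustrative values, not tuned in the chapter) of the regularization knobs XGBoost exposes:

# Illustrative only: shallower trees, row/column subsampling, and a lower
# learning rate are common ways to curb overfitting in gradient boosting.
model = XGBClassifier(max_depth=3, n_estimators=300, learning_rate=0.05,
                      subsample=0.8, colsample_bytree=0.8)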

FIGURE 14.1 General framework for Alzheimer’s disease detection utilizing artificial intelligence methods.

14.4 Alzheimer's disease detection using artificial intelligence

The general framework for AD detection using AI methods is presented in Fig. 14.1.

14.4.1 Experimental data

We utilize a publicly available MRI dataset taken from Kaggle. The data samples are shown in Fig. 14.2.


FIGURE 14.2 MRI scans for different subjects: (A) nondemented, (B) mild demented, (C) very mild demented, and (D) moderate demented. MRI, Magnetic resonance imaging.

TABLE 14.1 Class distribution of the magnetic resonance imaging dataset.

        Nondemented   Very mild demented   Mild demented   Moderate demented   Total
Train   2048          1434                 574             41                  4097
Val     512           358                  143             11                  1024
Test    640           448                  179             12                  1279

The dataset includes a total of 6400 images. We divided this dataset into training, validation, and test sets. The training and test sets are provided by the creators of the dataset themselves; the training set is further split into training and validation sets of 80% and 20%, respectively. The training, validation, and test sets contain 4097, 1024, and 1279 images, respectively. Table 14.1 gives the class distribution of the four classes in each of the three sets.
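A minimal sketch of such a split (assuming the Kaggle images are arranged in one folder per class; the directory names here are hypothetical):

import tensorflow as tf

# Hypothetical layout: 'train/' and 'test/' each hold one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    'train', validation_split=0.2, subset='training', seed=42,
    image_size=(224, 224), label_mode='categorical')
val_ds = tf.keras.utils.image_dataset_from_directory(
    'train', validation_split=0.2, subset='validation', seed=42,
    image_size=(224, 224), label_mode='categorical')
test_ds = tf.keras.utils.image_dataset_from_directory(
    'test', image_size=(224, 224), label_mode='categorical', shuffle=False)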


14.4.2 Performance evaluation measures

The performance measurements for the training set and the test set cannot be presumed to be identical. Although the training set contains the majority of the instances, the test set is intended to be more realistic. A lack of data is a significant challenge for ConvNets, and the methodologies used to evaluate the performance measurements in such cases are still debatable. In the case of the training set, the classifier is fine-tuned to produce the best performance measurements. The most essential thing to remember during training is that no instances from the test set may be included in the classifier construction. As a result, the performance measurements from the test set can be expected to be similar to the performance measures from the control set. A classifier assigns a category to an image; it is considered successful if the assigned category matches the specified category, and a discrepancy is counted as an error. The majority of performance metrics depend on a classifier's error rate. A large number of examples are included in the training set. To obtain the best performance measurements, the validation dataset should have a distribution that is nearly identical to that of the test set. The validation set should be used to fine-tune parameters, while the test set should be used to determine the final values of the performance measures [57].

A confusion matrix is a table that is often employed to illustrate the performance of a classification model on a set of test data for which the true values are known. All the metrics we have used to evaluate the performance of the model can be assessed using the confusion matrix (see Fig. 14.3). The observations that are accurately predicted are the true positives (TP) and true negatives (TN); we wish to reduce the false positives (FP) and false negatives (FN).

TP are correctly predicted positive values, demonstrating that the value of the actual class and the value of the predicted class are both yes; for instance, the actual class value shows that the patient survived, and the predicted class suggests the same. TN are correctly predicted negative values, implying that the value of the actual class is no and the value of the predicted class is no as well; for instance, the actual class states the patient did not survive and the predicted class says the same.

FP and FN occur when the actual class differs from the predicted class. FP are cases in which the actual class is no but the predicted class is yes; for instance, the actual class implies that the patient did not survive, but the predicted class implies that the patient will. FN are cases in which the actual class is yes but the predicted class is no; for instance, the patient's actual class value indicates that he or she survived, while the predicted class value indicates that the person would die.

                        Predicted class
                        Class = Yes       Class = No
Actual   Class = Yes    True positive     False negative
class    Class = No     False positive    True negative

FIGURE 14.3 Example of a confusion matrix.

Accuracy is the most straightforward performance metric, because it is simply the ratio of correctly predicted observations to total observations:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations:

Precision = TP / (TP + FP)

Recall is the ratio of correctly predicted positive observations to all observations in the actual class:

Recall = TP / (TP + FN)

F1-score is the weighted average of precision and recall; hence, this score takes both FP and FN into account. It is not as simple to understand as accuracy, but F1 is generally more informative than accuracy, particularly if you have an uneven class distribution:

F1-score = 2 × (Recall × Precision) / (Recall + Precision)

The built-in metric functions of the scikit-learn library are used to implement all these measures. The code snippet for the same is given below:

import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
# Note: this sklearn f1_score shadows the Keras f1_score defined earlier;
# alias one of the two if both are needed in the same session.

train_accuracy = np.round(accuracy_score(y_train, y_pred_train), 4) * 100
train_precision = np.round(precision_score(y_train, y_pred_train, average='weighted'), 4)
train_recall = np.round(recall_score(y_train, y_pred_train, average='weighted'), 4)
train_F1 = np.round(f1_score(y_train, y_pred_train, average='weighted'), 4)
train_confusion_matrix = confusion_matrix(y_train, y_pred_train)

val_accuracy = np.round(accuracy_score(y_val, y_pred_val), 4) * 100
val_precision = np.round(precision_score(y_val, y_pred_val, average='weighted'), 4)
val_recall = np.round(recall_score(y_val, y_pred_val, average='weighted'), 4)
val_F1 = np.round(f1_score(y_val, y_pred_val, average='weighted'), 4)
val_confusion_matrix = confusion_matrix(y_val, y_pred_val)

test_accuracy = np.round(accuracy_score(y_test, y_pred_test), 4) * 100
test_precision = np.round(precision_score(y_test, y_pred_test, average='weighted'), 4)
test_recall = np.round(recall_score(y_test, y_pred_test, average='weighted'), 4)
test_F1 = np.round(f1_score(y_test, y_pred_test, average='weighted'), 4)
test_confusion_matrix = confusion_matrix(y_test, y_pred_test)

print()
print('------------------------ Train Set Metrics---------------------')
print()
print("accuracy : {}%".format(train_accuracy))
print("F1_score : {}".format(train_F1))
print("Recall : {}".format(train_recall))
print("Precision : {}".format(train_precision))
print("Confusion Matrix :\n {}".format(train_confusion_matrix))


print()
print('------------------------ Validation Set Metrics----------------')
print()
print("accuracy : {}%".format(val_accuracy))

print("F1_score : {}".format(val_F1))
print("Recall : {}".format(val_recall))
print("Precision : {}".format(val_precision))
print("Confusion Matrix :\n {}".format(val_confusion_matrix))

print()
print('------------------------ Test Set Metrics-----------------------')
print()
print("accuracy : {}%".format(test_accuracy))
print("F1_score : {}".format(test_F1))
print("Recall : {}".format(test_recall))
print("Precision : {}".format(test_precision))
print("Confusion Matrix : {}".format(test_confusion_matrix))

14.4.3 Experimental results

We performed feature extraction with different pretrained models and fed the extracted features to ML models such as the ANN, KNN, SVM, RF, AdaBoost, and XGBoost classifiers.

VGG16

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             74.44               75.05                 75.02           0.0164     0.0086   0.5238
KNN             60.35               42.19                 42.73           0.4099     0.4273   0.4036
SVM             50                  50                    50              0.3333     0.5      0.25
Random Forest   100                 43.65                 46.02           0.4321     0.4602   0.4219
AdaBoost        49.58               49.51                 49.45           0.3866     0.4945   0.3925
XGBoost         91.89               47.56                 46.64           0.4257     0.4664   0.4201

RF and XGBoost have higher training accuracy than the others, but at the same time these classifiers seem to be overfitting. ANN, on the other hand, shows greater test accuracy than all of them, which is considered good for AD classification.

VGG19

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75.82               76.61                 77.56           0.4869     0.4292   0.5677
KNN             63.21               45.8                  44.69           0.4273     0.4469   0.4246
SVM             51.86               51.56                 51.09           0.3758     0.5109   0.4307
Random Forest   100                 49.51                 46.09           0.431      0.4609   0.4183
AdaBoost        50.56               51.37                 50              0.3952     0.5      0.4534
XGBoost         89.97               49.41                 48.52           0.4509     0.4852   0.4457

Here too, as with VGG16, we find that RF and XGBoost have very high training accuracy, beating all the other models, but at the same time these classifiers can be seen to be overfitting to a great extent. Unlike with the previous model, here all classifiers are overfitting to a certain extent.

ResNet50

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75.82               76.61                 77.56           0.4869     0.4292   0.5677
KNN             63.21               45.8                  44.69           0.4273     0.4469   0.4246
SVM             51.8599             51.559                51.09           0.3758     0.5109   0.4307
Random Forest   100                 49.51                 46.0899         0.431      0.4609   0.4183
AdaBoost        50.56               51.37                 50              0.3952     0.5      0.4534
XGBoost         89.97               49.41                 48.52           0.4509     0.4852   0.4457

RF and XGBoost beat all the others, as usual, in training accuracy but are seen to be overfitting again. Here ANN can be considered a good classifier with less overfitting.

ResNet101

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             76.38               77.49                 76.11           0.2971     0.2048   0.561
KNN             64.62               46.88                 48.59           0.4658     0.4859   0.4603
SVM             50                  50                    50              0.333      0.5      0.25
Random Forest   100                 48.73                 48.67           0.4664     0.4867   0.4598
AdaBoost        51.51               50.88                 48.91           0.4424     0.4891   0.4437
XGBoost         90.23               49.32                 48.05           0.4573     0.4805   0.4508

RF and XGBoost have very high training accuracy, beating all the other classifiers once again. None of the classifiers seems to give good test results.

MobileNetV2

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75.53               77.05                 74.78           0.374      0.3041   0.493
KNN             61.1299             41.41                 42.97           0.407      0.4297   0.4003
SVM             50                  50                    50              0.3333     0.5      0.25
Random Forest   100                 41.99                 44.14           0.4135     0.4414   0.4001
AdaBoost        49.02               47.36                 48.75           0.3889     0.4875   0.3839
XGBoost         90.11               43.36                 46.72           0.4297     0.4672   0.4285

All classifiers are overfitting in this case, except ANN, which shows a consistent accuracy of around 75%.

MobileNet

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75.67               76.05                 75.06           0.1112     0.0641   0.5093
KNN             60.72               42.97                 42.89           0.4028     0.4289   0.3957
SVM             50                  50                    50              0.3333     0.5      0.25
Random Forest   100                 44.04                 44.61           0.4098     0.4461   0.3963
AdaBoost        49.27               48.93                 49.22           0.3872     0.4922   0.4242
XGBoost         89.33               46.29                 47.27           0.4317     0.4727   0.4297

All of the classifiers are overfitting in this case, except ANN, which shows a consistent accuracy of around 75%. Here ANN can be considered a good classifier for AD classification.

InceptionV3

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             74.76               75.07                 75.04           0.003      0.0016   1
KNN             60.77               41.21                 42.03           0.3978     0.4203   0.3882
SVM             50                  50                    50              0.3333     0.5      0.25
Random Forest   100                 45.61                 42.5            0.3987     0.425    0.3859
AdaBoost        49.41               49.41                 49.06           0.397      0.4906   0.4176
XGBoost         91.55               45.51                 43.83           0.4073     0.4383   0.3998

Here, just like in the previous cases, the classifiers are overfitting, apart from ANN, which shows a consistent accuracy of around 75%. Here ANN can be considered a good classifier for AD classification with InceptionV3.

InceptionResNetV2

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75                  75                    75              0.6229     0.4938   0.8935
KNN             62.23               46.39                 44.379          0.4257     0.4438   0.4226
SVM             50.83               50.88                 50.16           0.3621     0.5016   0.3894
Random Forest   100                 46.88                 45.39           0.4278     0.4539   0.4151
AdaBoost        47.49               44.73                 43.44           0.3927     0.4344   0.3613
XGBoost         91.039              47.85                 45.78           0.4259     0.4578   0.4117

In this case, XGBoost and RF achieve good accuracy on the training set, but both of them perform poorly on the test set. ANN is a better classifier, as we can see from the previous results as well as from this one.

DenseNet169

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75.11               76.2                  75.02           0.5004     0.5004   0.5004
KNN             60.99               41.21                 44.3            0.4178     0.443    0.4123
SVM             50                  50                    50              0.3333     0.5      0.25
Random Forest   100                 42.58                 44.92           0.4123     0.4492   0.401
AdaBoost        50.1499             49.02                 48.83           0.3871     0.4883   0.4168
XGBoost         88.4                43.55                 44.769          0.4147     0.4477   0.4071

Here, just like in the previous cases, the classifiers are overfitting, apart from ANN, which shows a consistent accuracy of around 75%. Here ANN can be considered a good classifier for AD classification with DenseNet169.

DenseNet121

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             75                  75                    75              0.6609     0.5023   0.9853
KNN             62.5                42.77                 42.97           0.4111     0.4297   0.4025
SVM             50.0199             50.29                 50.16           0.3398     0.5016   0.4258
Random Forest   100                 46.19                 44.84           0.415      0.4484   0.4076
AdaBoost        50.27               49.8                  50.08           0.3686     0.5008   0.3889
XGBoost         91.63               46.19                 45.39           0.4142     0.4539   0.4021

Here, just like in the previous cases, the classifiers are overfitting, apart from ANN, which shows a consistent accuracy of around 75%. Here ANN can be considered a good classifier for AD classification with DenseNet121 as well.

Xception

Model           Training accuracy   Validation accuracy   Test accuracy   F1-score   Recall   Precision
ANN             77.94               78.05                 76.06           0.6667     0.3542   0.5317
KNN             61.839              45.019                44.92           0.4254     0.4492   0.42
SVM             50.149              50.29                 50.16           0.339      0.5016   0.368
Random Forest   100                 46.68                 47.81           0.4494     0.4781   0.43485
AdaBoost        51.339              51.559                51.41           0.4318     0.5141   0.4886
XGBoost         91.02               49.32                 47.89           0.4392     0.4789   0.4311

With Xception deep feature extraction, ANN achieved good results as well.

14.5 Discussion

AI methods have been studied in the neuroimaging research field due to their advantages over typical diagnostic procedures that use mass-univariate statistical methodologies: AI models can differentiate effects at the subject level, whereas mass-univariate statistical approaches detect changes at the category level. DNN is being more widely employed in neuroimaging, building on its technical improvements in computer vision, where it outperforms other state-of-the-art detection approaches [58,59]. DNN differs from typical AI approaches in that it learns features from raw data without relying on separate feature extraction and selection, and it utilizes a hierarchy of nonlinear transformations that is well suited to recognizing distributed, complex, and dynamic patterns [41,42].

The existing evidence suggests that DNN might be effective for AD diagnosis, and the binary classification accuracy for discrimination between control participants and patients has been good (above 95%) [60-66]. As previously stated, the absence of suitable ways to combat DNN classifier overfitting might partly explain these strong findings. DNN is considered a dynamic technique that employs a large range of hyperparameters and integrates multiple forms inside the same model [41,42]. CNNs are a type of ANN inspired by the visual cortex of the human brain; they have shattered computer vision records in a variety of competitions, indicating a very promising method [58]. In terms of detecting AD, CNNs have achieved the most impressive results. Local connectivity and weight sharing are two key aspects of CNNs, which result in a drastically decreased number of weights and a computationally viable network. CNNs are not only utilized for discrimination between AD and MCI patients; the studies have also shown a high level of effectiveness so far. Structural MRI [60,64], CT imaging [67], and resting-state fMRI [65] were all used to achieve high efficiency. Hosseini-Asl et al. [64] took a different strategy, applying a pretrained CNN model to the Alzheimer's dementia dataset, which they then adapted and evaluated on a different test dataset. The results for HC versus AD, HC versus MCI, AD versus MCI, and HC versus AD versus MCI were all very positive. These trials, together with CNN-based models that reported significant performance in other research domains, demonstrate that CNNs are an effective method for detecting AD [42]. Furthermore, these encouraging findings show how AI could link real-world therapeutic experience and neuroimaging data [41].

It could be interesting to map the weights of consecutive layers back to the original neuroimage to determine the regions that have the most influence on the AD diagnosis, although this is complicated by the range of nonlinearities [63]. This information, on the other hand, might be beneficial in clinical neuroimaging that is used to diagnose disorders. Importantly, a predictor with high performance may categorize people using irrelevant variables rather than therapeutically important data. Another issue is that an accurate model without an understanding of the underlying neurofunctional or neuroanatomical variations may be insignificant for medical purposes. To address this problem, deconvolution and input modification techniques might be applied. For visualization, the input modification approach comprises methodological modification of the input, propagation through the activation functions of artificial neurons to the next layers, and computation of the resulting alterations of the network's output. The occlusion method [68], which identifies the section of the input image that impacts the output category's likelihood, is an example of this approach. The deconvolution approach, on the other hand, aims to analyze the contribution of input characteristics to the output. Deconvolution also entails specifying an efficient activation function in the output neuron, followed by an assessment of each neuron's influence in the appropriate layers [69] and DeconvNet [41,68]. Furthermore, in comparison to other deep learning models, TL approaches are quite successful at detecting AD.


14.6 Conclusion

In image-based diagnostics and disease diagnosis, AI technologies are becoming increasingly successful. To reach their full potential, several technological and pragmatic solutions are required. This chapter uses MRI to demonstrate the application of AI approaches in the detection of AD. In addition, AI technologies for the identification of AD, which results in serious health problems, were examined. When it comes to comparing AI algorithms, TL models have outperformed other deep learning models for detecting AD. As a result, numerous AI approaches for detecting AD were investigated, and it is reported that TL approaches have obtained the best results in detecting AD.

AI has been shown to be effective in diagnosing AD and has the potential to bring substantial advancements to neurological disorders. However, various improvements are required to fully comprehend AI's capabilities in AD diagnosis. Because AI methods are sophisticated, datasets with significantly larger samples are required first, rather than moderate or small sample sizes. To accomplish this, multicenter partnerships are necessary, in which data are collected across sites using the same recording circumstances and scanning methodologies. Then, by combining diverse AI techniques, it is feasible to create significant AI improvements. Finally, the application of AI to imaging-derived scores might be applied in prospective medical research [42]. The ability of AI systems to learn complex features through nonlinear transformations might lead to promising findings in the detection of AD. Although considerable roadblocks remain, the findings presented here give a fundamental example of the potential relevance of AI algorithms in the development of future AD prognostic and diagnostic markers [41].

In this chapter, we showed some findings from brain MRI scans utilizing several AI models for identifying AD, which is the fifth largest cause of mortality among Americans aged 65 and over [70]. Initially, we experimented with feature extraction using various TL models such as VGG16 and ResNet50, among others. We then tried supervised ML models on these extracted features to compare their performance on direct images and extracted features. We finally conclude that feature extraction along with gradient boosting methods such as the XGBoost and AdaBoost classifiers gives promising results for AD detection.

References

[1] F. Richard, J. Zeisel, K. Bennett, Design, dignity, dementia: dementia-related design and the built environment, World Alzheimer Rep. 2020 (2020).
[2] R. Brookmeyer, E. Johnson, K. Ziegler-Graham, H.M. Arrighi, Forecasting the global burden of Alzheimer's disease, Alzheimers Dement. 3 (3) (2007) 186-191. Available from: https://doi.org/10.1016/j.jalz.2007.04.381.
[3] G.A. Malik, N.P. Robertson, Treatments in Alzheimer's disease, J. Neurol. 264 (2) (2017) 416-418. Available from: https://doi.org/10.1007/s00415-017-8395-1.
[4] The Lancet, The three stages of Alzheimer's disease, Lancet 377 (9776) (2011) 1465. Available from: https://doi.org/10.1016/S0140-6736(11)60582-5.
[5] J. Neugroschl, S. Wang, Alzheimer's disease: diagnosis and treatment across the spectrum of disease severity, Mt. Sinai J. Med. J. Transl. Pers. Med. 78 (4) (2011) 596-612. Available from: https://doi.org/10.1002/msj.20279.
[6] E. Merlo Pich, et al., Imaging as a biomarker in drug discovery for Alzheimer's disease: is MRI a suitable technology? Alzheimers Res. Ther. 6 (4) (2014) 51. Available from: https://doi.org/10.1186/alzrt276.
[7] M.N. Braskie, A.W. Toga, P.M. Thompson, Recent advances in imaging Alzheimer's disease, J. Alzheimers Dis. 33 (s1) (2012) S313-S327. Available from: https://doi.org/10.3233/JAD-2012-129016.
[8] R. Ossenkoppele, et al., Prevalence of amyloid PET positivity in dementia syndromes: a meta-analysis, JAMA 313 (19) (2015) 1939. Available from: https://doi.org/10.1001/jama.2015.4669.
[9] J.F. Norfray, J.M. Provenzale, Alzheimer's disease: neuropathologic findings and recent advances in imaging, Am. J. Roentgenol. 182 (1) (2004) 3-13. Available from: https://doi.org/10.2214/ajr.182.1.1820003.
[10] L.J. Whalley, Spatial distribution and secular trends in the epidemiology of Alzheimer's disease, Neuroimaging Clin. N. Am. 22 (1) (2012) 1-10. Available from: https://doi.org/10.1016/j.nic.2011.11.002.

[11] M. Tanveer, A.H. Rashid, M.A. Ganaie, M. Reza, I. Razzak, K.-L. Hua, Classification of Alzheimer's disease using ensemble of deep neural networks trained through transfer learning, IEEE J. Biomed. Health Inf. 26 (4) (2022) 1453-1463. Available from: https://doi.org/10.1109/JBHI.2021.3083274.
[12] S. Dwivedi, T. Goel, M. Tanveer, R. Murugan, R. Sharma, Multi-modal fusion based deep learning network for effective diagnosis of Alzheimer's disease, IEEE Multimed. (2022) 1. Available from: https://doi.org/10.1109/MMUL.2022.3156471.
[13] U.R. Acharya, et al., Automated detection of Alzheimer's disease using brain MRI images - a study with various feature extraction techniques, J. Med. Syst. 43 (9) (2019) 302. Available from: https://doi.org/10.1007/s10916-019-1428-9.
[14] N.A. Mathew, R.S. Vivek, P.R. Anurenjan, Early diagnosis of Alzheimer's disease from MRI images using PNN, in: 2018 International CET Conference on Control, Communication, and Computing (IC4), Thiruvananthapuram, Jul. 2018, pp. 161-164. Available from: https://doi.org/10.1109/CETIC4.2018.8530910.
[15] R. Varatharajan, G. Manogaran, M.K. Priyan, R. Sundarasekar, Wearable sensor devices for early detection of Alzheimer disease using dynamic time warping algorithm, Clust. Comput. 21 (1) (2018) 681-690. Available from: https://doi.org/10.1007/s10586-017-0977-2.
[16] B.C. Dickerson, et al., Alzheimer-signature MRI biomarker predicts AD dementia in cognitively normal adults, Neurology 76 (16) (2011) 1395-1402. Available from: https://doi.org/10.1212/WNL.0b013e3182166e96.
[17] C. Patil, et al., Using image processing on MRI scans, in: 2015 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Kozhikode, India, Feb. 2015, pp. 1-5. Available from: https://doi.org/10.1109/SPICES.2015.7091517.
[18] A. Kaur, P. Kaur, A comparative study of various exudate segmentation techniques for diagnosis of diabetic retinopathy, Int. J. Curr. Eng. Technol. 46 (2016) 142-146.
[19] Y. Zhao, et al., In vivo detection of microstructural correlates of brain pathology in preclinical and early Alzheimer disease with magnetic resonance imaging, NeuroImage 148 (2017) 296-304. Available from: https://doi.org/10.1016/j.neuroimage.2016.12.026.
[20] Z. Sankari, H. Adeli, Probabilistic neural networks for diagnosis of Alzheimer's disease using conventional and wavelet coherence, J. Neurosci. Methods 197 (1) (2011) 165-170. Available from: https://doi.org/10.1016/j.jneumeth.2011.01.027.
[21] C. Plant, et al., Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease, NeuroImage 50 (1) (2010) 162-174. Available from: https://doi.org/10.1016/j.neuroimage.2009.11.046.
[22] Y. Zhang, et al., Multivariate approach for Alzheimer's disease detection using stationary wavelet entropy and predator-prey particle swarm optimization, J. Alzheimers Dis. 65 (3) (2018) 855-869. Available from: https://doi.org/10.3233/JAD-170069.
[23] S.-H. Wang, et al., Single slice based detection for Alzheimer's disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization, Multimed. Tools Appl. 77 (9) (2018) 10393-10417. Available from: https://doi.org/10.1007/s11042-016-4222-4.
[24] Y. Zhang, S. Wang, Z. Dong, Classification of Alzheimer disease based on structural magnetic resonance imaging by kernel support vector machine decision tree, Prog. Electromagn. Res. 144 (2014) 171-184. Available from: https://doi.org/10.2528/PIER13121310.
[25] Y. Zhang, S. Wang, Detection of Alzheimer's disease by displacement field and machine learning, PeerJ 3 (2015) e1251. Available from: https://doi.org/10.7717/peerj.1251.
[26] E.-S.A. El-Dahshan, T. Hosny, A.-B.M. Salem, Hybrid intelligent techniques for MRI brain images classification, Digit. Signal Process. 20 (2) (2010) 433-441. Available from: https://doi.org/10.1016/j.dsp.2009.07.002.
[27] S.-H. Wang, et al., Alzheimer's disease detection by pseudo Zernike moment and linear regression classification, CNS Neurol. Disord. - Drug Targets 16 (1) (2017) 11-15. Available from: https://doi.org/10.2174/1871527315666161111123024.
[28] H.T. Gorji, J. Haddadnia, A novel method for early diagnosis of Alzheimer's disease based on pseudo Zernike moment from structural MRI, Neuroscience 305 (2015) 361-371. Available from: https://doi.org/10.1016/j.neuroscience.2015.08.013.
[29] S. Wang, Y. Zhang, G. Liu, P. Phillips, T.-F. Yuan, Detection of Alzheimer's disease by three-dimensional displacement field estimation in structural magnetic resonance imaging, J. Alzheimers Dis. 50 (1) (2015) 233-248. Available from: https://doi.org/10.3233/JAD-150848.
[30] Y. Zhang, et al., Detection of subjects and brain regions related to Alzheimer's disease using 3D MRI scans based on eigenbrain and machine learning, Front. Comput. Neurosci. 9 (2015). Available from: https://doi.org/10.3389/fncom.2015.00066.
[31] Y. Zhang, S. Wang, P. Sun, P. Phillips, Pathological brain detection based on wavelet entropy and Hu moment invariants, Biomed. Mater. Eng. 26 (s1) (2015) S1283-S1290. Available from: https://doi.org/10.3233/BME-151426.


[32] K. Hett, V.-T. Ta, J.V. Manjón, P. Coupé, Adaptive fusion of texture-based grading for Alzheimer's disease classification, Comput. Med. Imaging Graph. 70 (2018) 8-16. Available from: https://doi.org/10.1016/j.compmedimag.2018.08.002.
[33] X.W. Gao, R. Hui, Z. Tian, Classification of CT brain images based on deep learning networks, Comput. Methods Prog. Biomed. 138 (2017) 49-56. Available from: https://doi.org/10.1016/j.cmpb.2016.10.007.
[34] W. Ayadi, W. Elhamzi, I. Charfi, M. Atri, A hybrid feature extraction approach for brain MRI classification based on bag-of-words, Biomed. Signal Process. Control 48 (2019) 144-152. Available from: https://doi.org/10.1016/j.bspc.2018.10.010.
[35] C.C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, Cham, Switzerland, 2018. Available from: https://link.springer.com/book/10.1007/978-3-319-94463-0.
[36] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1345-1359. Available from: https://doi.org/10.1109/TKDE.2009.191.
[37] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, https://arxiv.org/abs/1409.1556, 2014.
[38] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, arXiv preprint arXiv:1512.03385, http://arxiv.org/abs/1512.03385, 2015 (accessed 01.03.22).
[39] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1-127. Available from: https://doi.org/10.1561/2200000006.
[40] E. Alpaydin, Introduction to Machine Learning, third ed., The MIT Press, Cambridge, Massachusetts, 2014.
[41] A. Subasi, Use of artificial intelligence in Alzheimer's disease detection, Artificial Intelligence in Precision Health, Elsevier, 2020, pp. 257-278. Available from: https://doi.org/10.1016/B978-0-12-817133-2.00011-2.
[42] S. Vieira, W.H.L. Pinaya, A. Mechelli, Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications, Neurosci. Biobehav. Rev. 74 (2017) 58-75. Available from: https://doi.org/10.1016/j.neubiorev.2017.01.002.
[43] M. Deepika Nair, M.S. Sinta, M. Vidya, A study on various deep learning algorithms to diagnose Alzheimer's disease, in: D. Pandian, X. Fernando, Z. Baig, F. Shi (Eds.), Proc. International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB), vol. 30, Springer International Publishing, Cham, 2019, pp. 1705-1710. Available from: https://doi.org/10.1007/978-3-030-00665-5_157.
[44] S. Aruchamy, V. Mounya, A. Verma, Alzheimer's disease classification in brain MRI using modified kNN algorithm, in: 2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), Gunupur Odisha, India, Dec. 2020, pp. 1-6. Available from: https://doi.org/10.1109/iSSSC50941.2020.9358867.
[45] A.J. Dinu, R. Ganesan, Early detection of Alzheimer's disease using predictive k-NN instance based approach and T-test method, Int. J. Adv. Trends Comput. Sci. Eng. 8 (1.4) (2019) 29-37. Available from: https://doi.org/10.30534/ijatcse/2019/0581.42019.
[46] A.B. Rabeh, F. Benzarti, H. Amiri, Diagnosis of Alzheimer diseases in early step using SVM (support vector machine), in: 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV), Beni Mellal, Morocco, Mar. 2016, pp. 364-367. Available from: https://doi.org/10.1109/CGiV.2016.76.
[47] S. Alam, G.-R. Kwon, J.-I. Kim, C.-S. Park, Twin SVM-based classification of Alzheimer's disease using complex dual-tree wavelet principal coefficients and LDA, J. Healthc. Eng. 2017 (2017) 8750506. Available from: https://doi.org/10.1155/2017/8750506.
[48] P.J. Moore, T.J. Lyons, J. Gallacher, Alzheimer's Disease Neuroimaging Initiative, Random forest prediction of Alzheimer's disease using pairwise selection from time series data, PLoS One 14 (2) (2019) e0211558. Available from: https://doi.org/10.1371/journal.pone.0211558.
[49] A. Sarica, A. Cerasa, A. Quattrone, Random Forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review, Front. Aging Neurosci. 9 (329) (2017). Available from: https://doi.org/10.3389/fnagi.2017.00329.
[50] A.V. Lebedev, et al., Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness, NeuroImage Clin. 6 (2014) 115-125. Available from: https://doi.org/10.1016/j.nicl.2014.08.023.
[51] M.S. Ali, Md. K. Islam, J. Haque, A.A. Das, D.S. Duranta, M.A. Islam, Alzheimer's disease detection using m-Random Forest algorithm with optimum features extraction, in: 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, Apr. 2021, pp. 1-6. Available from: https://doi.org/10.1109/CAIDA51941.2021.9425212.
[52] A. Savio, M. García-Sebastián, M. Graña, J. Villanúa, Results of an Adaboost approach on Alzheimer's disease detection on MRI, in: J. Mira, J.M. Ferrández, J.R. Álvarez, F. de la Paz, F.J. Toledo (Eds.), Bioinspired Applications in Artificial and Natural Computation, vol. 5602, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 114-123. Available from: https://doi.org/10.1007/978-3-642-02267-8_13.
[53] J.H. Morra, Z. Tu, L.G. Apostolova, A.E. Green, A.W. Toga, P.M. Thompson, Comparison of AdaBoost and support vector machines for detecting Alzheimer's disease through automated hippocampal segmentation, IEEE Trans. Med. Imaging 29 (1) (2010) 30-43. Available from: https://doi.org/10.1109/TMI.2009.2021941.

[54] G. Battineni, et al., Improved Alzheimer's disease detection by MRI using multimodal machine learning algorithms, Diagnostics 11 (11) (2021) 2103. Available from: https://doi.org/10.3390/diagnostics11112103.
[55] L. Akter, Ferdib-Al-Islam, Dementia identification for diagnosing Alzheimer's disease using XGBoost algorithm, in: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Dhaka, Bangladesh, Feb. 2021, pp. 205-209. Available from: https://doi.org/10.1109/ICICT4SD50815.2021.9396777.
[56] Y. Shmulev, M. Belyaev, Predicting conversion of mild cognitive impairments to Alzheimer's disease and exploring impact of neuroimaging, arXiv preprint arXiv:1807.11228, http://arxiv.org/abs/1807.11228, 2018 (accessed 06.03.22).
[57] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, California, 2000.
[58] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84-90. Available from: https://doi.org/10.1145/3065386.
[59] Q.V. Le, et al., Building high-level features using large scale unsupervised learning, arXiv preprint arXiv:1112.6209, http://arxiv.org/abs/1112.6209, 2012 (accessed 06.03.22).
[60] A. Payan, G. Montana, Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks, arXiv preprint arXiv:1502.02506, http://arxiv.org/abs/1502.02506, 2015 (accessed 09.01.21).
[61] H.-I. Suk, D. Shen, Deep learning-based feature representation for AD/MCI classification, in: C. Salinesi, M.C. Norrie, Ó. Pastor (Eds.), Advanced Information Systems Engineering, vol. 7908, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 583-590. Available from: https://doi.org/10.1007/978-3-642-40763-5_72.
[62] H.-I. Suk, S.-W. Lee, D. Shen, Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis, NeuroImage 101 (2014) 569-582. Available from: https://doi.org/10.1016/j.neuroimage.2014.06.077.
[63] H.-I. Suk, S.-W. Lee, D. Shen, The Alzheimer's Disease Neuroimaging Initiative, Latent feature representation with stacked auto-encoder for AD/MCI diagnosis, Brain Struct. Funct. 220 (2) (2015) 841-859. Available from: https://doi.org/10.1007/s00429-013-0687-3.
[64] E. Hosseini-Asl, G. Gimel'farb, A. El-Baz, Alzheimer's disease diagnostics by a deeply supervised adaptable 3D convolutional network, arXiv preprint arXiv:1607.00556, http://arxiv.org/abs/1607.00556, 2016 (accessed 09.01.21).
[65] S. Sarraf, G. Tofighi, Classification of Alzheimer's disease using fMRI data and deep learning convolutional neural networks, arXiv preprint arXiv:1603.08631, http://arxiv.org/abs/1603.08631, 2016 (accessed 06.03.22).
[66] H.-I. Suk, S.-W. Lee, D. Shen, The Alzheimer's Disease Neuroimaging Initiative, Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis, Brain Struct. Funct. 221 (5) (2016) 2569-2587. Available from: https://doi.org/10.1007/s00429-015-1059-y.
[67] X.W. Gao, R. Hui, A deep learning based approach to classification of CT brain images, in: 2016 SAI Computing Conference (SAI), London, United Kingdom, Jul. 2016, pp. 28-31. Available from: https://doi.org/10.1109/SAI.2016.7555958.
[68] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, arXiv preprint arXiv:1311.2901, http://arxiv.org/abs/1311.2901, 2013 (accessed 06.03.22).
[69] J.T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: the all convolutional net, arXiv preprint arXiv:1412.6806, http://arxiv.org/abs/1412.6806, 2015 (accessed 06.03.22).
[70] W. Thies, L. Bleiler, 2012 Alzheimer's disease facts and figures, Alzheimer's Association, Alzheimers Dement. 8 (2) (2012) 131-168. Available from: https://doi.org/10.1016/j.jalz.2012.02.001.

Applications of Artificial Intelligence in Medical Imaging


This page intentionally left blank
Index

Note: Page numbers followed by “f” and “t” refer to figures and tables respectively.

A breast cancer, 112115 B


Accuracy (ACC), 123, 167, 216217, breast tumor detection, 149178 Backpropagation method, 269
253, 289290, 346 classification techniques, 339343 Bagging, 1011
Acral lentiginous melanoma, 184 AdaBoost, 342343 COVID-19 detection with deep
AdaBoost, 1112, 53, 272273, ANN, 339 feature extraction, 232
336337, 342343, 348 KNN, 340 feature extraction with pretrained
classifier, 312, 343 RF, 342 models, 237
COVID-19 detection with deep SVM, 341 Batch normalization, 2022, 288289
feature extraction, 232 XGBoost, 343 Bidirectional LSTM (Bi-LSTM), 192
feature extraction with pretrained colon cancer detection, 268270 Bidirectional recurrent neural
models, 236237 ANN, 269 networks (HA-BiRNN), 138
melanoma skin cancer, CNN, 270 Biomarker testing, 266
190191 deep learning, 269270 Biopsy, 53, 266
Adam, 116119 deep feature extraction techniques, Bleeding, 244
Adamax, 116119 337339 Blood tests, 266267
Adaptive Boosting. See AdaBoost feature extraction, 8284 Blood vessels, 244
Adenocarcinoma, 5152 lung cancer detection, 5872 Boosting, 11
colon cancer, 266 performance evaluation measures, Brain hemorrhage detection, 296t,
AlexNet, 32, 57, 76, 186187, 210212, 345347 299300
251, 284285, 300 example of confusion deep learning methods, 286289
Alphafold, 81 matrix, 346f Brain stroke detection, 209210
Alveoli, 5152 performance evaluation metrics, 92 deep learning methods, 210216
Alzheimer’s dementia dataset, 351 prediction and classification, 88 Brain tissue density analysis, 209210
Alzheimer’s disease (AD), 333334 supervised learning, 747 Brain tumor, 7576
artificial intelligence models, transfer learning, 8587 Breast cancer, 109110, 112115, 137,
337343 unsupervised learning, 56 139140
detection using artificial Artificial neural networks (ANNs), Breast self-examination, 138
intelligence, 344350 1314, 5556, 7879, 112, 245, Breast ultrasonography, 137138
diagnosis, 334335 269, 285286, 312, 339, 348. Bronchi, 5152
Amelanotic melanomas, 184 See also Convolutional neural Bronchioles, 5152
Area under curve (AUC), 5354, 88, networks (CNNs)
123124, 168, 187, 234, 244, breast cancer, 139140 C
253254, 285286 colon cancer, 269 Cancer, 5152
Artificial intelligence (AI), 1, 137138, melanoma skin cancer, 187188 Categorical_crossentropy loss
185, 208209, 251252, 267, Autoencoders, 247 function, 324
306, 339 Automatic brain tumor detection, 77 CBIS-DDSM, 114115
AD detection, 344350 Automatic detection methodology, Central nervous system (CNS), 7576
AI-based methods, 224 245246 Chaotic crow search algorithm
AI-driven AD diagnosis method, Average pooling, 271272, 307308, (CCSA), 54
334335 325 Chemotherapy, 184185

357
358 Index

Choroidal neovascularization (CNV), CNN-based automated systems, DarkNet-19, 288289


306 322323 EfficientNet-B0, 287288
CIFAR-10, 327329 CNN-based breast cancer detection ResNet-18, 286
Classical machine learning, 12 methodology, 322323 VGG-16, 288
Clustering, 56 CNN-DRD model, 250 brain stroke detection, 210216
Code snippet, 341, 343 CNN-RNN model, 285286 AlexNet model, 210212, 210f
Cohen Kappa Score, 258 colon cancer, 270 GoogleNet, 212, 214f
Cohen’s kappa coefficient, 124, convolutional neural network-based residual convolutional neural
167168 classification, 307308 network, 212214, 215f
Cohen’s kappa statistic, 234 convolutional neural network-based VGG-16, 214215, 215f
Colon adenocarcinoma, 274 diagnosis method, 324326 VGG-19, 215216, 215f
Colon cancer breast cancer, 140141, 140f breast cancer, 140
artificial intelligence for colon diabetic retinopathy detection, colon cancer, 269270
cancer detection, 268270 249250 diabetic retinopathy detection
DenseNet121, 279t dropout, 1920 methods, 247260
DenseNet169, 279t early stopping, 20 CNN, 249250
histopathological images, 274 fully connected layers, 19 Diabetic Retinopathy 224X224
InceptionResNetV2, 278t melanoma skin cancer, 193194 Gaussian Filtered, 252, 258260
InceptionV3, 278t padding, 18 diabetic retinopathy sample
MobileNet, 278t pooling, 19 dataset, 254257
MobileNetV2, 277t ReLU layer, 1819 Diabetic-Retinopathy, 252
ResNet50, 277t strides, 18 DNN, 247248
ResNet101, 277t training, 19 DL-supported models, 247
VGG16, 276t Coronavirus disease 2019 detection Deep neural networks (DNNs), 5556,
VGG19, 276t (COVID-19 detection), 224225 138, 247248, 334335
Xception, 279t AdaBoost, 232 diabetic retinopathy detection,
Colonoscopy, 266 bagging, 232 247248
Colorectal cancer, 265266 DenseNet, 227228 Denoising, 2
Computed tomography (CT), 266, 284, Inception/GoogLeNet, 229 Densely connected convolutional
334 K-nearest neighbors, 230231 networks, 4445
Computer-aided decision support MobileNet, 228 DenseNet, 44, 328329
systems, 339 random forests, 231 DenseNet121, 71, 8284, 9495, 251,
computer-aided AD diagnosis, 334, ResNet, 227 279, 337338, 350
336 support vector machine, 231 DenseNet169, 71, 8284, 251,
computer-aided diagnosis (CAD), visual geometry group, 229 328329, 337338, 350
23, 3f, 110, 285286, 322 Xception, 228 Depthwise convolution, 3839
Computer-assisted diagnosis, XGBoost, 232233 Desmoplastic melanomas, 184
242243 DFCNet model, 55
Confusion matrix, 92, 233234, 244, D Diabetes, 243244
289290, 346 Darknet reference, 288289 Diabetes mellitus (DM), 242
Convolution, 16, 284, 351 DarkNet-19, 288289 Diabetic macular edema (DME), 246,
Convolutional layers, 288289, 325 Data augmentation, 2425 306
Convolutional neural network and Decision stumps, 342 Diabetic Retinopathy 224x224,
graph search strategies (CNN- Decision tree (DT), 9, 335336 246247, 252, 258260
GS), 307 Deep feature extraction techniques, 78, Diabetic retinopathy detection (DR
Convolutional neural networks 82, 312, 337339 detection), 241242
(CNNs), 1622, 54, 5657, 76, Deep learning (DL), 14, 53, 56, Diabetic Retinopathy Sample Dataset
8081, 110, 114115, 138, 185, 112113, 137138, 247, 267, Binary Dataset, 254257
209210, 224, 242243, 247, 269270, 306 Dimension reduction, 45, 59
267, 284, 306, 322, 334335 algorithms, 185 Discrete wavelet transform (DWT),
batch normalization, 2022 architectures, 322 336
CNN-based architectures, 322323, brain hemorrhage detection, Diseases detection using artificial
325 286289 intelligence, 271280
Index 359
Disjoint sets, 233
Displacement field (DF), 336
Downsampling, 38–39
Dropout, 19–20
Drusen, 306

E
Early stopping, 20
Edge Adaptive Total Variation Denoising Technique, 77
Edge detection features extraction, 341
EfficientNet model, 287
EfficientNet-B0, 287–288
Electronic medical record (EMR), 318
End-to-end deep ensemble methodology, 322–323
Ensemble learning, 187, 343
Epidural (EDH), 283–284
Euclidean distance, 6–8
Evaluation metrics, 166–167
Exotic architecture, 251, 337–338
Experimental data
  breast cancer, 119–122, 165–166
  lung cancer, 61
  magnetic resonance imaging, 88
Experimental setup, magnetic resonance imaging, 88–91
Eye disease, 241–242
Eyepacs-1 datasets, 246

F
F-score, 289–290
F1 score, 123–124, 167, 253, 346–347
False negatives (FN), 92, 123, 216–217, 233–234, 254, 289–290, 346
False positives (FP), 92, 123, 216–217, 233–234, 254, 289–290, 346
Feature extraction, 82–84
  AdaBoost and XGBoost, 236–237, 236t, 237t
  bagging, 237, 237t
  using deep learning, 58, 115–116, 149–150, 271–272
  K-nearest neighbors, 235, 235t
  Random Forest, 236, 236t
  process, 111–112
  support vector machine, 235, 236t
Fluorodeoxyglucose PET-CT (FDG PET-CT), 334
Fully connected layers (FCL), 19, 210–212
Fuzzy-based voting system, 245

G
Gaussian blur, 341
Generalized discriminant analysis, 59
Generative Adversarial Networks (GAN), 25–30, 110, 252
Gini index, 9
GoogLeNet, 337–338
GoogleNet (2014), 57, 76, 212
Gradient boosting. See XGBoost
Gradient echo plural contrast imaging (GEPCI), 335

H
Head CT Hemorrhage Image Dataset (HCTHID), 290–291
Head trauma, 299
Hematoxylin and eosin (H&E), 321–322
Hemorrhages, 242
  classification accuracy, 299
Hemorrhagic strokes, 207–208
Heterodyne reflectance tomography, 305–306
Hierarchical clustering, 5–6
Histogram-based model, 209–210
Histopathological images, 54, 321–323
Hospital Pedro Hispano (PH2), 187
Hybrid deep CNN model, 77–78
Hyperbolic tangent, 14
Hyperparameters, 228
Hypertension, 242

I
Image analysis techniques, 242–243
Image augmentation, 110
Image classification, AI for, 3–5
Image Database Resource Initiative (IDRI), 54
Image processing feature extraction, 340
Image segmentation, 139–140
  with clustering, 6
ImageNet, 327–329, 337–338
Immunotherapy, 185
Inception, 288–289
Inception-ResNet, 40–41
Inception-v4, 40–41
InceptionNet-V3, 82–84
InceptionResNetV2, 70, 82–84, 176, 251, 337–338, 350
InceptionV3, 70, 251, 278, 337–338, 349–350
Intracranial hemorrhage (ICH), 283–284
Intraoperative OCT (iOCT), 306
Intraparenchymal ICH (IPH ICH), 283–284
Intraventricular ICH (IVH ICH), 283–284
Invasive ductal carcinoma (IDC), 323
Inverted residual blocks, 170–174
Ischemic strokes, 207–208
Iteration vN, 337–338

J
Japanese Society of Radiological Technology (JSRT), 54

K
k-means clustering method, 6
K-nearest neighbors (KNN), 53, 76–77, 82, 138, 225, 312, 335–337, 340, 348
  COVID-19 detection with deep feature extraction, 230–231
  melanoma skin cancer, 189
Kaggle Data Science Bowl 2017 (KDSB17), 54–55
Kappa score, 123
Keras, 42

L
Large-cell carcinoma, 51–52
Learning, 1–2
Lentigo maligna melanoma, 184
Lesions, 242
Linear discriminant analysis (LDA), 272, 341
Linear function, 13–14
Linear regression (LR), 336
Locally linear embedding (LLE), 59
Logistic regression (LR), 208–209
Long short-term memory (LSTM), 23–24
Lung cancer, 51–52, 61
Lung Image Database Consortium (LIDC), 54

M
Machine learning (ML), 1–2, 4, 138, 208–209, 272–273, 312
Magnetic resonance imaging (MRI), 2, 75–76, 266, 321–322, 334–335
Malignant melanoma, 183–184
Mammographic Image Analysis Society (MIAS), 114–115, 124–128
Mammography, 110–112, 138, 321–322
Manhattan distance, 7–8
MATLAB 2013a software, 54
Matthews correlation coefficient (MCC), 289–290
Max pooling, 271–272, 307–308, 337–338
Medical imaging, 2–3, 75–76, 339
Melanin, 185–186
Melanocytes, 185–186
Melanoma, 183–184
Melanoma cancer, 184
Melanoma skin cancer, 185–186
Messidor-2 datasets, 246
Metastatic melanoma, 184
Metrics, 329–330
Micro-architecture, 337–338
Microaneurysms, 242, 244
Minimalist machine learning (MML), 285–286
MNIST, 327–329
MobileNet, 82–84, 277–278, 329, 337–338, 349
MobileNetV2 model, 69, 82–84, 93, 251, 277–278, 337–338, 349
Multi-textural pipeline (MTL pipeline), 336
Multiclass classification, 59–60
Multilabel classification, 60
Multilayer perceptron (MLP), 187–188
Multilevel feature extractor, 337–338
Multiple instance learning (MIL), 187
Multitask CNN variant, 322–323

N
Naive Bayes classifier, 138
Nephropathy, 242
Network in Network, 288–289
Neural network (NN), 209–210
  overfitting problem in neural network training, 15–16
Nodular melanoma, 184
Nonlinear functions, 13–14
Non-small-cell lung cancer (NSCLC), 51–52

O
Occlusion method, 351
Ocular melanoma, 184
Ocular OCT, 305–306
Ophthalmological surgery, 306
Optical coherence tomography (OCT), 306, 318
  angiography, 306
Optical rational tomography, 305–306
Ovarian cancer, 265–266
OverFeat, 251
Overfitting problem in neural network training, 15–16
  regularization, 15–16

P
Padding, 18, 271–272
Paralysis, 207–208
Particle swarm optimization (PSO), 335
Partition clustering, 5–6
Performance evaluation metrics, 61, 123–124
Planting lesions, 242–243
Polyps, 265–266
Pooling, 16, 19, 284
Positron emission tomography-CT (PET-CT), 334
Precision, 123, 253, 346
Prediction
  breast cancer, 116–119, 151–164
  lung cancer, 59–60
  magnetic resonance imaging, 88
Pregnancy, 242
Preprocessing methods, 110
Pretrained convolutional neural network-based diagnosis method, 327–329
Pretrained models, 63
Principal component analysis (PCA), 59, 268, 285–286
Probabilistic neural network (PNN), 335

R
Radiation therapy, 185
Random Forest (RF), 9–10, 53, 93–94, 138, 208–209, 272–273, 336–337, 342, 348
  COVID-19 detection with deep feature extraction, 231
RCNN classifier, 138
Recall, 123, 253, 346
Receiver operating characteristic curve (ROC curve), 124, 168, 187, 216–217, 234, 244, 289–290
Rectified linear unit (ReLU), 16, 18–19, 54, 269
Recurrent neural networks (RNNs), 23, 191, 285–286
Region of interest (ROI), 110
Regularization, 15–16
Residual convolutional neural network, 212–214
Residual network (ResNet), 36, 57, 76, 170, 212–213, 251, 284–285, 328, 337–338
ResNet-18, 286
ResNet50, 68, 82–84, 170, 251, 349
ResNet101, 69, 82–84, 93, 170, 251, 277, 337–338, 349
Restricted Boltzmann machines, 247
Retinal diseases, 306
Retinal images, 244–245
Reverse transcription-polymerase chain reaction (RT-PCR), 224
RMSprop, 116–119
Root mean square (RMS), 329

S
Scikit-Learn, 59
Secretions, 244
Segmentation process, 111–112
Semisupervised learning, 4
Sensitivity (SEN), 289–290
Sigmoid function, 14
Sigmoid layer, 116–119
Sign function, 14
Similarity-based clustering, 5–6
Single slice method (SS method), 335–336
Single-task CNN variant, 322–323
Skin cancer, 183–184
Small-cell lung cancer (SCLC), 51–52
Softmax classifier, 187, 269
Sparse coding, 247
Specificity (SPE), 216–217, 289–290
Squamous-cell carcinoma, 51–52
Stochastic gradient descent (SGD), 116–119
Strides, 18
Stroke classification models, 208–209
Student’s t-test, 336
Subarachnoid ICH (SAH ICH), 283–284
Subdural ICH (SDH ICH), 283–284
Superficial spreading melanoma, 184
Supervised learning, 4, 7–47
  AdaBoost, 11–12
  artificial neural networks, 13–14
  bagging, 10–11
  boosting, 11
  convolutional neural network, 16–22
  data augmentation, 24–25
  decision tree, 9
  deep learning, 14
  GANs, 25–30
  K-nearest neighbor approach, 7–9
  LSTM, 24
  overfitting problem in neural network training, 15–16
  random forest, 9–10
  RNNs, 23
  transfer learning, 31–47
  XGBoost, 12–13
Support vector machine (SVM), 8–9, 53, 76–77, 138, 208–209, 268, 272–273, 312, 335–337, 341, 348
  classifier, 138
  COVID-19 detection with deep feature extraction, 231
  feature extraction with pretrained models, 235
  melanoma skin cancer, 189
Surgery, 185

T
t-distributed stochastic neighbor embedding (t-SNE), 272
TensorFlow, 42
3D DF estimation-based method, 336
Traditional diabetic retinopathy detection approach, 244–245
Traditional sequential network architectures, 251
Transfer learning (TL), 31–47, 57, 110, 115–116, 149, 225–229, 334–335
  AlexNet, 32
  Inception-ResNet, 40–41
  Inception-v4, 40–41
  magnetic resonance imaging, 85–87
  melanoma skin cancer, 195
  MobileNet architecture, 38–39
  ResNet, 36
  transfer learning-based classification, 309–311
  visual geometry group, 34–35
  Xception, 42–43
Trees, 342
True benign, 167
True malignant, 167
True negative components (TN components), 216–217
True negatives (TN), 123, 233–234, 254, 289–290, 346
True normal, 167
True positives (TP), 123, 216–217, 233–234, 254, 289–290, 346
Twin SVM (TSVM), 336
2D windows, 324–325

U
Ultrasound (US), 321–322
  imaging, 137–138
Unsupervised learning, 4–6, 185. See also Supervised learning
Uterine cancer, 265–266

V
Valid padding, 271–272
Visual geometry group (VGG), 34–35, 251, 277, 327
  VGG16, 53–54, 63–68, 82–84, 93, 169, 186–187, 214–215, 251, 275, 288, 322–323, 337–338, 348
  VGG19, 68, 82–84, 169, 215–216, 251, 275, 337–338, 348–349
  VGGNet, 284–285
Visual imaging technologies, 321–322
Voting feature interval (VFI), 335

W
Weight sharing, 284
Welch’s t-test (WTT), 336

X
X-ray
  images, 2
  scans, 224
Xception, 42–43, 178, 280, 337–338, 350
XceptionNet, 82–84
XGBoost, 12–13, 53, 82, 272–273, 343
  classifier, 93–94, 312, 348

Y
YOLO algorithms, 288–289
YOLOv2, 288–289

Z
Zernike moment (ZM), 336