
Imaging Informatics for Healthcare Professionals

Michail E. Klontzas · Salvatore Claudio Fanni · Emanuele Neri, Editors

Introduction to Artificial Intelligence

Imaging Informatics for Healthcare Professionals

Series Editors
Peter M. A. van Ooijen, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Erik R. Ranschaert, Department of Radiology, ETZ Hospital, Tilburg, The Netherlands
Annalisa Trianni, Department of Medical Physics, ASUIUD, Udine, Italy
Michail E. Klontzas, University Hospital of Heraklion, Heraklion, Greece; Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
The series Imaging Informatics for Healthcare Professionals is the
ideal starting point for physicians and residents and students in
radiology and nuclear medicine who wish to learn the basics in
different areas of medical imaging informatics. Each volume is
a short pocket-sized book that is designed for easy learning and
reference.
The scope of the series is based on the Medical Imaging
Informatics subsections of the European Society of Radiology
(ESR) European Training Curriculum, as proposed by ESR and
the European Society of Medical Imaging Informatics (EuSoMII).
The series, which is endorsed by EuSoMII, will cover the curric-
ula for Undergraduate Radiological Education and for the level
I and II training programmes. The curriculum for the level III
training programme will be covered at a later date. It will offer
frequent updates as and when new topics arise.
Michail E. Klontzas • Salvatore Claudio Fanni • Emanuele Neri
Editors

Introduction to Artificial Intelligence
Editors

Michail E. Klontzas
University Hospital of Heraklion, Heraklion, Greece
Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece

Salvatore Claudio Fanni
Academic Radiology, Department of Translational Research, University of Pisa, Pisa, Italy

Emanuele Neri
Academic Radiology, Department of Translational Research, University of Pisa, Pisa, Italy

ISSN 2662-1541 ISSN 2662-155X (electronic)


Imaging Informatics for Healthcare Professionals
ISBN 978-3-031-25927-2 ISBN 978-3-031-25928-9 (eBook)
https://doi.org/10.1007/978-3-031-25928-9

© EuSoMII 2023
This work is subject to copyright. All rights are solely and exclusively licensed by
the Publisher, whether the whole or part of the material is concerned, specifically
the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks,
etc. in this publication does not imply, even in the absence of a specific statement,
that such names are exempt from the relevant protective laws and regulations and
therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice
and information in this book are believed to be true and accurate at the date of
publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


Preface

Artificial intelligence (AI) is rapidly infiltrating the scientific world while steadily demonstrating important real-life applica-
tions. The increasing number of publications in the field and
the numerous commercial applications of AI algorithms available
not only to computer scientists but to experts in numerous fields
necessitate a deep understanding of basic AI principles. An increasing number of professionals in disciplines other than mathematics and computer science encounter terminology related to basic AI
principles on a daily basis. Even though AI principles are slowly
being introduced to medical school and residency curricula, there
is a great need for basic education on the foundations of this
exciting field.
This book aims to provide physicians and scientists with the basics of artificial intelligence, with special focus on medical imaging. The book provides an introduction to the main topics of artificial intelligence currently applied to medical image analysis. Start-
ing with a chapter explaining the basic terms used in artificial
intelligence for novice readers, the book embarks on a series
of chapters each one of which provides the basics on one AI-
related topic. The second chapter utilizes a radiomics paradigm to
practically demonstrate how programming languages and avail-
able automated tools can be used for the development of machine
learning models. The third chapter endeavours to analyse the main
traditional machine learning techniques, explaining algorithms
such as random forests, support vector machines as well as basic
neural networks. The applications of those algorithms on the analysis of radiomics data are expanded in the fourth chapter.
Chapter 5 provides the basics of natural language processing
which has revolutionized the analysis of complex radiological
reports, and Chap. 6 affords a succinct introduction to convolu-
tional neural networks which have revolutionized medical image
analysis. The penultimate chapter provides an introduction to data
preparation for use in the aforementioned artificial intelligence
applications. The book concludes with a chapter demonstrating
the main landscape of current AI applications while providing an
insight about the foreseeable future.
Ultimately, we sought to provide a succinct textbook that can
offer all basic knowledge on AI required for professionals dealing
with medical images. This volume comes as the third addition
to the “Imaging Informatics for Healthcare Professionals” book
series endorsed by EuSoMII aiming to become the basic resource
of information for healthcare professionals dealing with AI appli-
cations.

Heraklion, Greece    Michail E. Klontzas
Pisa, Italy          Salvatore Claudio Fanni
Pisa, Italy          Emanuele Neri
Contents

1 What Is Artificial Intelligence: History and Basic Definitions . . . . . . . . 1
Emmanouil Koltsakis, Michail E. Klontzas,
and Apostolos H. Karantanas
2 Using Commercial and Open-Source Tools for
Artificial Intelligence: A Case Demonstration
on a Complete Radiomics Pipeline . . . . . . . . . . . . . . . . . . . . . 14
Elisavet Stamoulou, Constantinos Spanakis,
Katerina Nikiforaki, Apostolos H. Karantanas,
Nikos Tsiknakis, Alexios Matikas, Theodoros
Foukakis, and Georgios C. Manikis
3 Introduction to Machine Learning in Medicine . . . . . . 39
Rossana Buongiorno, Claudia Caudai, Sara
Colantonio, and Danila Germanese
4 Machine Learning Methods for Radiomics
Analysis: Algorithms Made Easy . . . . . . . . . . . . . . . . . . . . . . 69
Michail E. Klontzas and Renato Cuocolo
5 Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Salvatore Claudio Fanni, Maria Febi,
Gayane Aghakhanyan, and Emanuele Neri
6 Deep Learning Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Eleftherios Trivizakis and Kostas Marias


7 Data Preparation for AI Analysis . . . . . . . . . . . . . . . . . . . . . . 133


Andrea Barucci, Stefano Diciotti, Marco Giannelli,
and Chiara Marzi
8 Current Applications of AI in Medical Imaging. . . . . . 151
Gianfranco Di Salle, Salvatore Claudio Fanni,
Gayane Aghakhanyan, and Emanuele Neri
1 What Is Artificial Intelligence: History and Basic Definitions

Emmanouil Koltsakis, Michail E. Klontzas, and Apostolos H. Karantanas

E. Koltsakis (✉)
Department of Radiology, Karolinska University Hospital, Stockholm, Sweden

M. E. Klontzas
University Hospital of Heraklion, Heraklion, Greece
Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece

A. H. Karantanas
Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Crete, Greece
Department of Radiology, School of Medicine, University of Crete, Heraklion, Crete, Greece
Advanced Hybrid Imaging Systems, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece

1.1 Twentieth Century: Setting the Foundations of Artificial Intelligence

1.1.1 Artificial Intelligence

One would expect that the answer to a theoretically simple question such as “What is artificial intelligence?” would be correspondingly simple. However, the greatest difficulty is hidden in simplicity. The purpose of this chapter is to simplify and define the basic principles that fall within the sphere of
artificial intelligence while delving into it and guiding the reader
through its history.
Among his great contributions to science, the mathematician Alan Turing was the first, back in 1950, to ask whether machines can think, proposing the famous Turing test or imitation game. According to this, the three participants would be two humans (an interrogator and a contestant) and a machine. While the interrogator asks single-blinded questions of the other two, he/she would have to determine which answers come from the contestant and which from the machine [1]. Up until the day that this chapter is written, no machine has ever convincingly passed the Turing test.
At the same time, Alan Turing's fellow student Christopher Strachey developed a program that played checkers [2], while Dietrich Prinz, who learned programming from Alan Turing's seminars, developed a program that played chess.
In 1956 the Dartmouth Summer Research Project on Artifi-
cial Intelligence took place, in which Professor John McCarthy
introduced the term Artificial Intelligence (AI). The aim of the
Dartmouth Workshop was that over a 2-month period, a group
of 10 people would have a machine adopt a characteristic of
intelligence, such as using language, forming abstractions, or
self-improvement. The proposal of this workshop stated that “An
attempt will be made to find how to make machines use language,
form abstractions and concepts, solve kinds of problems now
reserved for humans, and improve themselves”. The workshop
was not successful but the term AI came to life.

Artificial Intelligence: is the science and engineering of inventing machines or computer systems with features imitating humanlike abilities or behaviors, such as visual and speech interpretation, problem-solving, and self-teaching.
Three years later, in 1959, the first AI laboratory was established at MIT. It was in that very laboratory that ELIZA, the
first chatbot, was created by Joseph Weizenbaum. ELIZA worked
in a simple manner: an input was analyzed and inspected for keywords; then, according to a rule associated with the keyword, an output was created [3]. ELIZA and the Georgetown
experiment in which 60 sentences were automatically translated
from Russian to English were the first steps toward Natural
Language Processing (NLP) and more specifically Symbolic
NLP.

Natural Language Processing: is a subfield of AI with focus on understanding human language.

1.1.2 Machine Learning

While NLP was starting to form, Arthur Samuel published in 1959 the first paper that introduced the term Machine Learning (ML) [4]. In this publication, two ML procedures for the game of checkers were proposed [4].
While the two procedures will not be further discussed in this book, it is important to mention that there are four types of ML: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Machine Learning: is a subfield of AI in which computer systems are able to understand and learn patterns in order to solve problems without external reprogramming.

In supervised learning the input and output are provided with annotation (labels). Thus, one feeds the algorithm with
information that helps the machine to learn. An example of
supervised learning is an algorithm for analyzing chest X-rays for pneumothorax, where during training the algorithm should be
fed with manually annotated images on which a pneumothorax is
marked.
However, in unsupervised learning the user does not assist
the machine in the learning process. The machine finds patterns
in the unlabeled input and classifies the results creating clusters
depending on their differences. For example, it will divide X-
rays into two groups depending on the presence or absence of
pneumonia without manually providing labels. The algorithm would be fed with a large number of chest X-rays, allowing it to cluster the images into two groups (with or without pneumonia).
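In code, the contrast between the two settings looks roughly as follows; this is a toy sketch of ours with made-up features, not a working imaging pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 16)      # 100 images summarized as 16 features each
y = (X[:, 0] > 0.5).astype(int)  # annotations, e.g., pneumonia yes/no

# Supervised: the labels y guide the learning
classifier = RandomForestClassifier(random_state=0).fit(X, y)

# Unsupervised: no labels; the algorithm groups the images on its own
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```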
Combining the first two processes produces the semi-
supervised process, where unlabeled data is provided in
combination with labeled data. This process helps to overcome the limitations of supervised or unsupervised learning, while it increases the accuracy and performance of the machine learning model.
The fourth learning process is reinforcement learning. As one
may guess it follows Pavlov’s dog’s principle. The computer pro-
gram performs actions and either a positive or negative feedback
is provided. Through this reinforcement process the computer
tries to maximize the positive feedback. It is worth mentioning
that reinforcement learning was described for the first time in
1961 by Donald Michie, who used it in a machine that played tic-tac-toe [5].
In 1975 Edward Shortliffe published his paper on MYCIN, the first artificial intelligence program with application in medicine. MYCIN identified bacteria causing infection and recommended antibiotics (hence the name MYCIN) with a dose adjusted according to the patient's weight [6]. MYCIN was never used in clinical practice.
The “Informatica” symposium took place in Slovenia in 1976, and a paper by S. Bozinovski and A. Fulgosi was published in the proceedings. The paper was about transfer learning (TL), which is applied in ML. Likewise, Suzanna Becker and Geoffrey E. Hinton
described how output of modules that have already undergone a
learning process can be used as input for more complex modules, making the learning process of large-scale networks faster [7].

Transfer Learning: is a learning type in which knowledge gained while processing a dataset is stored and then transferred for the processing of a different dataset.

Transfer learning is cur-
rently very popular in deep learning because it can train deep
neural networks with comparatively little data. This is very useful
in the data science field since most real-world problems typically
do not have millions of labeled data points to train such complex
models. This method also has a significant potential in medicine,
as we will see later.
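As a concrete illustration of the idea (our own sketch, not an example from this chapter), fine-tuning a pretrained network in PyTorch amounts to freezing the knowledge already gained and retraining only a small new output layer:

```python
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet; its weights encode knowledge
# gained from one dataset that will be transferred to a new task
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False  # freeze the transferred knowledge

# Replace the final layer for a new two-class problem; only this layer
# is trained on the (comparatively little) new data
model.fc = nn.Linear(model.fc.in_features, 2)
```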

1.1.2.1 Neural Networks


Moving forward to 1979, a group of students from Stanford
University invented a cart that had the ability to navigate and
avoid obstacles. Concurrently, Kunihiko Fukushima published an
article on the neocognitron, which is a type of artificial neural
network (ANN) and the inspiration for convolutional neural net-
works (CNN).

Artificial Neuron: is a mathematical function which receives one or multiple inputs, sums them, and produces
an output. The basic principle in which an artificial
neuron functions is the same as in a biological neuron.
Artificial Neural Network: is a network of artificial neu-
rons which communicate with each other through edges
(equivalent to the synapses of the biological neurons).


The edges and the neurons are weighted and the weights
can be adjusted through the learning process of the
machine. As long as the output is not the desired one and there is a difference (error), the machine will adjust the weights of the neurons in order to reduce that error. The neurons are arranged in layers. The first layer corresponds to the input and the last to the output.
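To make the definition tangible, here is a minimal Python sketch of a single artificial neuron and one weight-adjustment step of the kind described above; all names and numbers are our own illustrative choices.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, squashed by a sigmoid activation
    return 1.0 / (1.0 + np.exp(-(np.dot(inputs, weights) + bias)))

x = np.array([0.5, 0.8])   # two inputs
w = np.array([0.1, -0.3])  # initial weights
b, target, lr = 0.0, 1.0, 0.5

out = neuron(x, w, b)
# Gradient of the squared error w.r.t. the weights (sigmoid slope = out*(1-out))
grad = (out - target) * out * (1.0 - out)
w -= lr * grad * x         # adjust the weights to reduce the error
b -= lr * grad
print(f"output before: {out:.3f}, after: {neuron(x, w, b):.3f}")
```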

The neocognitron is a multilayered artificial neural network consisting of multiple types of cells, two of which are the
simple cells—S-cells—and the complex cells—C-cells—similar
to the visual nervous system model as introduced by Hubel
and Wiesel [8, 9]. Stimulus patterns are recognized based on
geometric similarity and each specific pattern is processed only by
specific C-cells, which corresponds to the way the visual cortex
works.
This is how deep neural networks and deep learning came to
life.

Deep Neural Network: is an ANN with multiple layers of neurons between the input and the output layer. There
are different subtypes of DNNs and various types of
layers.
Deep Learning: is a subset of machine learning that uses DNNs and works in a way similar to that of a human
brain. Like in machine learning, there are various deep
learning algorithms.

Connection of neuron layers of a deep neural network resembles the way neurons of the human brain are connected to each
other. While the chess champion Garry Kasparov was defeated
by IBM's Deep Blue chess computer in 1997, Yann LeCun published the first paper on CNNs
inspired by the neocognitron [10].

Convolutional Neural Network: is a subtype of DNN which works using mathematical convolution in at least one of its layers. CNNs are used in computer vision and pixel pattern recognition.

The applicability of CNNs in computer vision renders them ideal for use in radiological applications where images have to be analyzed.
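A single convolutional layer in PyTorch shows the idea in miniature (again a toy sketch of ours, not code from the chapter): a small kernel slides over the image and produces learned feature maps.

```python
import torch
import torch.nn as nn

# One convolutional layer: 3x3 kernels over a single-channel 2D image
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

image = torch.rand(1, 1, 64, 64)  # batch, channel, height, width
feature_maps = conv(image)        # eight 64x64 learned feature maps
print(feature_maps.shape)         # torch.Size([1, 8, 64, 64])
```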

1.2 The Period 2000–2020

As the research usage of AI, ML, DL, and CNN became greater
and greater, new milestones were added to the timeline. In 2002,
Torch, the first machine learning library that provided algo-
rithms for DNNs, was created and released by Ronan Collobert,
Samy Bengio, and Johnny Mariéthoz. As the name implies,
in a machine learning library, one may find common learning
algorithms that are available for the public. Depending on the
purpose of the program or the programming language, different
libraries are more applicable than others, similar to traditional
libraries. In the following years, important changes escalated
quickly.
In 2005, a Stanford robot won the DARPA Grand Challenge by
driving autonomously for 131 miles along an unrehearsed desert
trail. With raw data from GPS, cameras, and 3D mapping composed with LIDARs, the car controlled its speed and direction in order to avoid obstacles.
In 2006, the term “Machine Reading” was used by Oren
Etzioni, Michele Banko, Michael J. Cafarella to describe the
automatic, unsupervised interpretation of text. The same year, the data scientist Fei-Fei Li set up ImageNet, a database that
now contains more than 14 million annotated images available
for computer vision and different applications such as object
recognition or localization or image classification. A couple
of years later, the ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) began, in which different algorithms for
object detection and image classification at large scale were
evaluated.
When Microsoft launched the Kinect for the Xbox 360 in
2010, AI began to become available for everyday activities such
as video games. Kinect was the first gaming device that tracked
human body movement using a 3D camera and infrared detection.
Gradually, VR gaming came to life when in 2011 IBM’s Watson
computer beat TV game show “Jeopardy!” champions Rutter
and Jennings. Importantly, not only did a computer beat the two
champions, but it was also the first time people witnessed AI using
NLP. Then Apple’s Siri (2011), Google’s Google Now (2012),
Microsoft’s Cortana (2014), and smartphone apps that used nat-
ural language to answer questions, make recommendations, and
perform actions became part of our daily lives. In 2018 the FDA approved marketing of the AI-based device IDx-DR to detect diabetic retinopathy, to assist clinicians who may not routinely be involved in eye care.
In 2017 Arterys became the first AI company to receive FDA
clearance to use cloud-based DL in a clinical setting. More and
more AI models with clinical applicability became available
for direct use on radiological images in picture archiving and
communication systems (PACS) or in electronic health records
(EHRs), while others were used for automatic recognition of
melanomas in dermoscopy or analysis of the retina in retinal
optical coherence tomography (OCT). However, AI did not enter
into just these medical domains. AI has the potential to make
a substantial or promising contribution to multiple specialties,
ranging from pathology, cytology, and radiology to oncology,
gastroenterology, or anesthesiology, and in any specialty in which
NLP, computer vision, biometrics, or decision-making can be
applied [11]. As AI increasingly integrates into clinical practice,
it is logical that questions are raised from the ethical perspective
around its usability. A quick search in PubMed reveals that an
increasing number of studies on the topic have been published
since 2019, and a joint European and North American multi-
society statement on AI in radiology was published in the same
year [12]. By this point there is no uncertainty regarding AI software and hardware in medicine. Guides and books have been
published targeting medical professionals. AI is expanding and
pushing the boundaries of imagination. Entrepreneurial inven-
tions are proliferating. AI fuses with science, art, literature,
production, machinery, people’s safety and anything one can
imagine.
In January 2021, the DALL-E 1 program started generating digital images from natural language descriptions as input. The same year the AlphaFold project released predicted protein structures for about 1 million proteins, including almost all human proteins. A year later, more than 200 million protein structures were predicted. DALL-E 2 entered the beta phase and ChatGPT was launched, providing its users with detailed and articulate answers.
Before continuing to the next chapter, the authors encourage
you to think and identify how many times and in which ways you
have interacted with AI since this morning. The next steps are
ongoing research into AI and inventing new software until one
finally arrives at models that successfully pass the Turing test. At
that point, we can speak of human-level intelligence, also called
“general AI.” However, it is not yet clear how long it will take to reach that level of AI, let alone when AI will exceed human intelligence.
A brief overview of all major milestones in AI history is
presented in Fig. 1.1.
Fig. 1.1 Major milestones in the history of artificial intelligence (created with biorender.com)

References
1. Turing AM. Computing machinery and intelligence. Mind.
1950;LIX:433–60.
2. Strachey CS. Logical or non-mathematical programmes, ACM ’52, 1952.
3. Weizenbaum J. ELIZA-A computer program for the study of natural
language communication between man and machine. Commun ACM.
1966;9:36–45.
4. Samuel AL. Some studies in machine learning. IBM J Res Dev.
1959;3:210–29.
5. Michie D. Experiments on the mechanization of game-learning. Part I.
Characterization of the model and its parameters. Comput J. 1963;6:232–
6.
6. Shortliffe EH, Buchanan BG. A model of inexact reasoning in medicine. Math Biosci. 1975;23:351–79.
7. Becker S, Hinton GE. Self-organizing neural network that discovers
surfaces in random-dot stereograms. Nature. 1992;355:161–3.
8. Fukushima K. Neocognitron: a self-organizing neural network model for
a mechanism of pattern recognition unaffected by shift in position. Biol
Cybern. 1980;36:193–202.
9. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat’s
striate cortex. J Physiol. 1959;148:574–91.
10. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied
to document recognition. Proc IEEE. 1998;86:2278–324.
11. Hinton G. Deep learning-a technology with the potential to transform
health care. JAMA. 2018;320:1101–2.
12. Geis JR. Ethics of artificial intelligence in radiology: summary of the joint European and North American Multisociety Statement. Radiology. 2019;293:436–40.
2 Using Commercial and Open-Source Tools for Artificial Intelligence: A Case Demonstration on a Complete Radiomics Pipeline

Elisavet Stamoulou, Constantinos Spanakis, Katerina Nikiforaki, Apostolos H. Karantanas, Nikos Tsiknakis, Alexios Matikas, Theodoros Foukakis, and Georgios C. Manikis

Authors Elisavet Stamoulou and Constantinos Spanakis have contributed equally.

E. Stamoulou · C. Spanakis · K. Nikiforaki
Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece

A. H. Karantanas
Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
Department of Medical Imaging, University Hospital, Heraklion, Greece
Department of Radiology, School of Medicine, University of Crete, Voutes Campus, Heraklion, Greece

N. Tsiknakis
Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden

A. Matikas · T. Foukakis
Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden

G. C. Manikis (✉)
Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden


2.1 Introduction

Radiology images can be considered the cornerstone of healthcare, as they affect the complete pathway from diagnosis and
optimal treatment selection to evaluation of treatment response.
However, imaging modalities including but not limited to tomo-
graphic techniques, i.e., magnetic resonance imaging (MRI) or
computed tomography (CT), are often not used in their raw form.
The reason is that image characteristics vital for the physicians
usually lie beyond human perception not only because of their
subtle manifestation but also because they can be masked by
a number of image degradation factors, i.e., artifacts. Recent
advances in artificial intelligence (AI) and medical image analysis
have drastically upgraded the role of radiology images in the clini-
cal routine, facilitating their digital decoding into high-throughput
quantitative features instead of their use as a subjective qualitative
assessment of the examined tissue. Particularly, the emerging field
of radiomics in conjunction with machine learning (ML) model-
ing has enabled the conversion of routine radiology images into
high-throughput quantitative data to describe non-intuitive prop-
erties of the imaging phenotype and tissue micro-environment [1].
In general, a radiomics analysis workflow comprises four main
steps: image segmentation, image pre-processing, radiomics
extraction, and ML modeling (an indicative analysis workflow
can be found in [2] to predict isocitrate dehydrogenase (IDH) mutations in gliomas). Image segmentation is an essential step
of the analysis workflow that seeks to define a specific region of
interest (ROI) within the image from which radiomics features
are extracted. Image pre-processing aims to improve the quality
of images by removing or alleviating noise and artifacts which
could result in more accurate ML model predictions. Next, the
radiomics extraction can be achieved either using image analysis
techniques to calculate high-dimensional handcrafted features or
using an end-to-end deep learning (DL) architecture to extract
“deep features” [3]. In the end, these features are used to build
radiomics-based ML models for clinical outcome predictions.
Recent efforts aim to address technical details in radiomics development [4–10]; however, these are mostly addressed to data scientists with a background in AI modeling. Consequently, it is difficult for a wide range of doctors and medical physicists with little to no experience in radiomics analysis to go beyond the theoretical framework and exploit its full potential for the benefit of the medical community and society. In this direction, we believe that studies raising interest and actively engaging these professionals in the field of radiomics will help deliver AI to the clinical routine more rapidly and accelerate productivity and precision in healthcare delivery. This is the main scope of this chapter: to present the basic steps and the available tools and plugins required to develop a radiomics analysis pipeline under the criterion of requiring only high-level interaction with AI and minimal or no programming skills. In addition, a “from theory to practice” section summarizes a practical guide on how to perform radiomics by illustrating each of the analysis steps and the appropriate tools/plugins.

2.2 Image Segmentation

Image segmentation is the process of partitioning an image into multiple image components, usually with reference to anatomical
structures, known as image segments or image regions. It is
required in many scientific or clinical projects and can be con-
ducted either manually, semi-automatically, or fully automatically
to delineate a 2D ROI or a three-dimensional (3D) volume of
interest (VOI). Manual and semi-automatic methods have the drawbacks that they are tedious and time-consuming, need a starting point from which to begin the segmentation process, and that interactions with the user (even if only at the beginning of the process) can add bias to the segmentation. Another
major issue is the intra- and inter-observer variability in the
segmentations that highly affect the generalization performance
of the AI models [11].
The rapid development of AI in the last decade has made great
advances for the improvement of image segmentation in terms
of its precision, accuracy, and repeatability. In particular, the use of fully automated AI techniques leveraging DL has recently become a sine qua non in the field, demonstrating
state-of-the-art results when segmenting a variety of anatomical
structures from different image modalities. These techniques not
only show better results compared to traditional image analysis
methods for segmentation but also are fast and robust especially
when developed using large-scale datasets [12]. Current DL-
based segmentation approaches include the Deep Grow module
[13] and MedSeg [14], an online image segmentation tool
written in JavaScript and WebGL that provides seamless and
lightning-fast performance. Fiji [15] employs a WEKA plugin [16] that combines a collection of AI and image analysis tools to support fully integrated segmentation analysis workflows. Several
other techniques have been implemented as plugins in 3DSlicer
such as the MONAI [17], NVIDIA-AIAA (https://github.com/
NVIDIA/ai-assisted-annotation-client/tree/master/slicer-plugin),
and TotalSegmentator [18]. Another group of AI tools and plugins
can be found in the literature for brain (Slicer-DeepSeg [19]),
lung (DensityLungSegmentation [20]), and liver parenchyma
and vessels segmentation (SlicerRVXLiverSegmentation (https://
github.com/R-Vessel-X/SlicerRVXLiverSegmentation)). Other
DL image segmentation tools are AutoRadiomics [21], Avizo [22], QuPath [23, 24], RECOMIA [25], nucleAIzer [26], aimis3d [27], MVision [28], MedViso [29], ImFusion [30], and RSIP [31]. Although there is a
plethora of AI tools already in the literature that facilitate
image segmentation in an efficient and user-friendly manner,
technical challenges frequently arise related to the number of
the annotated images required for model training, the inherent
physiological heterogeneity among patients, model overfitting,
and gradient vanishing [32].
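To give a flavor of what sits behind such plugins, the sketch below assembles and takes one training step of a small 3D U-Net with MONAI's Python API; the architecture settings and the dummy data are our own illustrative choices, not a recipe from any of the cited tools.

```python
import torch
from monai.losses import DiceLoss
from monai.networks.nets import UNet

# A small 3D U-Net for binary segmentation of single-channel volumes
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)
loss_fn = DiceLoss(sigmoid=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a dummy batch (batch, channel, depth, height, width)
image = torch.rand(2, 1, 64, 64, 64)
label = (torch.rand(2, 1, 64, 64, 64) > 0.5).float()
optimizer.zero_grad()
loss = loss_fn(model(image), label)
loss.backward()
optimizer.step()
```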

2.3 Image Pre-processing

Image pre-processing, digital or geometric, focuses on the image data quality improvement by removing noise, artifacts, and non-
uniformity in the signal intensities of an image. It is usually
conducted by following several discrete actions (e.g., image reconstruction, registration, filtering, interpolation, bias field correction, intensity-based normalization, and discretization) and
has shown to have a considerable impact on the radiomics
analysis results since it comprises a series of techniques aiming
to transform and homogenize images from which reproducible
and reliable radiomics models can be produced. An in-depth discussion of these steps is beyond the scope of this review; however, the readers can review recent literature with a
comprehensive technical presentation in image pre-processing
[9, 11, 33].
Digital pre-processing is the processing of the signal intensities of an image, and image enhancement is one of its most frequently used applications. Image enhancement refers to noise
reduction, artifact correction, and resolution improvements of
an image, focusing on highlighting features that are difficult
to be distinguished due to degraded or inherently poor image
contrast. AI has shown great success in this field through the
artificial neural networks [34]. Specifically, convolutional neural networks (CNNs) based on super-resolution (SR) have been deployed to restore corrupted images when a low-quality and
noisy image (low resolution) is upsampled to match the size of a
high-resolution (HR) enhanced image [35]. The Python programming language [36] supports AI libraries for medical image resolution improvement such as MedSRGAN [37] and ANTsPyNet [38].
Regarding noise and artifacts reduction, denoising convolutional
neural network (DnCNN) [39] and intensity inhomogeneity cor-
rection network (InhomoNet) [40] have been successfully applied
to MRI studies. Geometric pre-processing, mostly performed
by image registration and reconstruction, is the process that
geometrically alters the view of the depicted anatomical areas
in an image. AI in image registration is used either to assess
the similarity of two or more images [41] or to search for their
best transformation (e.g., using evolutionary [42] and swarm
algorithms [43]). Novel approaches include learning-based image
registration methods [44] and generative adversarial networks
(GANs) [45]. Additionally, there are several AI tools for geo-
metric image pre-processing steps such as ImFusion [46] for
image registration and StudierFenster [47] for reconstruction. In
ImFusion [46], DL techniques are utilized to learn the features
used as descriptors for image registration. Image registration tools
are also embedded in 3DSlicer [48]. AI has also revolutionized
the field of image reconstruction, the agglutination of two or more
images to create two-dimensional (2D) high-resolution (named as
image stitching) or 3D images, mainly focusing on conventional
DL architectures [49] and novel approaches such as GANs [50].
Although several AI techniques exist for image pre-processing, they should be used with caution since they can result in unrecoverable signal loss [7].
Indicatively, in a typical MRI-based radiomics analysis work-
flow where images suffer from spatial signal inhomogeneity
due to bias field corruption effect, it is a prerequisite to utilize
image analysis algorithms such as the N4 bias field correction
to minimize or correct for this low-frequency signal variation
[51]. In addition, there is an evident variability in studies where
radiomics is applied to heterogeneous data coming from different
acquisition protocols and/or vendors. Since this can hamper the
quality of the extracted radiomics features [52], image-based
harmonization is proposed as a solution to reduce this inherent
variability and bring images in a common analysis workspace
[11]. To this direction, a lot of effort has been put into developing
user-friendly open-source tools for image pre-processing. Among
others, this chapter highlights software such as the ImageJ [53],
MIPAV [54], MITK [55], 3DSlicer [48], and LIFEx [56]. It is
worth mentioning that the readers with experience in software
development using Python can also refer to open-source libraries
including the SimpleITK [57] and PyRadiomics [58]. PyRa-
diomics provides several pre-processing options and supports a
wide variety of image formats. It can be used standalone or using 3DSlicer, which has incorporated a plugin providing a convenient front-end interface for PyRadiomics. MITK not only provides image processing tasks but more importantly a stable, well-tested, and extensible framework for radiomics analysis. Also, it has a unified framework for different user applications accessible either through a GUI, a command line tool, Python, or C++, making it usable for users with experience in software
programming. An indicative image pre-processing example is
given in the “from theory to practice” section using 3DSlicer.
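As an illustration of what the N4 step looks like when scripted rather than clicked, the following SimpleITK sketch applies N4 bias field correction to an MRI volume; the file names are placeholders of ours.

```python
import SimpleITK as sitk

# Read the MRI volume; N4 works on floating-point intensities
image = sitk.ReadImage("t1.nii.gz", sitk.sitkFloat32)

# A rough foreground mask (Otsu thresholding) restricts the correction
# to tissue and speeds up the estimation of the bias field
mask = sitk.OtsuThreshold(image, 0, 1, 200)

corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(image, mask)

sitk.WriteImage(corrected, "t1_n4.nii.gz")
```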

2.4 Radiomics Extraction

The emerging field of radiomics proposes a challenging image analysis framework, involving a massive extraction of quantitative
features to reveal non-intuitive properties of the imaging pheno-
type and tissue micro-environment [59]. The analysis demands an
effective and efficient representation of the image content which
is performed either using several mathematical equations from the
image analysis domain to extract handcrafted features (descrip-
tors of shape, size and textural patterns) from a ROI or using
a complex DL architecture that uses non-linear transformations
of the image to extract a massive amount of “deep features”
without the need of any human intervention. For handcrafted
feature extraction, we can list, among the widely used approaches,
software tools that require little to no programming skills from
the user like MaZda [60], LIFEx [56], IBEX [61], StudierFenster
[47], RadiomiX ToolBox [62], AutoRadiomics [21], and 3DSlicer
[48, 63]. On the other hand, software platforms that utilize AI to calculate “deep features” directly from images include Deep Learning Studio [64] and Nvidia’s DIGITS [65].
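To show how compact a handcrafted extraction can be, here is a minimal sketch using the PyRadiomics Python API introduced in the previous section; the file names are placeholders, and a YAML parameter file can be passed to the extractor instead of the default settings.

```python
from radiomics import featureextractor

# With default settings, shape, first-order, and texture features are
# computed; a parameter file can customize the pre-processing applied
extractor = featureextractor.RadiomicsFeatureExtractor()

# The image and its segmentation mask, in any format SimpleITK reads
features = extractor.execute("image.nii.gz", "mask.nii.gz")

for name, value in features.items():
    if not name.startswith("diagnostics_"):  # skip provenance entries
        print(name, value)
```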
While there is a wide range of AI applications and image
analysis techniques that can be incorporated in the radiomics
extraction phase, the recent literature raises concerns about sources
of variation (e.g., inter-patient and inter-scanner factors) that
can significantly hamper radiomics features stability [11]. These
variation effects are more evident in multicenter studies, hav-
ing a significant impact on radiomics analysis repeatability and
robustness [66]. To overcome this issue, efforts from the image
biomarker standardization initiative (IBSI) have focused on the
standardization of the image pre-processing in order to improve
reproducible radiomic feature calculations [67]. It is worth mentioning that the aforementioned RadiomiX ToolBox, AutoRa-
diomics, 3DSlicer, and LIFEx are software tools that can extract
fully IBSI compliant radiomics features. Particularly, the first
three integrate PyRadiomics that follows the IBSI guidelines,
whereas LIFEx is an IBSI compliant freeware tool with its own
user interface [58]. Additional effort involves the development
of data-driven techniques that operate directly on the calculated
features (i.e., feature-based) to compensate for variability effects.
These include ComBatTool [68], a popular standalone web application that adjusts the feature values into a common space by estimating their differences using an empirical Bayes framework, and Neuroharmony [69], an AI tool that enables
radiomics feature harmonization without having knowledge of the
image acquisition parameters. An illustrative representation of the
radiomics extraction using 3DSlicer is given in the “from theory
to practice” section.
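Feature-based harmonization of this kind can also be scripted locally; the sketch below uses the neuroCombat Python package as a stand-in for the cited tools (an assumption of ours, with hypothetical file and column names).

```python
import pandas as pd
from neuroCombat import neuroCombat

# Hypothetical radiomics table: rows = patients, columns = features,
# plus a "scanner" column identifying the acquisition batch
df = pd.read_csv("radiomics_features.csv")
covars = pd.DataFrame({"scanner": df.pop("scanner")})

# neuroCombat expects a (features x samples) matrix
dat = df.to_numpy().T

harmonized = neuroCombat(dat=dat, covars=covars, batch_col="scanner")["data"]
pd.DataFrame(harmonized.T, columns=df.columns).to_csv(
    "radiomics_features_harmonized.csv", index=False
)
```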

2.5 Radiomics Modeling

The use of AI in radiomics analysis is gaining momentum since it shows great promise in transforming the field of medical
imaging and bringing medicine from the era of “sickcare” to
the era of healthcare and prevention. This potential is prominent in AI algorithms that, once trained to learn from existing imaging data, can perform statistical inference to make accurate predictions on new “unseen” data [70, 71].
This learning process facilitates the development of a variety of
traditional machine learning models such as the naive Bayes [72],
logistic regression [73], support vector machines [74], decision
trees [75], and random forests [76]. Different from the tradi-
tional ML architecture, deep learning introduces a subfield of
ML that operates on large-scale neuronal architectures in which
higher level features are obtained by applying multiple non-
linear transformations to the input images [77]. The most popular
DL models are Autoencoders, Convolutional Neural Networks
(CNNs), Recurrent Neural Networks (RNNs), and Generative
Adversarial Networks (GANs) [77].
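To make the modeling step concrete, here is a minimal sketch that trains and cross-validates one of the traditional models named above (a random forest) on a harmonized radiomics table with scikit-learn; the file and column names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Harmonized feature table with a binary clinical outcome column
df = pd.read_csv("radiomics_features_harmonized.csv")
y = df.pop("outcome")  # e.g., mutation present / absent
X = df

model = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"5-fold cross-validated AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```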
In general, an analysis pipeline comprising model training,
hyperparameter optimization, selection, and validation is tech-
nically demanding and can be properly performed by a data
scientist with AI skills. A thorough review of the basics in ML
and DL is far beyond the scope of this chapter, and however
several practical guides and tutorials are available online [78–80].
Automated Machine Learning (AutoML) was recently introduced
to address technical challenges by proposing off-the-shelf AI
implementations through a user-friendly interface [81]. A com-
prehensive review of the functionalities and benefits working with
AutoML can be found in a recent survey [82]. To this direction,
users can benefit from cloud-based AutoML platforms such as
Google AutoML [82] and Microsoft AutoML [83] to train and
validate high-quality models. Recently, a popular commercial-
based AutoML platform was published (RapidMiner [84]), incor-
porating a comprehensive set of tools for every step of the
analysis pipeline. Another commercial platform from AutoML
is JADbio [85], which includes predictive and diagnostic clinical
models for the analysis of both low- and high-dimensional data.
AutoRadiomics [21] and RadiomiX [62] have been designed to
meet the needs of ML-based radiomics. KNIME [86] and BigML
[87] are user-friendly AI tools that perform basic data analytics
and assist users with interactive data visualization to build ML
models without coding experience. In the literature, radiologists
recommend the use of WEKA [88, 89] and Orange [90] con-
sidering the simplicity and ease they provide to non-experts to
conduct their own experiments [10]. Table 2.1 summarizes state-
of-the-art platforms that are mainly dedicated to end users with
little to no programming skills. For the readers even having low-
level programming skills, Python provides a significant list of
AI libraries such as Google’s TensorFlow [91], Auto-Sklearn
[92], AutoKeras [93], and MLbox [94]. An indicative radiomics
analysis is presented in the “from theory to practice” section using
RapidMiner.

2.6 From Theory to Practice

It is evident from Table 2.1 that no single software tool can serve
as a one-stop-shop solution for the design of a complete radiomics
analysis workflow since each one has its own design style,
customization, and functionality. To the best of our knowledge,
several AI models can be deployed and executed within the 3DSlicer ecosystem (e.g., MONAI); however, this integration demands Python software skills to enable compatibility between
the components. The user lacking programming skills can only
partially exploit the capabilities of each software. It might be
discouraging to create a seamless AI workflow where the output
of each action is fed as input to a different software platform, not
only because a general overview of the available tools is necessary
but also because ensuring compatibility is a non-trivial task. A
possible course of actions to structure a complete ML-based
radiomics project in practice is presented in the following para-
graph, shedding light on the interplay between different actions
and how they can be combined to compose the full pathway from
the clinical question to the AI derived answer. To this direction,
our demonstration on how to perform a radiomics workflow starts
with an MRI region of a cancer patient to predict diagnosis
and assess the evolution of cancer and its mortality. Since our
proposed pipeline is designed to be used by doctors or medical
physicists with little or no programming skills, we recommend 3DSlicer as a user-friendly tool which provides the flexibility of using PyRadiomics for both the image pre-processing and radiomics feature extraction steps. As for the radiomics
harmonization process, ComBaTool seems to be the ideal tool as it
needs no programming skills and can be used online by selecting
each parameter manually. Then, in order to develop the ML-based
radiomics model, we propose RapidMiner, one of the most popular data science AutoML tools. It is preferred because of its ability to provide a unique user-friendly visual workflow helping the user to build models with speed and automation.

Table 2.1 Tools and plugins for radiomics analysis

Image process            Tool                                      Coding      Commercial  Platform
Image segmentation       AutoRadiomics [21]                        Python      No          Any
                         Avizo [22]                                No          Yes         Any
                         MedViso [29]                              Matlab      Both        Windows
                         3DSlicer [19, 48, 63]                     No          No          Any
                         Fiji [15]                                 JavaScript  No          Any
                         QuPath [23, 24]                           Python      No          Any
                         RECOMIA [25]                              No          Yes         Online
                         MVision [28]                              No          Yes         Windows
                         nucleAIzer [26]                           Python      No          Windows, Linux
                         MedSeg [14]                               No          No          Online
                         RSIP [31]                                 No          No          Online
                         aimis3d [27]                              Python      No          Windows
                         ImFusion [30]                             No          Yes         Windows
Image pre-processing     ImageJ [53]                               No          No          macOS, Linux, Windows
                         MIPAV [54]                                No          No          Any
                         3DSlicer [48] and PyRadiomics [58]        No          No          Any
                         LIFEx [56]                                No          No          Any
                         StudierFenster [47]                       No          No          Online
                         ImFusion [46]                             No          Yes         Windows
                         MITK [55]                                 No          No          Windows, Linux
Radiomics extraction     3DSlicer [48] and PyRadiomics [58]        No          No          Any
                         RadiomiX [62] and PyRadiomics [58]        Matlab      Yes         macOS, Windows
                         AutoRadiomics [21] and PyRadiomics [58]   Python      No          Any
                         MaZda [60]                                No          No          Windows
                         IBEX [61]                                 No          No          Windows
                         LIFEx [56]                                No          No          Any
                         StudierFenster [47]                       No          No          Online
                         Nvidia's Digits [65]                      No          No          Any, Cloud
                         Deep Learning Studio [64]                 No          No          Windows, Linux, Cloud
Radiomics harmonization  NeuroHarmony [69]                         Python      No          Any
                         ComBaTool [68]                            No          No          Online
Modeling                 AutoRadiomics [21]                        Python      No          Any
                         RadiomiX [62]                             Matlab      No          macOS, Windows
                         RapidMiner [95]                           No          Yes         Any
                         JADbio [85]                               No          Yes         Online
                         WEKA [88, 89]                             No          No          Windows, Linux
                         Orange [90]                               No          No          macOS, Linux, Windows
                         KNIME [86]                                No          No          macOS, Linux, Windows
                         BigML [87]                                No          No          Online
                         Google AutoML [82]                        No          Yes         Online
                         Microsoft AutoML [83]                     No          Yes         Online
The acquired raw images are imported into 3DSlicer in DICOM format, and information about the acquisition protocol, study date, etc. is retrieved from the DICOM headers. 3DSlicer supports
image viewing and a segment editor module which activates a
wide range of segmentation methods (Fig. 2.1a). Within 3DSlicer,
the user can create and edit slice-by-slice manual ROI seg-
mentations (e.g., paint, draw, etc.), semi-automated (e.g., using
thresholding, region growing etc.), and fully automated (e.g.,
MONAI plugin). Subsequently, the image pre-processing phase
using 3DSlicer and PyRadiomics (Fig. 2.1b) supports the N4 bias
field correction, filtering modules for noise and artifact reduc-
tion, and embedded tools for manual and automatic registration
and reconstruction. Further pre-processing (e.g., interpolation, intensity-based normalization, and discretization) can be performed at the feature extraction step (Fig. 2.1c), either by setting
up the required parameters from the user interface or by loading
automatically a parameter file (e.g., YAML or JSON struc-
tured text file) from PyRadiomics. Indicative parameter files can
be found in the PyRadiomics GitHub repository (https://github.
com/AIM-Harvard/pyradiomics/tree/master/examples). For more
detailed instructions and examples, extensively, documentations
are available in PyRadiomics (https://pyradiomics.readthedocs.
io/en/latest/) and 3DSlicer (https://slicer.readthedocs.io/en/latest/
index.html). As we have mentioned in the radiomics extraction
section, radiomics features need to be harmonized before model-
ing. This is implemented using a free online application of Com-
BaTool (https://forlhac.shinyapps.io/Shiny_ComBat/) simply by
uploading the radiomics values in tabular format (e.g., csv or txt
file) (Fig. 2.1d). Apart from the radiomics values, the uploaded
file includes an extra column containing information about the
image protocol (e.g., extracted from the DICOM header). Density
plots and descriptive statistics of the radiomics values before
and after the harmonization are also available through the online
application (the details can be found in [96]).

Fig. 2.1 A proposed radiomics analysis pipeline using commercial tools and plugins: (a) image segmentation with 3DSlicer; (b) image pre-processing with 3DSlicer and PyRadiomics (bias field correction, filtering, interpolation, intensity normalization, discretization, reconstruction/registration); (c) radiomics extraction with 3DSlicer and PyRadiomics (radiomics features and deep features); (d) radiomics harmonization with ComBaTool; (e) radiomics modeling with RapidMiner

At the next step, the
harmonized radiomics features are downloaded in a csv file and imported into RapidMiner (Fig. 2.1e). For beginners, Auto Model is a recommended extension of RapidMiner for model development. It consists of the following steps: (i) data selection, (ii) task selection (e.g., classification or regression), (iii) target preparation, (iv) input selection, and (v) model selection (including
automated validation and optimization of the selected models). In
the RapidMiner menu by selecting the Overview section, the user
can find the resulting performance metrics for each model, while
by selecting the ROC comparison section the AUC curves are
visualized. Furthermore, there is the “open process” tool in which
the user can make changes without needing to start the analysis
from scratch. Plenty of tutorials, examples, and instructions are
included on the online documentation of RapidMiner (https://
docs.rapidminer.com/).

2.7 Discussion

Despite the contribution of AI in medical image analysis, there are still challenges that need to be addressed. Two main challenges
are (i) the reproducibility of radiomics features and (ii) the
explainability of AI models [97]. To the best of our knowledge, there is no user-friendly platform that enables users to assess and quantify these parameters automatically. As we have mentioned,
radiomics features suffer from variability making the feature
selection based on their stability an appropriate step for building
robust radiomics models [98]. In the literature, the stability has
been investigated by calculating the concordance correlation coef-
ficient (CCC) and intra-class correlation coefficient (ICC) [11]
after the radiomics extraction step. However, the discriminating
power of radiomics cannot be guaranteed, and therefore harmo-
nization methods have been proposed [99]. Another key problem
in using AI is the lack of model interpretation which makes the
users unable to understand how model predictions are generated.
Explainable AI (XAI) methods are needed to understand the
mechanism behind the models and how the selected radiomics
features are correlated with biological phenotypes [97]. Recently, the SHapley Additive exPlanations (SHAP) method has been proposed as an explanatory technique [100] and has been suc-
cessfully incorporated into a multiparametric radiomics model
for the diagnosis of schizophrenia [33] and IDH mutations in
gliomas [2]. Efforts should be made to integrate XAI into medical
software tools in order to increase reliability and transparency in
AI predictions.
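As a pointer to how little code a first SHAP analysis requires, the following sketch explains the (hypothetical) random forest from the modeling section; the fitted model and feature table X are assumed to exist from earlier steps.

```python
import shap

# model: a fitted tree-based classifier; X: the radiomics feature table
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank the radiomics features by their contribution to the predictions
shap.summary_plot(shap_values, X)
```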
After a thorough and meticulous search on the armamentarium
of medical image processing tools, it is evident that there are
some common points and trends. First of all, the majority of the
aforementioned tools are not commercial which means that they
can be used without any engagement or cost. An experienced user
can witness a steady, albeit slow, transition from conventional
methods to more advanced and sophisticated ones influenced
by the current trends in AI. In addition, a significant number
of them can be installed on any platform, which can be useful
for users not familiar with Linux or macOS. Since the results of AI cannot be explained, AI is very often described as a black box mechanism in healthcare [101]. In order to encourage an
increased use of AI in the clinical domain, the user needs to
approach the rationale bridging the input to the output by coming
in contact with some intermediate results or the strength of each
contributing factor for the final result. The latter is the object
of interest of XAI mechanisms research, which should grow in
parallel with any other branch of AI. Last but not least, the
lack of necessity for programming skills in a large number of
applications is contributing to the development of an extended
medical society that can produce, share, and discuss their AI
studies and thus molding with interdisciplinary knowledge and
feedback the relevant works. This in turn will produce results
tightly bounded to the clinical needs and will positively affect
both the clinical practice with new insights and also the technical
parties with real world scenarios and abundant data resources.

2.8 Conclusion

There is a plethora of AI tools, indicative of the vivid interest of users in this area. Nevertheless, further research and work are still needed to increase their impact in radiology. First, the large number of tools itself escalates the need for a method to unify or integrate the different acquisition-oriented tools into a seamless workflow embracing different imaging modalities (i.e., a tool should be able to handle images from different acquisition techniques). The optimal option would be to unite them into a single tool that can deal with many imaging techniques (MRI, CT, ultrasound, etc.). Second, the transition from conventional methods to AI-based ones needs to be supported by additional improvements in AI algorithms regarding their area of applicability: although AI techniques tend to overcome the limitations of traditional techniques, they frequently focus on particular body areas or clinical problems. It should also be taken into account that, despite the advancements in AI, the usefulness of these methods is still affected by the way a specific pipeline is constructed. Therefore, the construction of an optimal AI analysis pipeline requires the complementary knowledge of physicians, image acquisition experts, and AI specialists. To conclude, the ultimate challenge is to develop an AI API that performs the aforementioned tasks on all medical images, regardless of the image acquisition technique, with high-quality results in a seamless, user-friendly fashion.

Acknowledgments Georgios C. Manikis is a recipient of a postdoctoral scholarship from the Wenner–Gren Foundations (www.swgc.org (accessed on January 27, 2023)) (grant number F2022-0005).

References
1. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, De Jong EEC,
Van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A,
et al. Radiomics: the bridge between medical imaging and personalized
medicine. Nat Rev Clin Oncol. 2017;14(12):749–62
2. Manikis GC, Ioannidis GS, Siakallis L, Nikiforaki K, Iv M, Vozlic
D, Surlan-Popovic K, Wintermark M, Bisdas S, Marias K. Multicenter
DSC–MRI-based radiomics predict IDH mutation in gliomas. Cancers.
2021;13(16):3965
3. Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H.
From handcrafted to deep-learning-based cancer radiomics: challenges
and opportunities. IEEE Sig Process Mag. 2019;36(4):132–60
4. Tian J, Dong D, Liu Z, Zang Y, Wei J, Song J, Mu W, Wang S, Zhou M.
Radiomics in medical imaging—detection, extraction and segmentation.
In: Artificial intelligence in decision support systems for diagnosis in
medical imaging. Berlin: Springer; 2018. p. 267–333
5. Severn C, Suresh K, Görg C, Choi YS, Jain R, Ghosh D. A pipeline
for the implementation and visualization of explainable machine
learning for medical imaging using radiomics features. Sensors.
2022;22(14):5205
6. Bibault J-E, Xing L, Giraud P, El Ayachy R, Giraud N, Decazes P,
Burgun A. Radiomics: a primer for the radiation oncologist. Can-
cer/Radiothérapie. 2020;24(5):403–10
7. Papanikolaou N, Matos C, Koh DM. How to develop a meaningful
radiomic signature for clinical use in oncologic patients. Cancer Imag-
ing. 2020;20(1):1–10
8. Lohmann P, Galldiks N, Kocher M, Heinzel A, Filss CP, Stegmayr C,
Mottaghy FM, Fink GR, Shah NJ, Langen K-J. Radiomics in neuro-
oncology: basics, workflow, and applications. Methods. 2021;188:112–
21
9. Van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B.
Radiomics in medical imaging—“how-to” guide and critical reflection.
Insights Imaging. 2020;11(1):1–16
10. Koçak B, Durmaz EŞ, Ateş E, Kılıçkesmez Ö. Radiomics with artifi-
cial intelligence: a practical guide for beginners. Diagn Interventional
Radiol. 2019;25(6):485
11. Stamoulou E, Spanakis C, Manikis GC, Karanasiou G, Grigoriadis G,
Foukakis T, Tsiknakis M, Fotiadis DI, Marias K. Harmonization strate-
gies in multicenter MRI-based radiomics. J Imaging. 2022;8(11):303
12. Kumar BV, Sabareeswaran S, Madumitha G. A decennary sur-
vey on artificial intelligence methods for image segmentation. In:
Advanced engineering optimization through intelligent techniques.
Berlin: Springer; 2020. p. 291–311
13. Sakinis T, Milletari F, Roth H, Korfiatis P, Kostandy P, Philbrick K,
Akkus Z, Xu Z, Xu D, Erickson BJ. Interactive segmentation of med-
ical images through fully convolutional neural networks; arxiv 2019.
Preprint. arXiv:1903.08205
14. MedSeg; October 2021
15. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M,
Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B. et al. Fiji:
an open-source platform for biological-image analysis. Nat Methods.
2012;9(7):676–82
16. Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin
J, Cardona A, Seung HS. Trainable weka segmentation: a machine
learning tool for microscopy pixel classification. Bioinformatics.
2017;33(15):2424–26
17. Diaz-Pinto A, Alle S, Ihsani A, Asad M, Nath V, Pérez-García F, Mehta
P, Li W, Roth HR, Vercauteren T, Xu D, Dogra P, Ourselin S, Feng
A, Cardoso MJ. MONAI label: a framework for AI-assisted interactive
labeling of 3D medical images. arXiv e-prints; 2022
18. Wasserthal J, Meyer M, Breit H-C, Cyriac J, Yang S, Segeroth M.
TotalSegmentator: robust segmentation of 104 anatomical structures in
CT images. Preprint. arXiv:2208.05868; 2022
19. Zeineldin RA, Weimann P, Karar ME, Mathis-Ullrich F, Burgert O.
Slicer-DeepSeg: open-source deep learning toolkit for brain tumour
segmentation. Curr Directions Biomed Eng. 2021;7(1):30–4
20. Zaffino P, Marzullo A, Moccia S, Calimeri F, De Momi E, Bertucci
B, Arcuri PP, Spadea MF. An open-source covid-19 CT dataset with
automatic lung tissue classification for radiomics. Bioengineering.
2021;8(2):26
21. Woznicki P, Laqua F, Bley T, Baeßler B, et al. AutoRadiomics:
A framework for reproducible radiomics research. Front Radiol.
(2022);2:919133. https://doi.org/10.3389/fradi.2022.919133
22. Thermo Scientific Amira-Avizo software; 2021
23. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG,
Dunne PD, McQuaid S, Gray RT, Murray LJ, Coleman HG, et al.
Qupath: open source software for digital pathology image analysis. Sci
Rep. 2017;7(1):1–7
24. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels. Technical report; 2010
25. Trägårdh E, Borrelli P, Kaboteh R, Gillberg T, Ulén J, Enqvist O,
Edenbrandt L. Recomia—a cloud-based platform for artificial intel-
ligence research in nuclear medicine and radiology. EJNMMI Phys.
2020;7(1):1–12
26. Hollandi R, Szkalisity A, Toth T, Tasnadi E, Molnar C, Mathe B, Grexa
I, Molnar J, Balind A, Gorbe M, et al. nucleAIzer: a parameter-free deep
learning framework for nucleus segmentation using image style transfer.
Cell Syst. 2020;10(5):453–58
27. Jia G, Huang X, Tao S, Zhang X, Zhao Y, Wang H, He J, Hao
J, Liu B, Zhou J, et al. Artificial intelligence-based medical image
segmentation for 3D printing and naked eye 3D visualization. Intell
Med. 2022;2(01):48–53
28. Kiljunen T, Akram S, Niemelä J, Löyttyniemi E, Seppälä J, Heikkilä
J, Vuolukka K, Kääriäinen O-S, Heikkilä V-P, Lehtiö K, et al. A deep
learning-based automated CT segmentation of prostate cancer anatomy
for radiation therapy planning-a retrospective multicenter study. Diag-
nostics. 2020;10(11):959
29. Heiberg E, Sjögren J, Ugander M, Carlsson M, Engblom H, Arheden
H. Design and validation of segment-freely available software for
cardiovascular image analysis. BMC Med Imag. 2020;10(1):1–13
30. Salehi M, Prevost R, Moctezuma J-L, Navab N, Wein W. Precise ultra-
sound bone registration with learning-based segmentation and speed
of sound calibration. In: International conference on medical image
computing and computer-assisted intervention. Berlin: Springer; 2017.
p. 682–90
31. Lee Y, Veerubhotla K, Jeong MH, Lee CH. Deep learning in per-
sonalization of cardiovascular stents. J Cardiovasc Pharmacol Ther.
2020;25(2):110–20
32. Hesamian MH, Jia W, He X, Kennedy P. Deep learning techniques
for medical image segmentation: achievements and challenges. J Digit
Imag. 2019;32(4):582–96
33. Bang M, Eom J, An C, Kim S, Park YW, Ahn SS, Kim J, Lee S-
K, Lee S-H. An interpretable multiparametric radiomics model for the
diagnosis of schizophrenia using magnetic resonance imaging of the
corpus callosum. Transl Psychiatry. 2021;11(1):1–8
34. Chen Z, Pawar K, Ekanayake M, et al. Deep learning for image
enhancement and correction in magnetic resonance imaging—state-of-
the-art and challenges. J Digit Imag. 2023;36:204–230. https://doi.org/
10.1007/s10278-022-00721-9
35. Yamashita K, Markov K. Medical image enhancement using super res-
olution methods. In: International conference on computational science.
Berlin: Springer; 2020. p. 496–508
36. Van Rossum G, Drake FL Jr. Python reference manual. Centrum voor
Wiskunde en Informatica, Amsterdam, 1995
37. Gu Y, Zeng Z, Chen H, Wei J, Zhang Y, Chen B, Li Y, Qin
Y, Xie Q, Jiang Z, et al. MedSRGAN: medical images super-
resolution using generative adversarial networks. Multimedia Tools
Appl. 2020;79(29):21815–40
38. Tustison NJ, Cook PA, Holbrook AJ, Johnson HJ, Muschelli J, Devenyi
GA, Duda JT, Das SR, Cullen NC, Gillen DL, et al. The ANTsX
ecosystem for quantitative biological and medical imaging. Sci Rep.
2021;11(1):1–13
39. Zhang K, Zuo W, Chen Y, Meng D, Zhang L. Beyond a gaussian
denoiser: residual learning of deep CNN for image denoising. IEEE
Trans Image Process. 2017;26(7):3142–55
40. Venkatesh V, Sharma N, Singh M. Intensity inhomogeneity correction
of MRI images using InhomoNet. Comput Med Imaging Graphics.
2020;84:101748
41. Cheng X, Zhang L, Zheng Y. Deep similarity learning for multimodal
medical images. Comput Methods Biomech Biomed Eng Imag Visual-
ization. 2018;6(3):248–52
42. Spanakis C, Mathioudakis E, Kampanis N, Tsiknakis M, Marias K.
Machine-learning regression in evolutionary algorithms and image reg-
istration. IET Image Process. 2019;13(5):843–49
43. Manoj S, Ranjitha S, Suresh HN. Hybrid BAT-PSO optimization tech-
niques for image registration. In: 2016 International conference on
electrical, electronics, and optimization techniques (ICEEOT). Piscat-
away: IEEE; 2016. p. 3590–3596
44. Wodzinski M, Müller H. DeepHistReg: unsupervised deep learning reg-
istration framework for differently stained histology samples. Comput
Methods Prog Biomed. 2021;198:105799
45. Dey N, Ren M, Dalca AV, Gerig G. Generative adversarial registration
for improved conditional deformable templates. In: Proceedings of the
IEEE/CVF international conference on computer vision; 2021. p. 3929–
41
46. Markova V, Ronchetti M, Wein W, Zettinig O, Prevost R. Global
multi-modal 2d/3d registration via local descriptors learning. Preprint.
arXiv:2205.03439; 2022
47. Li J. Deep learning for cranial defect reconstruction. Master's thesis, Graz University of Technology; 2020
48. Fedorov A, Beichel R, Kalpathy-Cramer J, Finet J, Fillion-Robin J-C,
Pujol S, Bauer C, Jennings D, Fennessy F, Sonka M, et al. 3d slicer
as an image computing platform for the quantitative imaging network.
Magn Reson Imaging. 2012;30(9):1323–41
49. Schlemper J, Caballero J, Hajnal JV, Price A, Rueckert D. A deep cas-
cade of convolutional neural networks for MR image reconstruction. In:
International conference on information processing in medical imaging.
Berlin: Springer; 2017. p. 647–58
50. Vasudeva B, Deora P, Bhattacharya S, Pradhan PM. Co-vegan: complex-
valued generative adversarial network for compressive sensing MR
image reconstruction. Preprint. arXiv:2002.10523; 2020
51. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA,
Gee JC. N4itk: improved n3 bias correction. IEEE Trans Med Imaging.
2010;29(6):1310–20
52. Carré A, Klausner G, Edjlali M, Lerousseau M, Briend-Diop J, Sun R,
Ammari S, Reuzé S, Andres EA, Estienne T, et al. Standardization of
brain MR images across machines and protocols: bridging the gap for
MRI-based radiomics. Sci Rep. 2020;10(1):1–15
53. Schneider CA, Rasband WS, Eliceiri KW. NIH image to ImageJ: 25
years of image analysis. Nat Methods. 2012;9(7):671–5

54. Bazin P-L, Cuzzocreo JL, Yassa MA, Gandler W, McAuliffe MJ,
Bassett SS, Pham DL. Volumetric neuroimage analysis extensions for
the MIPAV software package. J Neurosci Methods. 2007;165(1):111–
21
55. Götz M, Nolden M, Maier-Hein K. MITK phenotyping: an open-
source toolchain for image-based personalized medicine with radiomics.
Radiother Oncol. 2019;131:108–11
56. Nioche C, Orlhac F, Boughdad S, Reuzé S, Goya-Outi J, Robert C,
Pellot-Barakat C, Soussan M, Frouin F, Buvat I. LIFEx: a freeware
for radiomic feature calculation in multimodality imaging to accelerate
advances in the characterization of tumor heterogeneity. Cancer Res.
2018;78(16):4786–9
57. Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. Simpleitk image-
analysis notebooks: a collaborative environment for education and
reproducible research. J Digital Imaging. 2018;31(3):290–303
58. Van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N,
Narayan V, Beets-Tan RGH, Fillion-Robin J-C, Pieper S, Aerts HJWL.
Computational radiomics system to decode the radiographic phenotype.
Cancer Res. 2017;77(21):e104–7
59. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B.
Radiomics in medical imaging—“how-to” guide and critical reflection.
Insights Imaging. 2020;11:91
60. Szczypiński PM, Strzelecki M, Materka A, Klepaczko A. Mazda–
the software package for textural analysis of biomedical images. In:
Computers in medical activity. Berlin: Springer; 2009. p. 73–84.
61. Zhang L, Fried DV, Fave XJ, Hunter LA, Yang J, Court LE. Ibex: an
open infrastructure software platform to facilitate collaborative work in
radiomics. Med Phys. 2015;42(3):1341–53
62. RadiomiX Research Toolbox. https://radiomics.bio/radiomix-toolbox/.
Available online; accessed on 21 Nov 2022
63. Talukder S. GPU-based medical image segmentation: Brain MRI analy-
sis using 3d slicer. In: Artificial intelligence applications for health care.
Boca Raton: CRC Press; 2022. p. 109–121
64. Deep Learning Studio. https://deeplearningstudio.com/. Available
online; accessed on 21 Nov 2022
65. Nvidia’s Digits System. https://developer.nvidia.com/digits. Available
online; accessed on 21 Nov 2022
66. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk
V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, et al. The
image biomarker standardization initiative: standardized quantitative
radiomics for high-throughput image-based phenotyping. Radiology.
2020;295(2):328–38
67. Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative. Preprint. arXiv:1612.07003; 2016

68. Fortin J-P, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA,
Adams P, Cooper C, Fava M, McGrath PJ, et al. Harmonization of
cortical thickness measurements across scanners and sites. Neuroimage.
2018;167:104–20
69. Garcia-Dias R, Scarpazza C, Baecker L, et al. Neuroharmony: A new
tool for harmonizing volumetric MRI data from unseen scanners. Neu-
roimage. 2020;220:117127. https://doi.org/10.1016/j.neuroimage.2020.
117127
70. Bouhali O, Bensmail H, Sheharyar A, David F, Johnson JP. A review of
radiomics and artificial intelligence and their application in veterinary
diagnostic imaging. Vet Sci. 2022;9(11):620
71. Wagner MW, Namdar K, Biswas A, Monah S, Khalvati F, Ertl-Wagner
BB. Radiomics, machine learning, and artificial intelligence—what the
neuroradiologist needs to know. Neuroradiology. 2021;63(12):1957–67
72. Webb GI, Keogh E, Miikkulainen R. Naïve bayes. Encycl Mach Learn.
2010;15:713–14
73. Wright RE, Logistic regression. In: Grimm LG, Yarnold PR, editors.
Reading and understanding multivariate statistics. American Psycholog-
ical Association; 1995. p. 217–44
74. Suthaharan S. Support vector machine. In: Machine learning models and
algorithms for big data classification. Berlin: Springer; 2016. p. 207–35
75. Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD. An introduction
to decision tree modeling. J Chemom J Chemom Soc. 2004;18(6):275–
85
76. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32
77. Puttagunta M, Ravi S. Medical image analysis based on deep learning
approach. Multimedia Tools Appl. 2021;80(16):24365–98
78. Hodgson J. The 5 Stages of Machine Learning Validation. Retrieved
from https://towardsdatascience.com/the-5-stages-of-machine-
learning-validation-162193f8e5db, 2022
79. Karthikeyan N. Step-by-step guide for a deep learning project.
Retrieved from https://medium.com/@neelesh_k/structuring-deep-
learning-projects-b83d29513aea, 2022
80. Brownlee, J. Machine learning mastery with python: understand your
data, create accurate models, and work projects end-to-end; 2016.
Melbourne: Machine Learning Mastery
81. Hutter F, Kotthoff L, Vanschoren J. Automated machine learning:
methods, systems, challenges; 2019. Berlin: Springer Nature
82. Mustafa A, Azghadi MR. Automated machine learning for healthcare
and clinical notes analysis. Computers. 2021;10(2):24
83. Microsoft AutoMl. https://www.microsoft.com/en-us/research/project/
automl/. Available online; accessed on 21 Nov 2022
84. Goudas T, Doukas C, Chatziioannou A, Maglogiannis I. A collabo-
rative biomedical image-mining framework: application on the image
analysis of microscopic kidney biopsies. IEEE J Biomed Health Inf.
2012;17(1):82–91
85. Tsamardinos I, Charonyktakis P, Papoutsoglou G, Borboudakis G,
Lakiotaki K, Zenklusen JC, Juhl H, Chatzaki E, Lagani V. Just add data:
automated predictive modelling for knowledge discovery and feature
selection. NPJ Precis Oncol. 2022;6:38
86. KNIME Software. 2021. https://www.knime.com/knime-software/.
Available online; accessed on 18 Nov 2022
87. BigML, Inc. Corvallis, Oregon, USA, 2011. https://bigml.com. Avail-
able online; accessed on 21 Nov 2022
88. Witten IH, Frank E, Hall MA, Pal CJ. Data mining: practical machine learning tools and techniques; 2005
89. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH.
The weka data mining software: an update. ACM SIGKDD Explorations
Newsl. 2009;11(1):10–8
90. Demšar J, Curk T, Erjavec A, Gorup Č, Hočevar T, Milutinovič M,
Možina M, Polajnar M, Toplak M, Starič A, Štajdohar M, Umek L,
Žagar L, Žbontar J, Žitnik M, Zupan B. Orange: data mining toolbox
in python. J Mach Learn Res. 2013;14:2349–53
91. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado
GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A,
Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg
J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens
J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan
V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng
X. TensorFlow: large-scale machine learning on heterogeneous systems;
2015. Software available from https://tensorflow.org
92. Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F.
Efficient and robust automated machine learning. In: Advances in neural
information processing systems 28; 2015. p. 2962–70
93. Jin H, Song Q, Hu X. Auto-Keras: an efficient neural architecture
search system. In: Proceedings of the 25th ACM SIGKDD international
conference on knowledge discovery & data mining. New York: ACM;
2019. p. 1946–56
94. Vasile M-A, Florin POP, Mihaela-Cătălina N, Cristea V. MLBox:
machine learning box for asymptotic scheduling. Inf Sci. 2018;433:401–
416
95. Kotu V, Deshpande B. Predictive analytics and data mining: concepts
and practice with RapidMiner. Burlington: Morgan Kaufmann; 2014
96. Orlhac F, Eertink JJ, Cottereau A-S, Zijlstra JM, Thieblemont C,
Meignan M, Boellaard R, Buvat I. A guide to combat harmoniza-
tion of imaging biomarkers in multicenter studies. J Nucl Med.
2022;63(2):172–9
97. Fernandes S, Chong JJH, Paige SL, Iwata M, Torok-Storb B, Keller G, Reinecke H, Murry CE.
98. Da-Ano R, Visvikis D, Hatt M. Harmonization strategies for multicenter
radiomics investigations. Phys Med Biol. 2020;65(24):24TR02
99. Mali SA, Ibrahim A, Woodruff HC, Andrearczyk V, Müller H, Primakov
S, Salahuddin Z, Chatterjee A, Lambin P. Making radiomics more
reproducible across scanner and imaging protocol variations: a review
of harmonization methods. J Pers Med. 2021;11(9):842
100. Lundberg SM, Lee SI. A unified approach to interpreting model pre-
dictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus
R, Vishwanathan S, Garnett R, editors. Advances in neural informa-
tion processing systems 30. Red Hook: Curran Associates Inc.; 2017.
p. 4765–74
101. Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR.
Application of explainable artificial intelligence for healthcare: A
systematic review of the last decade (2011-2022). Comput Methods
Programs Biomed. 2022;226:107161. https://doi.org/10.1016/j.cmpb.
2022.107161
3 Introduction to Machine Learning in Medicine

Rossana Buongiorno, Claudia Caudai, Sara Colantonio, and Danila Germanese

3.1 Introduction

The increasing availability of patient-related data is driving new research trends in personalized prediction and disease management. Nevertheless, the complexity of such analyses makes cognitive augmentation in the form of Artificial Intelligence (AI) systems necessary [1, 2].
Modern AI techniques have considerable potential to exploit complex medical data toward improving current healthcare.
Many of the advances in this field are tied to progress in a subdomain of AI research known as Machine Learning (ML).
The scientist Arthur Lee Samuel was the first to introduce the term Machine Learning, in 1959. He created a checkers-playing programme designed to develop its own logic while playing and to improve its performance over time.


ML algorithms are fruitfully applied in many research fields and in the most varied applications [3–5]. As far as applications in the medical and clinical fields are concerned, many uses, advantages, and opportunities can be enumerated [6–8].
Nevertheless, ML is not yet used in the medical field as a direct diagnostic tool, but exclusively as a support to diagnosis, because human supervision is still indispensable and will probably remain so for a long time. A vast trend is the analysis and interpretation of medical images, such as X-ray, ultrasound, echocardiography, and MRI, using ML algorithms. Another widespread use is the prediction of the prognosis associated with particular diseases, such as different types of cancer. ML algorithms can also help automatically select patients suitable for particular clinical pathways and experimental trials. ML-based analysis of anatomical features can also be very useful in the real-time and non-real-time support of surgery, with the aim of promptly identifying unexpected critical or irregular anatomical structures. Other uses include drug discovery, genomic screening, cardiovascular tasks, and epidemiology. In Table 3.1, many examples of medical applications are summarized.
In this chapter, we aim to describe, as simply as possible, what Machine Learning is and how it can be used fruitfully in the medical field. In Sect. 3.2, we describe the flow of a learning algorithm; in Sect. 3.3, we report the main ML techniques and their most widespread clinical applications. In Sect. 3.4, we briefly address some highly interesting issues (i.e., model evaluation, explainability, reproducibility, sharing, and ethical and legal problems) and the great challenges of precision medicine and personalized medicine, which are increasingly within reach thanks to Machine Learning.

3.2 What Is Machine Learning?

Machine learning is a branch of AI which comprises elements of mathematics, statistics, and computer science. The term machine learning describes the ability of an algorithm to "learn" from data by recognizing patterns and making inferences and/or predictions of future events with minimal human intervention and without explicit programming. In other words, the results produced by ML algorithms are inferences made from complex statistical analyses of adequately large datasets, expressed as the likelihood of a relationship between variables [45]. Furthermore, machine learning methods improve their performance adaptively as the number of examples from which they learn increases.

Table 3.1 Main medical applications for Machine Learning methods

Cardiovascular tasks: Cardiovascular risk prediction, 2018 [9]; Hyper-myocardial infarction prediction, 2016 [10]; Coronary artery disease detection, 2018 [11]; Heart failure prediction, 2011 [12]; Stroke risk prediction; Cardiac arrhythmias evaluation

Human cancer imaging: Cancer detection on CT, X-ray, US, and MRI images, 2018 [13]; Lesion segmentation, 2016 [14]; Regression on cancer grade, 2019 [15]; Cancer risk evaluation, 2022 [16]; Cancer evolution prediction, 2021 [17]; Cancer classification, 2019 [18]

Cancer genomics: Cancer genomics classification, 2007 [19]; Identification of pathogenic variants, 2013 [20]; Detection of oncogenic states, 2014 [21]; Bioactivity prediction, 2003 [22]; Cancer cell-line specific interactions, 2014 [23]; Clinical trials monitoring, 2021 [24]

Functional genomics: Prediction of protein secondary structure, 2001 [25]; Prediction of epigenomic marks, 2011 [26]; Transcriptional and post-transcriptional regulation investigation, 2018 [27], 2015 [28]; Transcription biological rules detection, 2020 [29]; Protein mass spectrometry classification, 2021 [30]

Metabolic disorders: Metabolic syndrome risk prediction, 2021 [31]; Alignment of metabolomic peaks, 2014 [32]; Metabolites classification, 2017 [33]; Kinetic metabolic modelling, 2016 [34]; Prediction of oestrogen receptor status, 2022 [35]

Prognostic prediction: Survival prediction, 2018 [36]; Disease prognosis, 2019 [37]; Trials outcome prediction, 2019 [38]; Identification of risk factors, 2016 [39]; Disease recurrence prediction, 2018 [40]

Drug discovery: Drug target identification, 2018 [41]; Target druggability prediction, 2016 [42]; Splice variants classification, 2015 [43]; Anticancer drugs prioritization, 2019 [44]
Over the past 5 years, machine learning tools have demonstrated visible successes in the medical field, in particular in disease detection, patient monitoring, prognosis prediction, surgical assistance, patient care, and systems management, by supporting complex clinical decision-making [46, 47].
A wide variety of machine learning algorithms are in use today. The choice of a particular model for a given problem is determined by the characteristics of the data as well as by the type of desired outcome. The majority of ML algorithms can be categorized into three types of learning technique: supervised learning, unsupervised learning, and reinforcement learning (Fig. 3.1). Regardless of type, however, any ML application consists of a series of key steps, as shown in Fig. 3.2:

Fig. 3.1 ML algorithms are generally categorized as supervised learning, unsupervised learning, and reinforcement learning

Fig. 3.2 Core steps of any type of ML approach

• Data collection: as machines learn from the data that one gives them, it is of the utmost importance to collect data from reliable sources. The higher the quality of the data, the higher the accuracy of the developed model. Good data are relevant, contain very few missing and repeated values, and have a good representation of the various subcategories/classes present. Noisy or incorrect data will clearly reduce the effectiveness of the model.
• Data pre-processing: after collection, data have to be correctly prepared. For instance, they may be randomized to make sure that they are evenly distributed and that the ordering does not affect the learning process. Data may also be cleaned by removing missing/duplicate values, converting data types, etc. Finally, the cleaned data should be split into two sets: a training set and a testing set. The training set is the set the model learns from; the testing set is used to check the accuracy of the trained model.
• Model training: the prepared data are analysed, elaborated, and interpreted by the machine learning model to find patterns and make predictions. Over time, with training, the model gets better at learning from the data and inferring.
• Model testing: after training the model, its performance has to be checked. This is done by testing the accuracy and the speed of the model on previously unseen data (the testing set).
• Model improving: this step is also known as parameter tuning. Once the model has been created and evaluated, tuning the model parameters allows the accuracy of the model itself to be improved. Parameters are the variables in the model that best fit the relationship between the data; at certain parameter values, the accuracy may reach its maximum, and parameter tuning refers to finding these values. A minimal end-to-end sketch in Python follows this list.
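As a concrete illustration of these steps, the minimal scikit-learn sketch below (the dataset, model, and parameter grid are illustrative choices, not recommendations) splits the data, trains a scaled logistic-regression pipeline, tunes one parameter by cross-validation, and finally tests on unseen data:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect and prepare: shuffle and split into training and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, random_state=0)

# Train: feature scaling and classifier wrapped in a single pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Improve: tune the regularization strength C by 5-fold cross-validation
search = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Test: evaluate the tuned model on previously unseen data
print("Best parameter:", search.best_params_)
print("Test accuracy:", round(search.score(X_test, y_test), 3))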

In the following sections, the three main types of learning are


described: supervised, unsupervised, and reinforcement learning.

3.3 Principal ML Algorithms

Machine learning concerns three main types of algorithms: supervised learning, unsupervised learning, and reinforcement learning. The difference between them lies in how each algorithm learns from the data to make predictions.

3.3.1 Supervised Machine Learning

Supervised learning refers to approaches in which a model is trained on a set of numerical inputs (or features, or predictors) which are associated with known outcomes (also referred to as ground truth or prior knowledge).
As reported in Fig. 3.3, the goal in the first stage of learning
is to best approximate the relationship between input and output
observables in the data. In the validation step, the model is itera-
tively improved to reduce the error of prediction using optimiza-
tion techniques: in other words, the learning algorithm iteratively
compares its predictions with the correct output (ground truth
label) and finds errors in order to modify itself accordingly. Once
the algorithm is successfully trained, it will be able to make
outcome predictions when applied to new data.
Predictions can be either discrete (sometimes referred to as classes, e.g., positive or negative, benign or malignant, no risk–low risk–high risk, etc.) or continuous (e.g., a value from 0 to 100). A model that maps input to a discrete output is based on a classification algorithm (Fig. 3.4). Examples of classification algorithms include those which predict whether a tumour is benign or malignant, or establish whether comments written by a patient convey a positive or a negative sentiment [48–50]. Classification algorithms return the probability of a class (between 0 for impossible and 1 for definite). Typically, a probability above 0.50 is transformed into a class of 1, but this threshold may be modified according to the required algorithm performance. A model that maps input to a continuous value is based on a regression algorithm (Fig. 3.4). A regression algorithm might be used, for instance, to estimate the percentage of fat in the liver in case of steatosis or to predict an individual's life expectancy [51, 52].

Fig. 3.3 Three steps of supervised learning: (i) training of the algorithm, (ii) validation of the trained model, and (iii) test of the model

Fig. 3.4 How classification (left) and regression (right) algorithms work. Classification algorithms find the hyperplane(s) that best divide the data into two (or more) classes; a regression model aims to find the function that best approximates the trend of the data
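To make the thresholding behaviour concrete, the minimal sketch below (synthetic, deliberately imbalanced data; all numbers are illustrative) compares the default 0.50 cut-off with a more sensitive one:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]       # probability of the positive class

default = (proba >= 0.50).astype(int)       # standard cut-off
sensitive = (proba >= 0.20).astype(int)     # lower cut-off: fewer missed positives
print("Positives flagged at 0.50:", default.sum(), "| at 0.20:", sensitive.sum())

Lowering the threshold trades specificity for sensitivity, which may be desirable when missing a disease is costlier than a false alarm.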
Note that in [51] the estimation is performed on ultrasound images. For this type of task, i.e., image processing, the predictors must first be processed by a feature selector, which extracts measurable characteristics from the image dataset that can then be represented as a numerical matrix and understood by the algorithm (see Fig. 3.3).
Four key concerns to be considered in supervised learning
are:

• Bias-variance trade-off: in any supervised model, there is a balance between bias, which is the constant error term, and variance, which is the amount by which the error may vary between different training sets. Increasing bias will usually lead to lower variance and vice versa. Generally, in order to produce models that generalize well, the variance of the model should scale with the size and complexity of the training data: small datasets should usually be processed with low-variance models, and large, complex datasets will often require higher variance models to fully learn the structure of the data.
• Model complexity and volume of training data: the proper level of model complexity is generally determined by the nature of the training data. A small amount of data, or data that are not uniformly spread throughout different possible scenarios, will be better explained by a low-complexity model. This is because a high-complexity model will overfit if used on a small number of data points.
• Overfitting: this refers to learning a function that fits the training data very well but does not generalize to other data points. In other words, the model strictly memorizes the training data without learning the actual trend or structure that leads to those outputs (the sketch after this list illustrates the effect).
• Noise in the output values: this issue concerns the amount of noise in the target output values. Noisy or incorrect data labels will clearly reduce the effectiveness of the trained model.
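The following minimal sketch (purely synthetic data) makes overfitting visible: as the polynomial degree grows, the training score keeps improving while the test score deteriorates:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)   # noisy nonlinear signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: train R2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R2 = {model.score(X_te, y_te):.2f}")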

Here, the most prominent and common methods used in supervised machine learning are reported: Linear Regression, Support Vector Machine, Random Decision Forest, Extreme Gradient Boosting, and Naive Bayes.

3.3.1.1 Linear Regression


Linear Regression (LR) is one of the simplest and most widely used supervised learning methods. In essence, it formalizes and identifies a linear relationship between two or more variables. The linearity assumption is very strong, and therefore regression methods with more complex model forms have been developed: Non-linear Regression, Polynomial Regression, Logistic Regression with the sigmoid function, Poisson Regression, and many others. As one of the oldest approaches, LR has been widely used in many fields, including medicine [15]. It is mainly applied when a relationship between variables is strongly assumed and the value of one (unknown) variable is to be deduced from the values of the others (known). A recent example of the use of LR in medicine concerns the prediction of the evolution of systemic diseases starting from clinical evidence [31, 53]; in the genomic field, it can be useful, for example, for estimating gene expression patterns in particular biological conditions [54].
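A minimal LR sketch on hypothetical clinical data (the variables, coefficients, and noise level are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
age = rng.uniform(30, 80, 100)
bmi = rng.uniform(18, 35, 100)
score = 0.5 * age + 1.2 * bmi + rng.normal(scale=3.0, size=100)  # synthetic outcome

X = np.column_stack([age, bmi])
model = LinearRegression().fit(X, score)
print("Coefficients:", np.round(model.coef_, 2), "Intercept:", round(model.intercept_, 2))
print("Prediction for age 60, BMI 27:", round(model.predict([[60, 27]])[0], 1))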

3.3.1.2 Support Vector Machine


Support Vector Machines (SVMs) are supervised learning methods for binary classification. SVMs represent the data as points in space and build a separating hyperplane with the widest possible margin between the two classification categories. SVMs perform a linear classification, but it is also possible to perform a nonlinear classification by using an adequate kernel, which implicitly projects the data into a higher-dimensional space. This algorithm is used in many classification and regression problems, and in the medical field it is often used for signal separation or for clinical discrimination starting from well-specified characteristics [55, 56]. A very interesting and promising use concerns early diagnosis [20] and the classification of some types of cancer starting from genomic data [19, 57].
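The minimal sketch below (synthetic two-class data) contrasts a linear SVM with an RBF-kernel SVM, illustrating the benefit of the kernel projection when the classes are not linearly separable:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_tr, y_tr)
rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
print("Linear kernel accuracy:", round(linear.score(X_te, y_te), 2))
print("RBF kernel accuracy:", round(rbf.score(X_te, y_te), 2))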

3.3.1.3 Random Decision Forest


Random Decision Forests (RDFs) were first proposed by Tin Kam Ho in 1995 [58]. An RDF is an ensemble learning method based on training many Decision Trees (DTs), whose outputs are then aggregated into a single decision strategy. The individual Decision Trees are built by observing randomly selected characteristics of the data. Single Decision Trees are often unstable, but they have the great advantage of being easily interpretable. RDFs can be used for both regression and classification and are mainly applied to problems where there is not yet a precise idea of the weight of the data characteristics or of the relationships between them. In the medical field, it is a widely used method for relating clinical features and pathologies [22, 59–62]. In a recent work by Wang et al. [63], RDFs were used to detect the factors with the greatest impact on medical expenses for diabetic individuals in the USA, while Hsich [12] studied the most critical risk factors for survival in patients with systolic heart failure through Random Survival Forests (RSFs). RDF models have also been used for the analysis of genomic data [64].
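A minimal RDF sketch in the spirit of these applications (built-in dataset, illustrative settings): the forest is cross-validated and then queried for the features that weigh most on its decisions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", round(cross_val_score(forest, data.data, data.target, cv=5).mean(), 3))

# Rank the characteristics that most drive the forest's decisions
forest.fit(data.data, data.target)
ranked = sorted(zip(forest.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:3]:
    print(f"{name}: {importance:.3f}")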

3.3.1.4 Extreme Gradient Boosting


Extreme Gradient Boosting is a supervised Machine Learning technique for regression and classification problems that aggregates an ensemble of weak individual models into a more accurate final model. The method copes well with multicollinearity problems, in which there are high correlations between the variables. It substantially improves the predictive accuracy of the model and is often used in risk assessment problems. In the medical field, it has been used in many applications; examples include the evaluation of the outcomes of clinical treatments [65] and the study of systemic diseases that depend on multifactorial conditions that are difficult to interpret [66].
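A minimal sketch of the idea, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost-style boosting on synthetic, deliberately correlated features (all settings are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Correlated, multifactorial features, as in the risk-assessment setting above
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each shallow tree corrects the residual errors of the ensemble built so far
model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                   max_depth=2, random_state=0)
model.fit(X_tr, y_tr)
print("Test accuracy:", round(model.score(X_te, y_te), 3))

The dedicated xgboost library offers the same boosting principle with additional regularization and speed optimizations; the scikit-learn class is used here only to keep the example dependency-free.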

3.3.1.5 Naive Bayes


Bayes Classifiers are ML methods that use Bayes' theorem for the classification process. These classifiers are very fast and, despite their simplicity, are efficient at many complex tasks, even with small training datasets. They are used to calculate the conditional probability of an event based on the information available on other related events. A disadvantage of such classifiers is that they require knowledge of all the probabilities of the problem, especially the prior and conditional probabilities (information that is often difficult to obtain). They also assume the independence of the input characteristics and therefore provide only a simple (naive) approximation of the problem. In the medical field, they have often been used in classification problems [67] and in feature selection [68]. Silla et al. [69] used an extension of the Naive Bayes approach in the context of proteomics, for the hierarchical classification of protein function, while Sandberg et al. [70] used a naive Bayes classifier for the analysis of complete bacterial genome sequences, capturing highly specific genomic signatures.
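A minimal Naive Bayes sketch (Gaussian variant on a built-in dataset, purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)   # treats each feature as an independent Gaussian
print("Test accuracy:", round(nb.score(X_te, y_te), 3))
print("Class probabilities, first test case:", nb.predict_proba(X_te[:1]).round(3))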

3.3.2 Unsupervised Machine Learning

In contrast with supervised learning, unsupervised learning models process unlabeled data to uncover the underlying data structure. In unsupervised learning, patterns are found by the algorithms without any input from the user. Unsupervised techniques are thus used to find undefined patterns or clusters of data points which are "closer" or more similar to each other.
A visual illustration of an unsupervised dimension reduction technique is given in Fig. 3.5. In this figure, the raw data (represented by various shapes in the left panel) are presented to the algorithm, which then groups the data into clusters of similar data points (right panel). Note that data points that do not have sufficient commonality with the clustered data are typically excluded, thereby reducing the number of features within the dataset. Indeed, such techniques are often referred to as dimension reduction techniques.
The ability of unsupervised methods to discover similarities and differences in the data makes them an ideal solution for exploratory data analysis. The output is highly dependent on the algorithm and hyperparameters selected. Hyperparameters, also called tuning parameters, are values used to control the behaviour of the ML algorithm (e.g., the number of clusters, distance or density thresholds, the type of linkage between clusters). Algorithms exist to detect clusters based on the spatial distance between data points, space or subspace density, network connectivity between data points, etc.

Fig. 3.5 How unsupervised Machine Learning algorithms work. They use a more self-contained approach, in which a computer learns to identify patterns without any guidance, only inputting data that are unlabeled and for which no specific output has been defined. In practice, this type of algorithm learns to divide the data into different clusters based on the characteristics that unite or discriminate them the most
By compressing the information in a dataset into fewer features, or dimensions, issues such as multicollinearity between variables or the high computational cost of the algorithm may be avoided.
Unsupervised approaches also share many similarities with statistical techniques familiar to medical researchers, making use of algorithms similar to those used for clustering and dimension reduction in traditional statistics. Those familiar with Principal Component Analysis, for instance, will recognize many of the techniques used in unsupervised learning.
Here, the most prominent and common methods used in unsupervised machine learning are reported: k-Nearest Neighbours, Principal Component Analysis, and k-Means Clustering.

3.3.2.1 k-Nearest Neighbours


k-Nearest Neighbours (k-NN) is an instance-based learning algorithm used for pattern recognition in classification or regression (strictly speaking, k-NN relies on labelled examples and is therefore a supervised method, although it is commonly discussed alongside similarity-based techniques). The algorithm uses similarity criteria between elements that are close to each other: in essence, the closest neighbours contribute more to the attribution of characteristics than the distant ones. The parameter k represents the number of neighbours that contribute to the decision in the feature space. This algorithm is often used in problems concerning the recognition of similarity patterns aimed at classification; it is fast, precise, and efficient, but has the disadvantage that the precision of the predictions depends strongly on the quality of the data. In the medical field, it has often been used for the analysis of hidden patterns in very large amounts of data from clinical repositories [71] and in the genomic field [72].
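A minimal sketch showing how the choice of k affects performance (built-in dataset and values of k chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)   # k closest labelled neighbours vote
    acc = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k = {k:2d}: cross-validated accuracy = {acc:.3f}")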

3.3.2.2 Principal Component Analysis


Principal Component Analysis (PCA, also called the Karhunen–Loève transform) is a statistical procedure for reducing the dimensionality of the variable space. PCA consists of a linear transformation that projects the original variables into a new Cartesian system in which the new variables capture most of the significance (variance) of the old ones in the first few components, thus obtaining a dimensional reduction without losing too much information. One major limitation of this method is that it can only capture linear correlations between variables. To overcome this disadvantage, sparse PCA and nonlinear PCA have recently been introduced. PCA is widely used especially in the fields of medicine and psychology, where scientists work with datasets made up of numerous variables [26, 73, 74].
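A minimal PCA sketch (standardizing first, since PCA is sensitive to feature scale; dataset and component count are illustrative):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # put all 30 variables on the same scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)             # project onto the two main axes
print("Projected shape:", X_2d.shape)
print("Variance explained:", np.round(pca.explained_variance_ratio_, 2))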

3.3.2.3 k-Means Clustering


K-Means Clustering is a vector quantization method for partitioning input data into k clusters. The goal of the algorithm is to minimize the total intra-group variance; each group is identified by a centroid, or mid-point. At each step of the algorithm, the input points are assigned to the group with the closest centroid, and the centroids are then recalculated, until the algorithm converges and the centroids are stable. k-Means is both a simple and an efficient algorithm for clustering problems, but it has the drawbacks of being very sensitive to outliers, which can significantly displace the centroids, and of requiring the number of clusters to be chosen a priori. In medicine, it is mainly used in situations where a lot of unlabelled data are available [75–77].
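A minimal k-means sketch on synthetic data, with the number of clusters fixed a priori as described above:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)   # k chosen a priori
print("Cluster sizes:", np.bincount(km.labels_))
print("Centroids:")
print(np.round(km.cluster_centers_, 2))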

3.3.3 Artificial Neural Networks

Artificial Neural Networks (ANNs) deserve a very long description, which is beyond the scope of this chapter. We will only say that they are computational learning models made up of artificial "neurons," inspired by a simplification of biological neural networks. Such models consist of layers of artificial neurons linked by computational connections. They are adaptive systems, which change their structure based on the information they process during the learning phase. The stacks of layers can also be very deep (hence the term Deep Learning). There are several ANN models, which can be trained in a supervised or unsupervised manner. They are used to create very complex learning algorithms with very high abstraction capabilities, which are therefore difficult to interpret. In the medical field, they have many applications, especially in areas where large amounts of data are available (Big Data) [78–81]. Among ANNs, Convolutional Neural Networks (CNNs) deserve particular attention: their pattern of connectivity between neurons is inspired by the organization of the animal visual cortex, which makes them particularly suitable for processing images; they are widely used in the analysis of medical images for the detection, segmentation, and classification of anomalies or lesions [14, 82, 83].
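Although a full treatment is beyond the scope of this chapter, a small feed-forward network can be sketched in a few lines using scikit-learn's MLPClassifier (the architecture and dataset are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers of artificial neurons trained by backpropagation
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                                  random_state=0))
mlp.fit(X_tr, y_tr)
print("Test accuracy:", round(mlp.score(X_te, y_te), 3))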

3.3.4 Reinforcement Learning

Reinforcement learning is a machine learning technique in which a computer (the agent) learns continuously to perform a task through repeated trial-and-error interactions with its environment. In other words, the agent trains itself through reward and punishment mechanisms (see Fig. 3.6). This learning approach allows the agent to make a series of decisions that maximize a reward metric for the task, without being explicitly programmed to do so and without human intervention.

Fig. 3.6 Basic diagram of Reinforcement Learning
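A minimal sketch of this reward-driven loop (tabular Q-learning on an invented five-state corridor; all constants are illustrative):

import numpy as np

# Toy environment: 5 states in a row; action 0 = move left, action 1 = move right.
# The agent receives a reward of 1 only upon reaching the right-most state.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != goal:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: nudge Q[s, a] toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("Learned policy (0 = left, 1 = right):", Q.argmax(axis=1)[:goal])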

3.4 Issues and Challenges


3.4.1 Data Management

Data used in ML applications, derived from medical protocols and experimental trials, may be incomplete and contain errors, biases, and artefacts. Moreover, some data points may be missing for some of the samples. In this scenario, data imputation, denoising, and integration should be part of the design of ML algorithms applied to medicine. We will not go into detail, but it is important to know that the performance of AI algorithms is strongly dependent on the quality of the dataset and that, in the case of unbalanced, incomplete, noisy, or biased datasets, many solutions can be applied to improve datasets and performance [84–88].
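As a minimal sketch of one such remedy, the code below imputes the missing values of a hypothetical laboratory table with per-column medians:

import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical lab results with missing entries encoded as np.nan
X = np.array([[5.1, np.nan, 140.0],
              [4.8, 1.2, np.nan],
              [np.nan, 1.0, 150.0],
              [5.5, 1.4, 135.0]])

imputer = SimpleImputer(strategy="median")   # replace gaps with per-column medians
print(imputer.fit_transform(X))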

3.4.2 Machine Learning Model Evaluation Metrics

Evaluating the developed machine learning model is an essential part of any project. The most widely used evaluation metrics are listed here:

• Classification Accuracy: the sum of all correct predictions divided by the total number of input samples. Nevertheless, it works well only for a balanced dataset, i.e., one with an equal number of samples in each class. With unbalanced samples, it can be misleading and give the false impression of high accuracy. For instance, if we have 98% samples of class "A" and 2% samples of class "B," the model can easily reach 98% training accuracy by simply predicting every training sample as class A; but when tested on a set with 60% samples of class A and 40% of class B, the same model drops to 60% accuracy. This issue becomes a real problem when the cost of misclassifying the minority class is very high: in the case of serious pathologies, the cost of not diagnosing a sick person's disease is much higher than the cost of further testing a healthy person.
• Logarithmic Loss (or Log Loss): it penalizes false classifications and works well for multi-class classification. A Log Loss nearer to 0 indicates higher accuracy; in general, minimizing the Log Loss yields a more accurate classifier.
• Area Under the Curve (AUC): one of the most widely used evaluation metrics, especially for binary classification problems. The AUC of a classifier indicates the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. To understand how it is computed, let us introduce (i) the True Positive Rate (TPR, or Sensitivity), the proportion of positive data samples that are correctly labelled as positive with respect to all positive data samples, and (ii) the True Negative Rate (TNR, or Specificity), the proportion of negative data samples that are correctly labelled as negative with respect to all negative data samples. Plotting Sensitivity against (1 − Specificity) at varying threshold values in the range [0, 1] yields the Receiver Operating Characteristic (ROC) curve. The AUC is the area under the ROC curve, and the higher its value, the better the performance of the ML model.
• Mean Absolute Error (MAE): the average of the absolute differences between the correct outputs and the predicted outputs. It gives a measure of how far the predictions are from the actual values; nevertheless, it gives no indication of whether the model is under-predicting or over-predicting the data.
• Mean Squared Error (MSE): quite similar to the Mean Absolute Error, except that it takes the average of the squared differences between the correct outputs and the predicted outputs. Because the error is squared, the effect of larger errors becomes more pronounced than that of smaller ones, and hence the model can focus more on larger errors. The sketch below computes each of these metrics on a toy example.
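The minimal sketch below computes the metrics above on toy predictions (all numbers are invented for illustration):

import numpy as np
from sklearn.metrics import (accuracy_score, log_loss, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.3, 0.2])   # predicted P(class 1)
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Log Loss:", round(log_loss(y_true, y_prob), 3))
print("AUC:", round(roc_auc_score(y_true, y_prob), 3))

# Regression-style errors on continuous predictions
t, p = np.array([3.0, 5.0, 2.5]), np.array([2.5, 5.0, 4.0])
print("MAE:", round(mean_absolute_error(t, p), 3), "MSE:", round(mean_squared_error(t, p), 3))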

3.4.3 Explainability, Interpretability, and Ethical


and Legal Issues

Both the interpretability and the explainability of ML algorithms go in the direction of strengthening trust in them [89, 90]. Interpretability concerns the way in which the ML model reaches its conclusions, and its aim is to verify that the accuracy of the model derives from a correct representation of the problem and not from artefacts present in the training data; explainability, instead, seeks to explain the reasons why an algorithm made one decision rather than another. The concept of explainability is receiving more and more attention from a legal point of view, in line with the right of every individual to receive detailed explanations of the decisions that impact their life. In general, explainability and interpretability are readily achievable with simple classification algorithms such as decision trees or linear regressors, but the relationships between data characteristics are not always linear, and consequently these approaches may not be suitable for many tasks.
Most ML algorithms are too complex to be understood directly, so it is necessary to adopt post hoc analysis [91]. In some cases, the explanation is achieved by deriving a transparent model that mimics the original one, although such an approach is not always possible. It is often necessary to work hard to obtain explanations, through empirical approaches, recursive attempts, and series of examples. Applications of ML in the medical field are often directed toward diagnosis and therapy and, as with doctors, they can make mistakes, generate delays, and sometimes lead toward wrong therapies, running into major legal and economic problems. For this reason, the use of ML in the medical field is still limited to supporting diagnosis, providing for the review of outputs by human experts.
Another very important aspect of the application of ML in
medicine concerns the equity of data. By their nature, and cer-
tainly not by intention, artificial intelligence algorithms can be
highly discriminatory against minorities, generating deep ethical
issues [92,93]. Given their heavy dependence on available data, it
is clear that their performance will be very good for people from
well-represented populations and very bad, misleading and even
dangerously wrong for people from underrepresented populations
[94]. With most of the medical data repositories coming from
hospitals in rich, industrialized countries, developing countries
risk finding themselves completely shut out of new medical
advances if they rely heavily on ML. This is certainly a problem
that needs attention and prompt solutions.

3.4.4 Perspectives in Personalized Medicine

The synergy between artificial intelligence and precision medicine is revolutionizing the concepts of diagnosis, prognosis, and assistance [95, 96]. Conventional symptom-based treatment of patients is slowly giving way to more holistic approaches in which each patient's aggregate data and biological indicators are combined with specific observations and with general patterns inferred by artificial intelligence approaches from large numbers of patients. Genetics, genomics, and precision medicine, combined with machine learning and deep learning algorithms, make it possible to generate personalized therapies even for individuals with less common therapeutic responses or with particular medical needs [97–99].

In general, the shared adoption of the EHR (Electronic Health Record) format for storing clinical data has allowed for the systematic collection of a great deal of information on the health of individuals in digital format. This format has greatly facilitated the use of ML tools in the medical field, thanks to the uniformity of data and the ease of retrieval and use. Other projects born with the aim of facilitating the application of ML in the medical field are the health databases containing data on millions of individuals, such as the All of Us Research Program, the Human Genome Project, the UK Biobank, the IARC Biobank, and the European Biobank.
To conclude our contribution on the innovative, indeed revolutionary, perspectives of the application of ML in medicine, we would like to mention Digital Twins: models that mimic the biological background of a patient as closely as possible, making it possible to test drugs, therapies, and treatments while maximizing results and minimizing risks to the patient's health [100].

3.5 Conclusions

The introduction of Artificial Intelligence has disrupted all fields of research, medicine included. It has changed our perception of diseases and treatments and our relationship with doctors; it has changed procedures, opened many doors, and brought to the table issues and problems that we had not really thought about yet. We consider ML a huge opportunity to improve more or less everything we can operate on, but it is a tool to understand, to handle with care, and to use with attention. In this chapter, we have provided some basic indications for navigating the ocean of ML, especially for non-experts. We have described the most used methods, giving guidance on when to use them, and we have provided a rich bibliography, so the reader knows where to go further. We have tried not so much to provide answers as to help the reader understand the right questions to ask when applying ML in the medical field, which is already a very important starting point.

USEFUL GLOSSARY

• Accuracy: Measure of an algorithm's ability to give correct predictions.
• Algorithm: Any systematic calculation scheme or procedure.
• Classification: Learning process in which the data are divided into two or more classes and the system assigns one or more of the available classes to an input.
• Clustering: Learning process in which a set of data is divided into groups that are not known a priori.
• Features: Interesting parts, qualities, or characteristics of something.
• Layer: Collection of nodes operating together.
• Model: Formal representation of knowledge related to a phenomenon.
• Normalisation: Process of feature transformation to obtain a similar scale.
• Neural Networks: Computational model made of artificial neurons, vaguely inspired by a simplification of biological neural networks.
• Node: Computational unit (also called artificial neuron) which receives one or more inputs and combines them to produce an output.
• Overfitting: Excessive reliance of the model on training data, leading to an inability to generalise and evaluate other data well.
• Pre-processing: Adjusting data before use in order to ensure or improve performance in the data mining process.
• Regression: Learning process similar to classification, with the difference that the output has a continuous domain rather than a discrete one.
• Training: The process of creating a model from the training data. The data are fed into the training algorithm, which learns a representation of the problem and produces a model. Also called "learning".
• Training Set: Set of data used as input during the learning process to fit the parameters.
• Test Set: Set of data, independent of the training set, used only to assess the performance of a fully specified classifier or regressor.
• Validation Set: Set of data used to tune the parameters and to assess the performance of a classifier or regressor. It is sometimes also called the development set (dev set).
• Weights: Parameters within a neural network calibrating the transformation of input data.
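As a concrete illustration of the Training, Validation, and Test Set entries above, the following minimal Python sketch splits a dataset into the three sets (assuming scikit-learn is available; the synthetic X and y are placeholders for real features and labels):

```python
# Minimal train/validation/test split sketch with scikit-learn;
# X and y are synthetic stand-ins for real features and labels.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(500, 10))    # 500 samples, 10 features
y = rng.integers(0, 2, size=500)  # binary labels

# First split off an independent test set (20% of the data), then
# carve a validation set out of the remaining training data.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=0,
    stratify=y_trainval)
```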

References
1. Wartman S, Combs C. Medical education must move from the informa-
tion age to the age of artificial intelligence. Acad Med. 2018;93:1107–9.
2. Obermeyer Z, Lee T. Lost in thought – the limits of the human mind and
the future of medicine. N Engl J Med. 2017;377(13):1209–11.
3. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li
B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine
learning in drug discovery and development. Nat Rev Drug Discov.
2019;18:463–77.
4. Valletta JJ, Torney C, Kings M, Thornton A, Madden J. Applica-
tions of machine learning in animal behaviour studies. Anim Behav.
2017;124:203–20.
5. Recknagel F. Applications of machine learning to ecological modelling.
Ecol Modell. 2001;146:303–310.
6. Garg A, Mago V. Role of machine learning in medical research: a survey.
Comput Sci Rev. 2021;40:100370.
7. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J
Med. 2019;380:1347–1358.
8. Erickson B, Korfiatis P, Akkus Z, Kline T. Machine learning for
medical imaging. Radiographics Rev Publ Radiol Soc North Am Inc.
2017;37(2):505–15.
9. Poplin R, Varadarajan A, Blumer K, Liu Y, McConnell M, Corrado G,
Peng L, Webster D. Prediction of cardiovascular risk factors from retinal
fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158–64.
10. Rumsfeld J, Joynt K, Maddox T. Big data analytics to improve cardiovas-
cular care: promise and challenges. Nat Rev Cardiol. 2016;13:350–9.
11. Ramalingam V, Dandapath A, Raja M. Heart disease prediction using
machine learning techniques: a survey. Int J Eng Technol. 2018;7:684.
12. Hsich E, Gorodeski E, Blackstone E, Ishwaran H, Lauer M. Identi-
fying important risk factors for survival in patient with systolic heart
failure using random survival forests. Circul Cardiovas Qual Outcomes.
2011;4:39–45.
13. Zhou M, Scott J, Chaudhury B, Hall L, Goldgof D, Yeom K, Iv M, Ou Y,
Kalpathy-Cramer J, Napel S, Gillies R, Gevaert O, Gatenby R. Radiomics
in brain tumor: image assessment, quantitative feature descriptors, and
machine-learning approaches. Am J Neuroradiol. 2018;39:208–16.
14. Setio A, Ciompi F, Litjens G, Gerke P, Jacobs C, Riel S, Wille M,
Naqibullah M, Sánchez C, Ginneken B. Pulmonary nodule detection in ct
images: false positive reduction using multi-view convolutional networks.
IEEE Trans Med Imag. 2016;35:1160–9.
15. Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a
practical introduction. BMC Med Res Methodol. 2019;19(1):64. PMID:
30890124; PMCID: PMC6425557. https://doi.org/10.1186/s12874-019-0681-4.
16. Koh DM, Papanikolaou N, Bick U, Illing R, Kahn CE Jr, Kalpathi-Cramer
J, Matos C, Martí-Bonmatí L, Miles A, Mun SK, Napel S, Rockall A,
Sala E, Strickland N, Prior F. Artificial intelligence and machine learning
in cancer imaging. Commun Med (Lond). 2022;2:133. PMID: 36310650;
PMCID: PMC9613681. https://doi.org/10.1038/s43856-022-00199-0.
17. Zerouaoui H, Idri A. Reviewing machine learning and image processing
based decision-making systems for breast cancer imaging. J Med Syst.
2021;45:1–20.
18. Bi W, Hosny A, Schabath M, Giger M, Birkbak N, Mehrtash A, Allison T,
Arnaout O, Abbosh C, Dunn I, Mak R, Tamimi R, Tempany C, Swanton
C, Hoffmann U, Schwartz L, Gillies R, Huang R, Aerts H. Artificial
intelligence in cancer imaging: clinical challenges and applications. Ca.
2019;69:127–57.
19. Liao C, Li S. A support vector machine ensemble for cancer classification
using gene expression data. In: International symposium on bioinformat-
ics research and applications; (2007).
20. Zhang F, Kaufman H, Deng Y, Drabier R. Recursive SVM biomarker
selection for early detection of breast cancer in peripheral blood. BMC
Med Genom. 2013;6:S4–S4.
21. Kircher M, Witten D, Jain P, O'Roak B, Cooper G, Shendure J. A general
framework for estimating the relative pathogenicity of human genetic
variants. Nat Genet. 2014;46:310–5.
22. Vlahou A, Schorge J, Gregory B, Coleman R. Diagnosis of ovarian
cancer using decision tree classification of mass spectral data. J Biomed
Biotechnol. 2003;2003:308–14.
23. Hsu Y, Huang P, Chen D. Sparse principal component analysis in cancer
research. Transl Cancer Res. 2014;3(3):182–90.
24. Chen L, Li H, Xie L, Zuo Z, Tian L, Liu C and Guo X. Editorial: big data
and machine learning in cancer genomics. Front. Genet. 2021;12:749584.
https://doi.org/10.3389/fgene.2021.749584.
25. Pan XM. Multiple linear regression for protein secondary structure
prediction. Proteins. 2001;43(3):256–9. PMID: 11288175. https://doi.
org/10.1002/prot.1036.
26. Taguchi Y, Okamoto A. Principal component analysis for bacterial
proteomic analysis. In: 2011 IEEE international conference on bioinfor-
matics and biomedicine workshops (BIBMW); 2011. p. 961–3.
27. Cao C, Liu F, Tan H, Song D, Shu W, Li W, Zhou Y, Bo X, Xie Z.
Deep learning and its applications in biomedicine. Genomics Proteomics
Bioinformatics. 2018;16:17–32.
28. Asgari E, Mofrad MRK. Continuous distributed representation of bio-
logical sequences for deep proteomics and genomics. PLoS ONE
2015;10(11):e0141287. https://doi.org/10.1371/journal.pone.0141287.
29. Mathlin J, Le Pera L, Colombo T. A census and categorization
method of epitranscriptomic marks. Int J Mol Sci. 2020;21(13):4684.
PMID: 32630140; PMCID: PMC7370119. https://doi.org/10.3390/ijms21134684.
30. Caudai C, Galizia A, Geraci F, Pera L, Morea V, Salerno E, Via A,
Colombo T. AI applications in functional genomics. Comput Struct
Biotechnol J. 2021;19:5762–90.
31. Coelewij L, Waddington KE, Robinson GA, Chocano E, McDonnell T,
Farinha F, Peng J, Dönnes P, Smith E, Croca S, Bakshi J, Griffin M, Nico-
laides A, Rahman A, Jury EC, Pineda-Torra I. Serum metabolomic sig-
natures can predict subclinical atherosclerosis in patients with systemic
lupus erythematosus. Arterioscler Thromb Vasc Biol. 2021;41(4):1446–
1458. PMID: 33535791; PMCID: PMC7610443. https://doi.org/10.1161/ATVBAHA.120.315321.
32. Da-Yuan, Liang Y, Yi L, Xu Q, Kvalheim O. Uncorrelated linear discrim-
inant analysis (ULDA): a powerful tool for exploration of metabolomics
data. Chemom Intell Lab Syst. 2008;93:70–9.
33. Alakwaa F, Chaudhary K, Garmire L. Deep learning accurately predicts
estrogen receptor status in breast cancer metabolomics data. J Proteome
Res. 2017;17:337–47.
34. Khodayari, A., Maranas, C. A genome-scale Escherichia coli
kinetic metabolic model k-ecoli457 satisfying flux data for multiple
mutant strains. Nat Commun 2016;7:13806. https://doi.org/10.1038/ncomms13806.
35. Yang H, Yu B, Ouyang P, Li X, Lai X, Zhang G, Zhang H. Machine
learning-aided risk prediction for metabolic syndrome based on 3
years study. Sci Rep. 2022;12(1):2248. PMID: 35145200; PMCID:
PMC8831522. https://doi.org/10.1038/s41598-022-06235-2.
36. Grinfeld J, Nangalia J, Baxter J, Wedge D, Angelopoulos N, Cantrill R,
Godfrey A, Papaemmanuil E, Gundem G, Maclean C, Cook J, O’Neil
L, O’meara S, Teague J, Butler A, Massie C, Williams N, Nice F,
Andersen C, Hasselbalch H, Guglielmelli P, McMullin M, Vannucchi
A, Harrison C, Gerstung M, Green A, Campbell P. Classification and
personalized prognosis in myeloproliferative neoplasms. N Engl J Med.
2018;379:1416–30.
37. Denis F, Basch E, Septans A, Bennouna J, Urban T, Dueck A, Letellier C.
Two-year survival comparing web-based symptom monitoring vs routine
surveillance following treatment for lung cancer. JAMA. 2019;321:306–
7.
38. Hasnain Z, Mason J, Gill K, Miranda G, Gill IS, Kuhn P, Newton
PK. Machine learning models for predicting post-cystectomy recur-
rence and survival in bladder cancer patients. PLoS One 2019 Feb
20;14(2):e0210976. PMID: 30785915; PMCID: PMC6382101. https://doi.org/10.1371/journal.pone.0210976.
39. Nie D, Zhang H, Adeli E, Liu L, Shen D. 3D deep learning for multi-
modal imaging-guided survival time prediction of brain tumor patients.
Med Image Comput Comput Assist Interv. 2016;9901:212–220. PMID:
28149967; PMCID: PMC5278791. https://doi.org/10.1007/978-3-319-46723-8_25.
40. Meiring C, Dixit A, Harris S, MacCallum NS, Brealey DA, Watkinson
PJ, Jones A, Ashworth S, Beale R, Brett S, Singer M, Ercole A. Optimal
intensive care outcome prediction over time using machine learning.
PLoS ONE 2018;13(11):e0206862. https://doi.org/10.1371/journal.pone.0206862.
41. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep
learning in drug discovery. Drug Discovery Today. 2018;23(6):1241–50.
42. Gupta S, Chaudhary K, Kumar R, Gautam A, Nanda JS, Dhanda SK,
Brahmachari SK, Raghava GP. Prioritization of anticancer drugs against
a cancer using genomic features of cancer cells: A step towards per-
sonalized medicine. Sci Rep. 2016;6:23857. PMID: 27030518; PMCID:
PMC4814902. https://doi.org/10.1038/srep23857.
43. Hejase H, Chan, C. Improving drug sensitivity prediction using different
types of data. CPT: Pharmacometrics Syst Pharmacol. 2015;4.
44. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li
B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine
learning in drug discovery and development. Nat Rev Drug Discov.
2019;18:463–77.
45. Frankish K, Ramsey W. The Cambridge Handbook of Artificial Intelli-
gence. Cambridge: Cambridge University Press; 2014.
46. Smiti A. When machine learning meets medical world: current status and
future challenges. Comput Sci Rev. 2020;37:100280. https://doi.org/10.1016/j.cosrev.2020.100280.
47. Garg A, Mago V. Role of machine learning in medical research: a survey.
Comput Sci Rev. 2021;40:100370. https://doi.org/10.1016/j.cosrev.2021.100370.
48. Esteva A, Kuprel B, Novoa R, Ko J, Swetter S, Blau H, Thrun S.
Dermatologist-level classification of skin cancer with deep neural net-
works. Nature. 2017;542:115–8.
49. Hawkins J, Brownstein J, Tuli G, Runels T, Broecker K, Nsoesie E,
McIver D, Rozenblum R, Wright A, Bourgeois F, Greaves F. Measuring
patient-perceived quality of care in US hospitals using Twitter. BMJ Qual
Saf. 2015;25:404–13.
50. Mangasarian O, Street W, Wolberg W. Breast cancer diagnosis and
prognosis via linear programming. Oper Res. 1995;43:570–7.
51. Colantonio S, Salvati A, Caudai C, Bonino F, De Rosa L, Pascali MA,
Germanese D, Brunetto MR, Faita F. A deep learning approach for
hepatic steatosis estimation from ultrasound imaging. In: Proceedings of
ICCCI 2021 – 13th international conference on computational collective
intelligence, Rhodes, Greece; 2021. p. 703–4.
52. Ali N, Srivastava D, Tiwari A, Pandey A, Pandey AK, Sahu A. Predicting
life expectancy of hepatitis B patients using machine learning. In: IEEE
international conference on distributed computing and electrical circuits
and electronics (ICDCECE); 2022.
53. Simos N, Manikis G, Papadaki E, Kavroulakis E, Bertsias G, Marias K.
Machine learning classification of neuropsychiatric systemic lupus ery-
thematosus patients using resting-state fMRI functional connectivity. In:
2019 IEEE international conference on imaging systems and techniques
(IST); 2019. p. 1–6.
54. Liu S, Lu M, Li H, Zuo Y. Prediction of gene expression patterns with
generalized linear regression model. Front Genet. 2019;10:120. https://doi.org/10.3389/fgene.2019.00120.
55. Taylor RA, Moore CL, Cheung KH, Brandt C. Predicting urinary tract
infections in the emergency department with machine learning. PLoS
One. 2018;13(3):e0194085. PMID: 29513742; PMCID: PMC5841824.
https://doi.org/10.1371/journal.pone.0194085.
56. Leha A, Hellenkamp K, Unsöld B, Mushemi-Blake S, Shah AM,
Hasenfuß G, Seidler T. A machine learning approach for the prediction
of pulmonary hypertension. PLoS One. 2019;14(10):e0224453. PMID:
31652290; PMCID: PMC6814224. https://doi.org/10.1371/journal.pone.0224453.
57. Huang S, Cai N, Pacheco P, Narrandes S, Wang Y, Xu W. Applications
of support vector machine (SVM) learning in cancer genomics. Cancer
Genom Proteom. 2018;15(1):41–51.
58. Ho T. Random decision forests. In: Proceedings of 3rd international
conference on document analysis and recognition; 1995. vol. 1. p. 278–
82.
59. Zhu M, Xia J, Jin X, Yan M, Cai G, Yan J, Ning G. Class weights
random forest algorithm for processing class imbalanced medical data.
IEEE Access. 2018;6:4641–52.
60. Martin-Gutierrez L, Peng J, Thompson NL, Robinson GA, Naja M,
Peckham H, Wu W, J’bari H, Ahwireng N, Waddington KE, Bradford
CM, Varnier G, Gandhi A, Radmore R, Gupta V, Isenberg DA, Jury EC,
Ciurtin C. Stratification of patients with Sjögren’s syndrome and patients
with systemic lupus erythematosus according to two shared immune cell
signatures, with potential therapeutic implications. Arthritis & Rheuma-
tology 2021;73(9):1626–37. https://doi.org/10.1002/art.41708.
61. Seccia R, Gammelli D, Dominici F, Romano S, Landi AC, Salvetti M,
Tacchella A, Zaccaria A, Crisanti A, Grassi F, Palagi L. Considering
patient clinical history impacts performance of machine learning mod-
els in predicting course of multiple sclerosis. PLoS ONE 2020;15(3):
e0230219. https://doi.org/10.1371/journal.pone.0230219.
62. Baumgartner C, Bóhm C, Baumgartner D. Modelling of classification
rules on metabolic patterns including machine learning and expert knowl-
edge. J Biomed Inf. 2005;38(2):89–98.
63. Wang J, Shi L. Prediction of medical expenditures of diagnosed diabetics
and the assessment of its related factors using a random forest model,
MEPS 2000–2015. Int J Qual Health Care. 2020;32(2):99–112. PMID:
32159759. https://doi.org/10.1093/intqhc/mzz135.
64. Chen X, Ishwaran H. Random forests for genomic data analysis.
Genomics. 2012;99(6):323–9.
65. Mo X, Chen X, Ieong C, Zhang S, Li H, Li J, Lin G, Sun G, He F, He Y,
Xie Y, Zeng P, Chen Y, Liang H, Zeng H. Early prediction of clinical
response to etanercept treatment in juvenile idiopathic arthritis using
machine learning. Front Pharmacol. 2020;11:1164. PMID: 32848772;
PMCID: PMC7411125. https://doi.org/10.3389/fphar.2020.01164.
66. Murray S, Avati A, Schmajuk G, Yazdany J. Automated and flexible
identification of complex disease: building a model for systemic lupus
erythematosus using noisy labeling. J Am Med Inf Assoc JAMIA.
2019;26(1):61–5.
67. D’souza K, Ansari Z. Big data science in building medical data classifier
using Naïve Bayes model. In: 2018 IEEE international conference on
cloud computing in emerging markets (CCEM); 2018. p. 76–80.
68. Degroeve S, Baets B, Peer Y, Rouzé P. Feature subset selection for splice
site prediction. Bioinformatics. 2002;18(Suppl 2):S75–83.
69. Silla C, Freitas A. A global-model naive bayes approach to the hierar-
chical prediction of protein functions. In: 2009 Ninth IEEE international
conference on data mining; 2009. p. 992–997.
70. Sandberg R, Winberg G, Brändén C, Kaske A, Ernberg I, Cöster J.
Capturing whole-genome characteristics in short sequences using a naïve
Bayesian classifier. Gen Res. 2001;11(8):1404–9.
71. Khamis H. Application of k-nearest neighbour classification in medical
data in the context of Kenya. Digital Repository Unimib. 2014.
72. Parry R, Jones W, Stokes T, Phan J, Moffitt R, Fang H, Shi L,
Oberthuer A, Fischer M, Tong W, Wang M. k-Nearest neighbor models
for microarray gene expression analysis and clinical outcome prediction.
Pharmacogenomics J. 2010;10:292–309.
73. Alexe G, Dalgin G, Ganesan S, DeLisi C, Bhanot G. Analysis of breast
cancer progression using principal component analysis and clustering. J
Biosci. 2007;32:1027–39.
74. Maisuradze G, Liwo A, Scheraga H. Principal component analysis for
protein folding dynamics. J Mol Biol. 2009;385(1):312–29.
75. Le T. Fuzzy C-means clustering interval type-2 cerebellar model artic-
ulation neural network for medical data classification. IEEE Access.
2019;7:20967–73.
76. Khanmohammadi S, Adibeig N, Shanehbandy S. An improved overlap-
ping k-means clustering method for medical applications. Expert Syst
Appl. 2017;67:12–8.
77. Handhayani T, Hiryanto L. Intelligent kernel K-means for clustering gene
expression. Procedia Comput Sci. 2015;59:171–7.
78. Greenspan H, Ginneken B, Summers R. Guest editorial deep learning
in medical imaging: overview and future promise of an exciting new
technique. IEEE Trans Med Imaging. 2016;35:1153–9.
79. Litjens G, Kooi T, Bejnordi B, Setio A, Ciompi F, Ghafoorian M, Laak
J, Ginneken B, Sánchez C. A survey on deep learning in medical image
analysis. Med Image Anal. 2017;42:60–88.
80. Gao X, Lin S, Wong T. Automatic feature learning to grade
nuclear cataracts based on deep learning. IEEE Trans Biomed Eng.
2015;62:2693–701.
81. Sundaram L, Gao H, Padigepati S, McRae J, Li Y, Kosmicki J, Fritzilas N,
Hakenberg J, Dutta A, Shon J, Xu J, Batzoglou S, Li X, Farh K. Predicting
the clinical impact of human mutation with deep neural networks. Nat
Genet. 2018;50:1161–70.
82. Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan
H. GAN-based synthetic medical image augmentation for increased
CNN performance in liver lesion classification. Neurocomputing.
2018;321:321–31.
83. Kamnitsas K, Ledig C, Newcombe V, Simpson J, Kane A, Menon
D, Rueckert D, Glocker B. Efficient multi-scale 3D CNN with fully
connected CRF for accurate brain lesion segmentation. Med Image Anal.
2017;36:61–78.
84. Jain A, Patel H, Nagalapatti L, Gupta N, Mehta S, Guttula S, Mujumdar S,
Afzal S, Mittal R, Munigala V. Overview and importance of data quality
for machine learning tasks. In: Proceedings of the 26th ACM SIGKDD
international conference on knowledge discovery & data mining. 2020.
85. Dai W, Yoshigoe K, Parsley W. Improving data quality through deep
learning and statistical models. ArXiv. abs/1810.07132; 2018.
86. Luca A, Ursuleanu T, Gheorghe L, Grigorovici R, Iancu S, Hlusneac M,
Grigorovici A. Impact of quality, type and volume of data used by deep
learning models in the analysis of medical images. Inf Med Unlocked.
2022;29:100911. https://doi.org/10.1016/j.imu.2022.100911.
87. Wang Z, Poon J, Sun S, Poon S. Attention-based multi-instance neural
network for medical diagnosis from incomplete and low quality data. In:
2019 International joint conference on neural networks (IJCNN); 2019.
p. 1–8.
88. Chang Y, Yan L, Chen M, Fang H, Zhong S. Two-stage convolutional
neural network for medical noise removal via image decomposition. IEEE
Trans Instrument Meas. 2020;69:2707–21.
89. Marcinkevics R, Vogt J. Interpretability and explainability: a machine
learning zoo mini-tour. ArXiv. abs/2012.01805; 2020.
90. Samek W, Müller K. Towards explainable artificial intelligence. ArXiv.
abs/1909.12072; 2019.
91. Montavon G, Samek W, Müller K. Methods for interpreting and under-
standing deep neural networks. ArXiv. abs/1706.07979; 2018.
92. Chen I, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical
machine learning in health care. Ann Rev Biomed Data Sci. 2021;4:123–
44.
93. Yoon C, Torrance R, Scheinerman N. Machine learning in medicine:
should the pursuit of enhanced interpretability be abandoned? J Med
Ethics. 2021;48:581–5.
94. Martin A, Kanai M, Kamatani Y, Okada Y, Neale B, Daly M. Clinical use
of current polygenic risk scores may exacerbate health disparities. Nat
Genet. 2019;51:584–91.
95. Johnson K, Wei W, Weeraratne D, Frisse M, Misulis K, Rhee K, Zhao J,
Snowdon J. Precision medicine, AI, and the future of personalized health
care. Clin Transl Sci. 2020;14:86–93.
96. Quazi, S. Artificial intelligence and machine learning in precision and
genomic medicine. Med Oncol 2022;39:120. https://doi.org/10.1007/s12032-022-01711-1.
97. Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty
K, Dehan E, Parikh B. Translating cancer genomics into precision
medicine with artificial intelligence: applications, challenges and future
perspectives. Hum Genet. 2019;138:109–24.
98. Grapov D, Fahrmann J, Wanichthanarak K, Khoomrung S. Rise of deep
learning for genomic, proteomic, and metabolomic data integration in
precision medicine. OMICS J Integrat Biol. 2018;22:630–6.
99. Hamamoto R, Komatsu M, Takasawa K, Asada K, Kaneko S. Epigenetics
analysis and integrated analysis of multiomics data, including epige-
netic data, using artificial intelligence in the era of precision medicine.
Biomolecules. 2019;10(1):62. PMID: 31905969; PMCID: PMC7023005.
https://doi.org/10.3390/biom10010062.
100. Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR,
Gustafsson M, Jörnsten R, Lee EJ, Li X, Lilja S, Martínez-Enguita D,
Matussek A, Sandström P, Schäfer S, Stenmarker M, Sun XF, Sysoev O,
Zhang H, Benson, M. Digital twins to personalize medicine. Gen Med.
2019;12(1):4. PMID: 31892363; PMCID: PMC6938608. https://doi.org/10.1186/s13073-019-0701-3.
4 Machine Learning Methods for Radiomics Analysis: Algorithms Made Easy

Michail E. Klontzas and Renato Cuocolo

M. E. Klontzas (✉)
University Hospital of Heraklion, Heraklion, Greece
Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece

R. Cuocolo
Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy

4.1 Introduction

Radiomics analysis represents the extraction of quantitative textu-
ral data from medical images by mathematically obtaining a series
of values representing signal intensities or a variety of pixel inter-
relationship metrics [1]. Radiomics is the image-based alternative
to traditional omics methods that provide big data for biological
systems, including genomics, proteomics, transcriptomics, and
metabolomics [2]. In the case of radiomics, data can be extracted
from any kind of medical image including X-rays, ultrasound, CT,
MRI, and PET in an attempt to obtain a detailed representation of
image characteristics that cannot be seen with the bare eye of a
radiologist and that can be used for the quantitative characteriza-
tion of tissues and the identification of novel imaging biomarkers
[3].
The use of big data for the characterization of biological
systems and disease states has emerged over the past couple of
decades because of shortcomings of single variable analysis. The
complexity of biological systems and the multitude of factors
that dictate the imaging appearance of a normal or diseased
tissue cannot be fully described by univariate or limited mul-
tivariate analyses. Omics analyses provide global evaluation of
the examined system/tissue/image region and overcome limita-
tions attributed to factor interactions and information loss when
examining a limited number of variables [4]. Radiomics provides
a comprehensive analysis of image components allowing the
identification of image biomarkers that cannot be seen with the
eye of radiologists but can be derived by means of mathematical
transformations of the original image [1, 5, 6].
Dealing with big data necessitates the use of sophisticated
algorithms that allow data curation and analysis to extract mean-
ingful information and allow the construction of predictive mod-
els [4]. These algorithms include a series of machine learning
methods that span the majority of steps required for radiomics
analysis from the segmentation of regions of interest, to data
curation and selection of radiomics features and the achievement
of predictions based on the extracted data [1, 6]. The aim of this
chapter is to present a basic overview of the most commonly
used machine learning algorithms in radiomics manuscripts for
each of the aforementioned radiomics pipeline steps (Fig. 4.1).
This overview is primarily intended for physicians wishing to
pursue radiomics research or to acquire a basic knowledge of the
field.

Fig. 4.1 Overview of the most common machine learning methods used in
radiomics analysis (created with biorender.com)

4.2 Methods for Region of Interest Segmentation

Once the image dataset has been constructed and appropriately
preprocessed to increase the quality of images and reduce inherent
sources of noise and bias, the tissue or lesion of interest needs to
be identified in order to extract radiomics features. This process
of selecting the appropriate region of interest (ROI) is called seg-
mentation and was traditionally performed in a manual manner,
where a radiologist would draw a line by hand at the border of
the lesion to delineate it. This manual approach is prone to bias
and can lead to substantial errors, since many commonly encoun-
tered radiomics features are sensitive to small changes in segmen-
tation borders [1, 6]. Other
traditional segmentation techniques include thresholding, region-
based and edge-based methods. These methods are sensitive to
noise, may require manual interaction to define a seed point, and
are sensitive to intensity inhomogeneities which are very common
in medical images.
For all the aforementioned reasons researchers are starting
to use tools for automatic AI-based lesion segmentation. These
methods reduce the bias associated with manual segmentation
which may depend on the meticulousness, the skills, and expe-
rience of the reader. Both supervised and unsupervised methods
have been used for automatic segmentation. Unsupervised meth-
ods include k-means, hard c-means, and fuzzy c-means algorithms.
Supervised methods include deep learning architectures such as
encoder–decoder networks (e.g., U-Net, V-Net), regional convo-
lutional neural networks (e.g., Fast-RCNN, Faster-RCNN, Mask-
RCNN), and DeepLab models. Recently, transformers have been
also combined with U-Net architectures for image segmentation
purposes (e.g., UNETR, TransUNet).

4.2.1 R-CNN

One of the biggest breakthroughs in object detection and seg-
mentation was the development of R-CNN (regions with convo-
lutional neural networks). This method receives an input image,
extracts approximately 2000 region proposals, utilizes a CNN
to compute features for each region proposal, and finally uses a
support vector machine (SVM) classifier to classify regions to
different objects [7]. Several variations with improvements of R-
CNN have been published including Fast-RCNN, Faster-RCNN,
and Mask-RCNN. These overcome drawbacks of R-CNN such as
the training and object detection speed [8]. Mask RCNN specif-
ically creates object masks in addition to the object bounding
boxes produced by other versions of the algorithm such as Faster
R-CNN [9].
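As a practical illustration, the sketch below runs the Mask R-CNN implementation shipped with recent versions of the torchvision library; this is a generic, COCO-pretrained model rather than a radiomics-specific pipeline, and the random tensor is only a placeholder for a preprocessed image.

```python
# Minimal Mask R-CNN inference sketch with torchvision (assumed >= 0.13);
# the random tensor stands in for a real, preprocessed image.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained weights
model.eval()

image = torch.rand(3, 512, 512)   # one RGB image, values in [0, 1]
with torch.no_grad():
    outputs = model([image])      # list with one dict per input image

# Region proposals refined into final detections: bounding boxes,
# class labels, confidence scores, and one mask per detected object.
boxes = outputs[0]["boxes"]
scores = outputs[0]["scores"]
masks = outputs[0]["masks"]
```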

4.2.2 U-Net and V-Net

U-Net is a fully convolutional neural network which is named
after its U-shaped architecture, in which information flows down
and back up through the layers of the network thanks to their
symmetrical arrangement. U-Net is composed of a combination
of symmetric encoder and decoder modules. The encoder module
reduces the spatial resolution of the image while increasing the
number of filters, and the decoder module performs the exact
opposite operation, increasing spatial resolution and decreasing
the number of filters.
A characteristic of U-Net is that each decoder incorporates a fea-
ture map derived from the corresponding encoder. This analysis
allows the U-Net to understand the input image as a whole while
identifying and segmenting the objects of interest. Ultimately,
the network results in the segmentation of the original image,
labeling each image pixel as part of either the background or the
object of interest [10]. V-Net is another type of fully convolutional
neural network similar to U-Net which was published in 2016.
Unlike U-Net, which segments in 2D, V-Net performs segmenta-
tion in 3D, yielding an output in which each voxel is labeled as
background or object of interest. The original V-
Net paper introduced a novel objective function for segmentation
based on the maximization of Dice coefficient [11].
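To make the encoder-decoder idea tangible, the following deliberately tiny PyTorch sketch implements one encoder level, one decoder level, and a single skip connection; it is a toy illustration of the U-Net principle, not the published architecture, which uses many more levels and filters.

```python
# A toy U-Net-style network: one downsampling step, one upsampling step,
# and a skip connection concatenating encoder and decoder feature maps.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 16)
        self.enc2 = double_conv(16, 32)       # deeper level: more filters
        self.pool = nn.MaxPool2d(2)           # halves spatial resolution
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # doubles it back
        self.dec1 = double_conv(32, 16)       # 32 = 16 upsampled + 16 skipped
        self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        s1 = self.enc1(x)                     # encoder map kept for the skip
        bottom = self.enc2(self.pool(s1))
        up = self.up(bottom)
        out = self.dec1(torch.cat([up, s1], dim=1))  # skip connection
        return self.head(out)                 # label scores for every pixel

logits = TinyUNet()(torch.rand(1, 1, 64, 64))  # shape: (1, 2, 64, 64)
```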

4.2.3 DeepLab

DeepLab is a fully convolutional neural network published by
Google for semantic segmentation purposes with the ability to
capture information at a variety of scales. It has a structure
similar to U-Net with convolutional and pooling layers that reduce
resolution while increasing feature maps followed by a sequence
of transposed convolutional layers that increase resolution while
decreasing feature maps [12]. DeepLab architectures introduced
the “atrous convolution” method, which lets the network
“understand” information at various scales, enabling changes
in kernel sizes without increasing the number of pixels to be
processed. This also offers advantages in terms of speed. A series
of DeepLab versions have been published (v1, v2, v3) with the
latest being v3+ which is a revision of v3 that includes depth-
wise separable convolution and extra batch normalization [12,
13]. Detailed description of these network structures falls beyond
the scope of this chapter.
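As a usage sketch, torchvision also ships DeepLabv3 models; the snippet below only illustrates the input/output contract of such a semantic segmentation network, with a random batch standing in for real images.

```python
# DeepLabv3 (ResNet-50 backbone) inference sketch with torchvision.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT")  # pretrained weights
model.eval()

batch = torch.rand(1, 3, 256, 256)             # placeholder image batch
with torch.no_grad():
    out = model(batch)["out"]                  # (1, num_classes, 256, 256)
pred_mask = out.argmax(dim=1)                  # per-pixel class labels
```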

4.3 Methods for Exploratory Data Analysis

Exploratory data analysis is the first unsupervised step to rec-
ognize groups/clusters between the samples of a dataset. This
process is important to identify data patterns and potential data
problems such as outliers skewing the data. Exploratory data
analysis starts by exploring the summary statistics of the dataset,
including group member counts, mean, minimum, maximum, and
standard deviation. Histograms and/or box plots are commonly
used to visualize these summary statistics.
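In Python, such a first pass might look like the pandas sketch below; the feature names and label column are hypothetical placeholders, and the data are synthetic.

```python
# First-pass exploratory statistics and plots for a feature table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": rng.integers(0, 2, size=100),   # e.g., benign vs. malignant
    "feature_a": rng.normal(size=100),
    "feature_b": rng.lognormal(size=100),    # a deliberately skewed feature
})

print(df.describe())               # count, mean, std, min, max, quartiles
print(df["label"].value_counts())  # group member counts

df["feature_b"].plot.hist(bins=30)           # histogram of one feature
df.boxplot(column="feature_b", by="label")   # box plots per group
```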

4.3.1 Correlation Analysis

An important part of exploratory data analysis that is of great
importance for radiomics is correlation analysis. Correlation anal-
ysis is a statistical method that is used to measure the strength
and direction of the relationship between two variables. Even
though it does not represent a machine learning method per
se, its combination with clustering brings it to the interface of
classical statistics and machine learning [14, 15]. It can be used
for feature selection by identifying which features are highly
correlated with the target variable, and which features are highly
correlated with each other. In order to select between the two
types of correlation coefficient, Pearson and Spearman, one needs
to know if the data are normally distributed. In cases of normally
distributed data, Pearson correlation can be used, which measures
the linear association between two continuous variables. It ranges
between −1 and 1, where −1 represents a perfect negative
linear correlation, 0 represents no correlation, and 1 represents
a perfect positive linear correlation. Spearman correlation is the
non-parametric equivalent of Pearson correlation. It also ranges
between −1 and 1. Analyzing correlations between variables can
provide important information, such as the identification of highly
correlated and redundant features, the identification of outliers,
and the identification of relationships between variables. Such
correlation data is usually presented on correlation heatmaps [15].
It is important to keep in mind that these coefficients capture only
linear (Pearson) or monotonic (Spearman) relationships and will
miss more complex non-linear relationships between variables.
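A correlation screen with a heatmap can be written in a few lines, as in the sketch below (synthetic data; seaborn assumed available). Spearman is used here since radiomics features are often not normally distributed.

```python
# Spearman correlation matrix, heatmap, and a redundancy screen.
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 6)),
                  columns=[f"feat_{i}" for i in range(6)])
df["feat_5"] = df["feat_0"] * 0.95 + rng.normal(scale=0.1, size=100)

corr = df.corr(method="spearman")   # use method="pearson" for normal data
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, annot=True)

# Flag highly correlated (redundant) feature pairs, e.g. |rho| > 0.9
pairs = [(a, b, corr.loc[a, b]) for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:] if abs(corr.loc[a, b]) > 0.9]
print(pairs)
```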

4.3.2 Clustering

Identification of clusters in the data can be done in a supervised or
unsupervised manner. The former requires labeled data, whereas
the latter looks for relationships in the data disregarding any
labels. Since most radiomics manuscripts use supervised learn-
ing for the development of classification models in this initial
exploratory analysis step, unsupervised clustering is most suitable
to identify patterns in the data. Hierarchical clustering is one of
the most commonly used methods for unsupervised clustering
which creates a hierarchical representation of the clusters, where
each cluster is represented as a node in a tree-like structure
called a dendrogram [16]. The similarity between clusters can
be measured using different metrics such as Euclidean distance,
Manhattan distance, or cosine similarity. The choice of similarity
metric depends on the nature of data and the problem. However,
for high dimensional data such as omics data, Manhattan has
been suggested as the optimal distance metric [17]. Dendrograms
can be formed either in an agglomerative or a divisive fashion.
In cases of agglomerative clustering each data point starts as its
own cluster and clusters are iteratively merged based on their
similarity, whereas in divisive clustering all data points start in
the same cluster and this cluster is iteratively divided into smaller
clusters based on similarity. Results can be visualized using
dendrograms and associated heatmaps which provide a visual
representation of the range of radiomics features [18].
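A minimal agglomerative clustering sketch with SciPy, using the Manhattan ("cityblock") distance mentioned above on synthetic data, could look as follows.

```python
# Agglomerative hierarchical clustering with a dendrogram and a tree cut.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 50)),   # two loose groups of samples
               rng.normal(3, 1, (20, 50))])

Z = linkage(X, method="average", metric="cityblock")  # merge tree
dendrogram(Z)                                    # tree-like visualization
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
```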

4.3.3 Principal Component Analysis

Another way of unsupervised visualization of data patterns is
the use of dimensionality reduction techniques, such as principal
component analysis, linear discriminant analysis, and multidi-
mensional scaling. Like other omics analyses, radiomics suffers
from the “curse of dimensionality” problem, where the presence
of more features than dataset samples increases the chance of
encountering redundant information and correlations in high
dimensional datasets, obscuring data interpretation, and reducing
the performance of machine learning algorithms [19]. A way to
solve this problem is by using dimensionality reduction tech-
niques. This also allows the visualization of data relationships
on the 2D or 3D space which is not possible otherwise. The
main idea behind PCA is to project the data points onto a new
set of axes, called principal components, which are orthogonal
(and uncorrelated) with each other and capture the most impor-
tant information of the dataset. A PCA graph typically shows
the data points projected onto the first two or three principal
components. The principal components are chosen such that
the first principal component explains the most variation in the
data, the second principal component explains the second most
variation, and so on. In a two-dimensional PCA graph, the first
principal component is represented on the x-axis, and the second
principal component is represented on the y-axis. Each data point
is then plotted as a point in this space, with the position of the
point indicating the values of the data point along the principal
components [20, 21].
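A typical two-component PCA projection with scikit-learn is sketched below on synthetic data; features are standardized first because PCA is scale-sensitive.

```python
# Project a high-dimensional feature matrix onto its first two PCs.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))   # more features than samples
y = rng.integers(0, 2, size=100)

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)            # (100, 2) projection

plt.scatter(scores[:, 0], scores[:, 1], c=y)  # color points by class
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
```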

4.4 Methods for Feature Selection

The number of features extracted by radiomics methods/software
(see Chap. 2) is usually significantly higher than the number of
data points used for the analysis. This renders model construction
significantly prone to overfitting. Therefore, a set of feature
reduction techniques is used to produce a smaller number of
valid and reproducible radiomics features that can be used for
model construction [6]. Several algorithms can be used including
regression methods (e.g., least absolute shrinkage and selec-
tion operator—LASSO), some of which are based on machine
learning methods (e.g., Boruta, recursive feature elimination,
maximum relevance—minimum redundancy). This section will
present the machine learning based feature selection methods
most commonly encountered in radiomics literature.

4.4.1 Boruta

Boruta is an algorithm for feature selection in machine learn-
ing. It is a wrapper method, which means that it uses another
algorithm (such as random forests) to evaluate the importance
of each feature. Boruta works by creating copies of each feature
in the dataset and then randomly shuffling the values of these
copies (referred to as “shadow features”). The algorithm creates
a “shadow” dataset from the original provided. The algorithm
then trains a feature selection algorithm (e.g. random forest)
utilizing both the original features and the shadow features. If
the performance of the algorithm is not affected by the shuf-
fled values of a feature, it is considered not important. Boruta
statistically compares the performance of random forests on the
original features with the performance on the shadow features.
If the performance on the original feature is significantly better
than the performance on its corresponding shadow feature, the
original feature is considered important. The algorithm repeats
this process multiple times, recording the number of times each
feature is selected as important or not important. Finally, it uses a
threshold to decide which features are truly important [22]. One
of the most important advantages of Boruta is that it accounts for
the presence of correlated features and is therefore robust against
overfitting. In addition, it yields a measure of feature importance,
enabling better model interpretation.
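In Python, Boruta is available through the community `boruta` package (BorutaPy), usually wrapped around a random forest; the sketch below uses synthetic data in which only the first two features carry signal.

```python
# Boruta feature selection wrapped around a random forest (package
# `boruta`, class BorutaPy); X and y are synthetic placeholders.
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", random_state=0)
boruta.fit(X, y)                          # real vs. shadow feature contest

print(np.where(boruta.support_)[0])       # indices of confirmed features
```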

4.4.2 Recursive Feature Elimination

Another commonly used machine learning method for feature
selection is recursive feature elimination (RFE). RFE recursively
removes the least important features of a dataset, training a model
on the remaining features. Similar to Boruta it is also a wrapper
method, which means that it uses another algorithm (such as a
decision tree or a linear model) to build a model for the evaluation
of feature importance. RFE starts by training a model with
all the dataset features, sequentially removing the features that
contribute the least to the performance of the model. This process
is repeated, training the model on the remaining features after
each feature removal. The process stops when reaching either a
pre-specified number of features or meeting a stopping criterion.
RFE is computationally expensive since it requires training a
new model for each feature set. In addition, it is affected by
the selected model and the number of features removed at each
model-training round [23–25].
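scikit-learn provides RFE directly; the sketch below wraps it around a linear SVM (an illustrative choice of estimator) on synthetic data.

```python
# Recursive feature elimination down to a pre-specified feature count.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

rfe = RFE(estimator=SVC(kernel="linear"),  # model used to rank features
          n_features_to_select=5,          # stopping point
          step=1)                          # features dropped per round
rfe.fit(X, y)
print(np.where(rfe.support_)[0])           # surviving feature indices
```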
Both Boruta and RFE are widely used in radiomics literature
and are well accepted as methods. However, certain differences
can be identified between them that could be considered when
selecting which one of the two to use. The first major difference
is that RFE does not provide a measure of feature importance,
whereas Boruta can provide an importance statistic.
Another important difference is that RFE is more sensitive to
overfitting than Boruta, since feature removal depends on their
performance in the training set, whereas Boruta is robust to over-
fitting since it eliminates features by comparing the performance
of the model to a model created with the randomized versions
of the features. Moreover, the fact that RFE creates a separate
model for each elimination round renders it more computationally
expensive. Nonetheless, RFE is a simple and efficient method and
the choice between the two greatly depends on the problem, the
size of the dataset, and the computational power available [22,
25].

4.4.3 Maximum Relevance: Minimum Redundancy

In comparison to Boruta which seeks to identify all relevant fea-
tures that can be used to construct a radiomics signature [22], the
maximum relevance—minimum redundancy method (mRMR)
aims to identify the minimum number of relevant features that
when combined with each other can predict an outcome. mRMR
requires the user to set the number of features desired to be
selected. This number is usually defined empirically based on the
number of images, the model that will be subsequently used and
the computational capacity of the system. As the name implies,
mRMR attempts to select features with the maximum relevance to
the outcome but with the minimum redundancy. This is performed
by calculating a relevance (based on F-statistic) and a redundancy
metric (based on Pearson correlation). These metrics are used to
rank the features based on a score that accounts for both metrics at
each iteration of the algorithm. The feature with the highest score
at each round is selected [26]. It is worth mentioning that Boruta
and mRMR have been also combined in literature to extract all
relevant features with Boruta and then rank them with mRMR
[27, 28].
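The compact sketch below mirrors this description, scoring relevance with the F-statistic and redundancy with Pearson correlation; it is an illustrative greedy implementation rather than a reference one.

```python
# Greedy mRMR-style selection: maximize relevance (F-statistic) minus
# mean redundancy (|Pearson correlation| with already-selected features).
import numpy as np
from sklearn.feature_selection import f_classif

def mrmr(X, y, k):
    relevance = f_classif(X, y)[0]                # F-statistic per feature
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-feature |r|
    selected = [int(np.argmax(relevance))]        # start from most relevant
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(mrmr(X, y, k=5))
```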

4.5 Methods for Predictive Model Construction

Once the final set of important features has been selected using
the aforementioned methods, the final step is to use these features
to create predictive models. These can be classification or regres-
sion models and a set of traditional machine learning methods
can be used for this purpose. The most commonly used ones
are logistic regression, decision trees, random forests, gradient
boosting models, support vector machines (SVM), and neural
networks. These represent supervised machine learning models,
trained and tuned on a partition of the dataset, tested on another
partition and ideally on one or more external datasets from other
institutions. Regression methods include algorithms shared with
classical statistics, such as linear and logistic regression, as well as
purely machine learning ones, such as random forest and support
vector regressors. Given that linear and logistic regression stand
at the border between statistical and machine learning models,
they fall outside the scope of this text.

4.5.1 Decision Trees

A decision tree is a non-parametric algorithm that works by
recursively partitioning the data into subsets based on the values
of the input variables. It is named after its tree-like hierarchical
structure starting with a root node and branching into internal
nodes and finally leaf nodes. The algorithm repeatedly splits the
dataset in a top-down fashion until it finds the optimal splits and
has classified most of the records into the predefined labels. As
the size of a tree increases, it becomes extremely difficult to
maintain its purity, and the tree becomes prone to overfitting [29].
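A single depth-limited decision tree in scikit-learn, printed as human-readable splits, is sketched below on synthetic data; capping the depth is the usual guard against the overfitting tendency just noted.

```python
# A small decision tree whose learned splits can be inspected directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0.2).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree))   # root-to-leaf splitting rules
```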

4.5.2 Random Forests

Random forest is an ensemble method that combines multiple
decision trees to create a “forest,” thus improving the accuracy
and robustness of the predictions. It was created based on the
assumption that combining multiple unrelated decision trees in
one model can give better predictions than each single decision
tree. Random forest uses the input dataset to create multiple
decision trees by randomly selecting subsets of the data (resam-
pling it with replacement, i.e., bootstrapping) and feature sets,
then yielding a combined final result through majority voting
(“bootstrap aggregating,” i.e., bagging) [30]. Combining several
“weak” decision trees significantly overcomes the overfitting
problem encountered in single, complex decision trees. However,
as it can be easily understood, combining several decision trees
in one ensemble model is computationally costlier than running a
single decision tree. Gini importance and variable importance can
be computed to provide an estimate of factor importance for the
resulting model [31].
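A random forest sketch with scikit-learn follows, including the Gini-based importances mentioned above; the data are synthetic placeholders.

```python
# Random forest: bagged ensemble of trees plus Gini feature importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=500,    # bootstrapped trees
                            max_features="sqrt", # random feature subsets
                            random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())    # cross-validated accuracy
rf.fit(X, y)
print(rf.feature_importances_)                   # Gini importance per feature
```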

4.5.3 Gradient Boosting Algorithms

Gradient boosting algorithms include some of the most successful
machine learning algorithms for model development with tabular
data. Gradient boosting includes powerful algorithms such as
XGBoost, LightGBM, AdaBoost, and CatBoost. They represent
ensemble methods that combine multiple weak learners to
improve the accuracy and robustness of the predictions. AdaBoost
first appeared in 1997 [29, 32] setting the basis of subsequent
gradient boosting models. Gradient boosting algorithms combine
a loss function to be optimized, such as log-loss for classification,
with multiple weak learners (decision trees), which are added one
at a time while the loss is minimized using a gradient descent
method [33]. Gradient
descent is a commonly used optimization algorithm that finds
local minima of given functions (in our case the loss function)
[34]. The most successful and commonly used gradient boosting
algorithm is XGBoost which has been used to win a series
of machine learning competitions with tabular data (a list of
competitions won with the use of XGBoost can be found here
https://github.com/dmlc/xgboost/blob/master/demo/README.
md#machine-learning-challenge-winning-solutions). XGBoost
aims to minimize a regularized objective function, representing
a convex loss function that penalizes model complexity.
Importantly, XGBoost can be scaled up using minimal resources
[35]. Computations are performed in C++, but R and Python
packages make XGBoost straightforward to use. Gradient
boosting algorithms have been widely used
in radiomics studies achieving excellent performance [36–38].
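A minimal XGBoost classifier via its Python package is sketched below; the hyperparameters are illustrative defaults rather than tuned values.

```python
# Gradient boosting with XGBoost: many shallow trees added sequentially
# while a regularized log-loss objective is minimized.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - X[:, 4] > 0).astype(int)

model = XGBClassifier(
    n_estimators=200,             # boosting rounds (weak trees)
    learning_rate=0.1,            # shrinkage of each tree's contribution
    max_depth=3,                  # keep each learner "weak"
    objective="binary:logistic",  # log-loss for binary classification
    reg_lambda=1.0)               # penalty on model complexity
model.fit(X, y)
probs = model.predict_proba(X)[:, 1]
```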

4.5.4 Support Vector Machines

Support vector machines (SVMs) are among the most common
traditional machine learning techniques and work well with
small training sets. The aim of SVM is to find a geometrical way
to maximize the difference between data classes. This is realized
by identifying a separating hyperplane that passes between data
classes in the n-dimensional space (where n is the number of
features for each sample of the dataset), after transformation
of the dataset using a kernel function. The model then finds
the hyperplane function that maximizes the margin between the
different classes in the data [29]. SVM has been widely used in
radiomics manuscripts because of its ease of use and its excellent
generalization capacity.
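An SVM sketch with scikit-learn follows, with standardization placed in a pipeline because the margin is computed from distances; the non-linearly separable data are synthetic.

```python
# RBF-kernel SVM in a pipeline with feature standardization.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (np.sum(X[:, :2] ** 2, axis=1) > 2).astype(int)  # circular boundary

svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf",  # kernel transformation of the data
                        C=1.0,         # margin width vs. training errors
                        gamma="scale"))
svm.fit(X, y)
print(svm.score(X, y))
```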

4.5.5 Neural Networks

Neural networks have also been used with radiomics data but will
be extensively discussed in Chap. 6. However, it is important to
mention that even though deep learning is excellent in computer
vision tasks, it has been proven to underperform when used with
tabular data, especially given the relatively small size of datasets
typically available in medical imaging. Importantly, it has been
shown that when using tabular data, methods such as ensembles
(e.g., random forest and XGBoost) may perform better than deep
learning models [39] and need less data and tuning than deep
learning models. This is the reason why other machine learning
algorithms are often preferred over deep learning when creating
radiomics predictive models in literature.

4.6 Conclusion

Radiomics has revolutionized medical imaging research by pro-
viding the imaging alternative to traditional biological omics.
Radiomics analysis is a multi-step pipeline that utilizes machine
learning methods in almost every step of the process. Basic
understanding of these machine learning algorithms is crucial
in comprehending radiomics manuscripts and in selecting the
appropriate methods in radiomics research. Selection of the ideal
algorithm at each step of the pipeline depends on the applica-
tion, the dataset, and the experience of the user. In conclusion,
radiomics is a “partner in crime” with machine learning and one
needs to understand both in order to keep up with developments
in the field.

References
1. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B.
Radiomics in medical imaging—“how-to” guide and critical reflection.
Insights Imaging. 2020;11:91.
2. Yamada R, Okada D, Wang J, Basak T, Koyama S. Interpretation of omics
data analyses. J Hum Genet. 2021;66(1):93–102. https://doi.org/10.1038/s10038-020-0763-5.
3. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than
pictures, they are data. Radiology. 2016;278(2):563–77. [cited 2021 Oct
18] https://pubs.rsna.org/doi/abs/10.1148/radiol.2015151169
4. Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package
for ‘omics feature selection and multiple data integration. PLoS Comput
Biol. 2017;13(11):e1005752.
5. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van
Timmeren J, et al. Radiomics: the bridge between medical imaging and
personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62. https://doi.org/10.1038/nrclinonc.2017.141.
6. Papanikolaou N, Matos C, Koh DM. How to develop a meaningful
radiomic signature for clinical use in oncologic patients. Cancer Imaging.
2020;20(1):33.
7. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierar-
chies for accurate object detection and semantic segmentation. ArXiv.
2013;1311.2524. http://arxiv.org/abs/1311.2524.
8. Girshick R. Fast R-CNN. ArXiv. 2015;1504.08083. http://arxiv.org/abs/1504.08083.
9. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. ArXiv.
2017;1703.06870. http://arxiv.org/abs/1703.06870.
10. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks
for biomedical image segmentation. ArXiv. 2015;1505.04597. http://arxiv.org/abs/1505.04597.
11. Milletari F, Navab N, Ahmadi SA. V-Net: fully convolutional neu-
ral networks for volumetric medical image segmentation. ArXiv.
2016;1606.04797. http://arxiv.org/abs/1606.04797.
12. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab:
semantic image segmentation with deep convolutional nets, atrous convo-
lution, and fully connected CRFs. ArXiv. 2016;1606.00915:1–14. http://arxiv.org/abs/1606.00915.
13. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder
with atrous separable convolution for semantic image segmentation.
ArXiv. 2018;1802.02611:1–18. http://arxiv.org/abs/1802.02611.
14. Wu HM, Tien YJ, Ho MR, Hwu HG, Lin WC, Tao MH, et al. Covariate-
adjusted heatmaps for visualizing biological data via correlation decom-
position. Bioinformatics. 2018;34(20):3529–38.
15. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns
and correlations in multidimensional genomic data. Bioinformatics.
2016;32(18):2847–9.
16. Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC. Evaluation and
comparison of gene clustering methods in microarray analysis. Bioinfor-
matics. 2006;22(19):2405–12.
17. Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of
distance metrics in high dimensional space. In: van den Bussche J, Vianu
V, editors. Database theory — ICDT 2001. Berlin, Springer; 2001. p.
420–34.
18. Roux M. A comparative study of divisive and agglomerative hierarchical
clustering algorithms. J Classif. 2018;35(2):345–66.
19. Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated omics: tools,
advances, and future approaches. J Mol Endocrinol. 2018;JME-18-0055.
20. Yeung KY, Ruzzo WL. Principal component analysis for cluster-
ing gene expression data. Bioinformatics. 2001;17(9):763–74. http://www.cs.washington.edu/homes/kayee/pca
21. Yao F, Coquery J, Lê Cao KA. Independent principal component analysis
for biologically meaningful dimension reduction of large biological data
sets. BMC Bioinf. 2012;13(1):24.
22. Kursa MB, Rudnicki WR. Feature selection with the Boruta Package. J
Stat Softw. 2010;36(11):1–13. http://www.jstatsoft.org/.
23. Guyon I, Weston J, Barnhill S. Gene selection for cancer classification
using support vector machines. Mach Learn. 2002;46:389–422.
24. Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection
methods for random forests and omics data sets. Brief Bioinform.
2019;20(2):492–503.
25. Pfannschmidt L, Hammer B. Sequential feature classification in the
context of redundancies. ArXiv. 2020;2004.00658:1–10. http://arxiv.org/abs/2004.00658.
26. Ding C, Peng H. Minimum redundancy feature selection from microarray
gene expression data. J Bioinform Comput Biol. 2005;3(2):185–205.

27. Li ZD, Guo W, Ding SJ, Chen L, Feng KY, Huang T, et al. Identifying
key microRNA signatures for neurodegenerative diseases with machine
learning methods. Front Genet. 2022;13:880997.
28. Zhang YH, Li H, Zeng T, Chen L, Li Z, Huang T, et al. Identifying
transcriptomic signatures and rules for SARS-CoV-2 infection. Front Cell
Dev Biol. 2021;8:627302.
29. Wu X, Kumar V, Ross QJ, Ghosh J, Yang Q, Motoda H, et al. Top 10
algorithms in data mining. Knowl Inf Syst. 2008;14(1):1–37.
30. Goldstein BA, Polley EC, Briggs FBS. Random forests for genetic
association studies. Stat Appl Genet Mol Biol. 2011;10(1):32.
31. Toloşi L, Lengauer T. Classification with correlated features: unreliability
of feature ranking and solutions. Bioinformatics. 2011;27(14):1986–94.
32. Freund Y, Schapire RE. A decision-theoretic generalization of on-line
learning and application to boosting. J Comput Syst Sci. 1997;55:119–
39.
33. He Z, Lin D, Lau T, Wu M. Gradient boosting machine: a survey point
zero one technology. ArXiv. 2019;1908.06951:1–9.
34. Ruder S. An overview of gradient descent optimization algorithms.
ArXiv. 2016;1609.04747:1–14. http://arxiv.org/abs/1609.04747.
35. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. ArXiv.
2016;1603.02754:1–13. http://arxiv.org/abs/1603.02754.
36. Klontzas ME, Manikis GC, Nikiforaki K, Vassalou EE, Spanakis K,
Stathis I, et al. Radiomics and machine learning can differentiate
transient osteoporosis from avascular necrosis of the hip. Diagnostics.
2021;11:1686.
37. Chen PT, Chang D, Yen H, Liu KL, Huang SY, Roth H, et al. Radiomic
features at CT can distinguish pancreatic cancer from noncancerous
pancreas. Radiol Imaging Cancer. 2021;3(4):e210010.
38. Awe AM, van den Heuvel MM, Yuan T, Rendell VR, Shen M, Kampani
A, et al. Machine learning principles applied to CT radiomics to predict
mucinous pancreatic cysts. Abdom Radiol. 2022;47(1):221–31.
39. Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need.
ArXiv. 2021;2106.03253:1–13. http://arxiv.org/abs/2106.03253.
5 Natural Language Processing

Salvatore Claudio Fanni, Maria Febi, Gayane Aghakhanyan, and Emanuele Neri

5.1 Brief History of NLP

Natural language processing (NLP) is a field of artificial intelligence (AI), computational linguistics, and computer science, concerned with the interaction between natural human languages and computers [1].
The beginning of the field is often attributed to the early 1950s, when it emerged as a subfield of AI and linguistics with the aim of studying the problems derived from the automatic generation and understanding of natural language. Although rudimentary works from earlier periods can be found, it was in 1950 that Alan Mathison Turing, a leading cryptanalyst during World War II at the Government Code and Cypher School in Bletchley Park, Buckinghamshire, England, published an article entitled “Computing Machinery and Intelligence,” in which he proposed an intelligence criterion nowadays widely known as the Turing
test, to empirically determine whether a computer has achieved intelligence [2]. The Turing test provides a powerful, simple, traceable, and pragmatic tool to evaluate the ability of a computer to perform indistinguishably from a human. A necessary tenet of the Turing test is that the computer does not have to think like a human; rather, the computer must simulate intelligence so that it is indistinguishable from human intelligence [3].
Several cornerstone stages can be identified in the history of NLP, each marked by momentous events such as machine translation, the impact of AI, the adoption of a logico-grammatical style, and the use of massive language data [4]. In the modern era, NLP has undergone a renaissance
that was primarily fueled by researchers working at Google. In
2013, the Word2Vec algorithm (Google, https://code.google.com/archive/p/word2vec/) was developed, which employed neural net-
works to learn word associations from free text without additional
input from the user. This was further developed and refined in
2018, with the advent of the bidirectional encoder representations
from transformers (BERT) language model, which builds on the
framework of Word2Vec to learn from not only the text itself but
the context in which it is used [5].
The first phase of work in NLP was a period of enthusiasm
and optimism and was focused on machine translation. Most of
the NLP research done in this period was focused on syntax,
partly because syntactic processing was manifestly necessary,
and partly through implicit or explicit endorsement of the idea
of syntax-driven processing [4]. This was also because many researchers came to NLP research with a background and established status in linguistics and language study rather than computer science, as in later periods. In 1964, the
U.S. National Research Council (NRC) created the Automatic
Language Processing Advisory Committee (ALPAC), whose task
was to evaluate the progress of Natural Language Processing
research. In 1966, the NRC and ALPAC initiated the first AI
and NLP stoppage by halting the funding of research on NLP and machine translation: after 12 years of research and about $20 million of investment, machine translations were still more expensive than manual human translations, and
there were still no computers that came anywhere near being able
to carry on a basic conversation. The machine translation research
was almost killed by the 1966 ALPAC Report, which concluded
that machine translation was nowhere near achievement and led
to a significant cut in funding [4].
The second phase of NLP was prompted by AI, with much
more emphasis on world knowledge and on its role in the con-
struction and manipulation of meaning representations. Overall, it
took nearly 14 years (until 1980) for NLP AI research to recover
from the broken expectations created by extreme enthusiasts
during the first phase of development. While the second phase of NLP work was AI-driven and semantics-oriented, the
third phase can be described, in reference to its dominant style,
as a grammatical-logical phase. This trend, as a response to
the failures of practical system building, was stimulated by the
development of grammatical theory among linguists during the
1970s, and by the move toward the use of logic for knowledge
representation and reasoning in AI. Computational grammar the-
ory became a very active area of research linked with work on
logics for meaning and knowledge representation that can deal
with the language user’s beliefs and intentions and can capture
discourse features and functions like emphasis and theme, as
well as indicate semantic case roles. Research and development
extended worldwide, notably in Europe and Japan, aimed not only
at interface subsystems but at autonomous NLP systems, as for
message processing or translation [1].
Until the 1980s, the majority of NLP systems used complex,
“handwritten” rules, however, in the late 1980s, a revolution in
NLP came about. This was the result of both the steady increase
of computational power, and the shift to machine learning (ML)
algorithms. In the 1990s, the popularity of statistical models for natural language processing rose dramatically, and purely statistical NLP methods became remarkably valuable. In addition, recurrent neural network (RNN) models were introduced and found their niche in 2007 for voice and text processing. Currently, neural network models are considered
the cutting edge of research and development in the NLP’s
understanding of text and speech generation [1].
Nowadays, the combination of a dialog manager with NLP makes it possible to develop a system capable of holding a conversation, and sounding human-like, with back-and-forth questions,
sation, and sounding human-like, with back-and-forth questions,
prompts, and answers. Nevertheless, the current modern AI mod-
els are still not able to pass Alan Turing’s test, and still do not
sound like real human beings.
NLP in the field of medical informatics and, in particular, in radiological reporting has received increasing attention only in recent years. A PubMed search for [natural language
processing] or [text mining] showed that 52 manuscripts were
published in 1998, compared to 1862 manuscripts in 2022 with
an overall 35.8-fold increase. Recently, the Food and Drugs
Administration (FDA) and the Centers for Disease Control and
Prevention (CDC) launched a collaborative effort for the “Devel-
opment of a Natural Language Processing (NLP) Web Service
for Structuring and Standardizing Unstructured Clinical Informa-
tion.” This project aims to create a NLP Platform for clinical
text that will be extensible for many different subdomains [6].
The overall plan is to perform the necessary development for
maximizing the use of existing tools and filling certain gaps. It
should be noted that NLP is a generic term involving a wide
variety of techniques to account for the complexity of language,
and it has humbler origins dating much further back [7]. When
the phrases “natural language processing” and “radiology” are
included in the PubMed search, it yielded only 7 manuscripts in
1998, while 135 manuscripts in 2022, a 19.3-fold increase. Radi-
ology is particularly suited to benefit from applications of NLP,
given that the primary mode of inter-physician and physician-to-
patient communication is by way of the radiology report. In the
following sections, we will address the NLP basics and cover the
cutting-edge applications in the field of radiology and radiological
reporting.
5.2 Basics of Natural Language Processing

NLP encompasses any computer-based method that analyzes both written and spoken human language to convert it into structured and mineable data [8].
Through the combination of linguistic, statistical, and AI methods, NLP can be used either to determine the meaning of a text or even to produce a human-like response [9]. According to these two different purposes, NLP can be categorized into two subsets: natural language understanding (NLU) and natural language generation (NLG). NLU is the subset of NLP dedicated to the analysis of text and speech to interpret natural language, determine its meaning, and identify the context using syntactic and semantic analysis. Conversely, NLG focuses on producing a response in human language based on previously analyzed data input [10].
To accomplish these tasks, different approaches have been
investigated, reflecting the above-mentioned different phases of
NLP history and not really differing from those already described
in the previous chapters for image analysis. Similarly to the
best-known radiomic pipeline, the first step of NLP analysis is
represented by segmentation, in this case meant as the identifi-
cation of section/paragraphs in the analyzed text. Each section
is further divided into sentences (sentence splitting) and words
(tokenization). Before starting more sophisticated analysis, it is necessary to normalize the words by determining their lexical root (stemming), expanding abbreviations, and correcting spelling mistakes. When normalization is completed, a syntactic analysis is carried out to determine the part of speech of each word (e.g., noun, verb, adverb, adjective) and the dependency relations between words, followed by a semantic analysis to determine the meaning of words [9]; a minimal sketch of these steps is given below.
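As an illustrative sketch of this pipeline (not the specific software used in the studies discussed here), the steps can be reproduced with the open-source NLTK library; the sample report text is invented, and the resource names follow the classic NLTK releases:

import nltk
from nltk.stem import PorterStemmer

# Resource names valid for classic NLTK releases; newer versions may rename them.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

report = "No focal lesion detected. Mild hepatomegaly is noted."  # hypothetical report

sentences = nltk.sent_tokenize(report)                # sentence splitting
tokens = [nltk.word_tokenize(s) for s in sentences]   # tokenization
stemmer = PorterStemmer()
stems = [[stemmer.stem(t) for t in sent] for sent in tokens]  # stemming (lexical roots)
pos = [nltk.pos_tag(sent) for sent in tokens]         # syntactic analysis (parts of speech)

print(stems[1])  # e.g., ['mild', 'hepatomegali', 'is', 'note', '.']
print(pos[0])    # (word, part-of-speech) pairs for the first sentence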
Similarly to radiomics, the results of this preprocessing analysis are referred to as NLP features and are used as input for subsequent rule-based or ML-based processing steps [11]. The first NLP systems to be developed were rule-based classifiers, which resemble computer-aided detection systems. A rule-based classifier is explicitly programmed by experts and relies on very sophisticated
handcrafted or “handwritten” rules. Conversely, an ML classifier is based on rules that are automatically generated from labeled input. These two approaches may be combined in a hybrid approach, where both handcrafted and automatically generated rules are used to generate an output. In both cases, it is necessary to perform the preprocessing step of NLP feature extraction, which is not actually required with deep learning (DL) based classifiers [12].
DL is a subfield of ML based on artificial neural networks (ANN), whose structure resembles that of the neural cortex [13]. ANNs consist of artificial neurons organized in input, hidden computational, and output layers, and are classified according to their structure [14].
Convolutional neural networks (CNN) are among the best known and are widely, though not exclusively, adopted in image analysis for detection, classification, or segmentation tasks. As written or spoken text is a sequence of words, the RNN is the ideal ANN for NLP [15]. RNNs process sequential information and consist of neurons connected sequentially in a long chain. In RNNs, the processed output is transferred from one neuron to the next, and this transfer generates a “memory.” However, just as in humans, that memory may lose effectiveness when facing long sentences. Thus, the long short-term memory (LSTM) network has been developed, with higher effectiveness for long and complex written text analysis [16].
Recently, DL-based NLP has outperformed traditional rule-based classifiers and ML-based algorithms [17]. To quantitatively measure performance and compare different systems, several metrics have been adopted, but undoubtedly the F1 score is the most frequently used. The F1 score is defined as the harmonic mean of recall (sensitivity) and precision (positive predictive value) and is an overall measure of an NLP algorithm's performance.
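For concreteness, a minimal computation of the F1 score from hypothetical confusion-matrix counts (the numbers are invented for the example) could look as follows:

def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # sensitivity
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=80, fp=10, fn=20))  # about 0.84 for these hypothetical counts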
Different variables affect an algorithm's performance and, as for image analysis, one of the most important is the quality of the dataset used for training. However, beyond the different systems and approaches, it is worth noting the importance of a standardized vocabulary allowing the NLP engine
to properly work with medical terms. To solve this problem, many biomedical lexicons have been created, like the unified medical language system (UMLS) developed by the US National Library of Medicine. UMLS includes a large lexicon of biomedical and general English and integrates terms and codes from other vocabularies (the Metathesaurus) such as CPT, ICD-10-CM, LOINC, MeSH, RxNorm, and SNOMED CT. UMLS also integrates a semantic network: each concept in the Metathesaurus is assigned one or more semantic types, which are linked together by semantic relationships [18]. SNOMED CT (Systematized Nomenclature of
Medicine Clinical Terms) is a multilingual clinical healthcare ter-
minology maintained and distributed by SNOMED International
[19].

5.3 Current Applications of Natural Language Processing

The development of new NLP models, nowadays empowered by AI, potentially results in an infinite number of applications. Contrary to common thought, NLP is already part of our everyday life, e.g., in our computer software and mobile phones, but increasingly also in our hospitals. Everyday uses of NLP include language translation, virtual assistants, and e-mail spam detection.
As previously described, translation is one of the very first
applications of NLP and it is nowadays used to instantly translate
online from one language to another [20]. Many people use virtual
assistants (e.g. Alexa, Siri, Google Assistant) that are designed
with NLP and AI. Virtual assistants can understand written and
spoken language and answer correctly holding a conversation
with the user. NLP is also used in e-mail spam detection, to analyze the written text, classify the e-mail, and then decide whether or not it is spam. While e-mail spam detection uses mostly NLU, which allows computers to understand human language, virtual assistants use both NLU and NLG [21].
Regarding healthcare applications, the clinical use of NLP is still in its infancy, but some hospitals use it with very good results to facilitate and speed up the work of doctors and researchers.
In healthcare, the use of NLP is spreading, and some hospitals have integrated it into their systems thanks to the widespread use of electronic health records (EHR). Two of the most used applications of NLP in healthcare are “information extraction” and “information retrieval.” Besides these, however, many applications are being developed, and many more will be discovered in the future.
Information extraction is the capability to extract structured information from a large pool of unstructured free text. Hospitals produce a large amount of written free-text data that are vital for the patient, covering diagnosis, treatment, follow-up, and clinical and private information; by using information extraction software, it is possible to extract relevant information for clinical decision support, evidence-based medicine, or even research [22]. MedLEE (Medical Language Extraction and
Encoding system), created in 1995 at Columbia University in New York, is one of the first NLP software systems developed and used for clinical information extraction [23]. MedLEE
can be used for diverse clinical applications like surveillance,
quality assurance, decision support, data mining, and adverse
drug event detection. MedLEE has been used to detect patients
with suspected tuberculosis and breast cancer [24, 25]. Another
American information extraction NLP engine is cTAKES (Clinical Text Analysis and Knowledge Extraction System). cTAKES is open source, developed in 2006 at the Mayo Clinic, and distributed under the Apache License [26]. A recent study used cTAKES for assessing
the validation of a pneumonia diagnosis in radiology reports [27].
The aim of information retrieval, instead, is to return a set
of documents in response to a user’s query. The most common
example in everyday life is Google search [23]. In healthcare,
in a single hospital, there are different informatics systems that
have to communicate with each other, like the reservation system,
the picture archiving and communication system, or the radiology
information system, and there may not be a single user interface
to search across all of them. The use of NLP to retrieve the
right data at the right time can simplify clinical practice and
research work, as it has been done with CogStack. CogStack is an
open-source information retrieval and information extraction NLP engine implemented at King's College Hospital (KCH) in the UK [28]. CogStack has made it possible to implement a real-time psychosis risk detection and alerting service, as well as a clinical decision support system for diabetes management [29, 30].
Radiology is a branch of medicine that would significantly
benefit from the applications of NLP software due to the use
of written reports and the need to communicate with clinicians
and patients. NLP could be applied to all types of reports, both structured reports and the far more common unstructured free-text reports. Moreover, NLP can be used to automatically convert unstructured into structured reports, combining the advantages of both reporting styles [31, 32].
Some of the main uses of NLP in radiology are information
extraction, text classification, topic modeling, simplification, and
summarization.
“Information extraction” allows specific words or phrases to be identified in millions of reports and then classified to answer clinical questions.
Using “text classification” and “topic modeling,” radiology
reports can be organized by categories, like diagnosis, topics,
severity, etc. This could be very helpful, for example, to find
cohorts for clinical trials. Because radiology reports use a specific language that can be challenging for patients to understand, NLP also finds a field of application in simplifying reports. Moreover,
radiologists need to communicate effectively with clinicians, and
the use of “simplification” and “synthesis” applications could
simplify and accelerate clinical and therapeutic decision-making
[5].
In the literature, there has been an increase, especially in recent years, in new NLP models used in radiology. A recently published (2021) systematic review of NLP applied to radiology reports included 164 publications since 2015, and the most used techniques were rule-based and ML approaches. However, DL publications have been rising in recent years, with recurrent neural networks (RNN) being the most common type of DL architecture. The most used embedding models (used to convert report text into numbers) were Word2Vec, followed by GloVe (global vectors for word representations), FastText, ELMo (embeddings from language models), and BERT (bidirectional encoder representations from transformers) [33].
An example of a rule-based classifier is PeFinder (Pulmonary Embolism Finder), a tool developed in 2011 by Chapman et al. PeFinder classifies reports based on the presence/absence of pulmonary embolism, the temporal state (acute/chronic), the certainty of the diagnosis, and also the quality of the exam (whether diagnostic or not) [34]. Miao et al. developed a DL-based NLP method to extract BI-RADS findings from breast ultrasound reports to support clinical decisions and breast cancer research [35]. A different use is that of Brown et al., who used ML-based software to predict radiology resource utilization in patients with hepatocellular carcinoma, starting from abdominal CT reports, with translation to healthcare management to improve decision-making and reduce costs [36].
Another growing use of NLP and a current challenge is its
application in “sentiment analysis,” which analyzes people’s
attitudes, opinions, but also emotions. This field relies mostly on
social media, an enormous pool of written opinions from people
around the world. For example, during the COVID-19 pandemic,
some NLP models were suggested to understand the population’s
feelings toward COVID-19 and the vaccine using comments and
tweets [37]. Moreover, with NLP it is possible to extract health
information through social media to diagnose depression, mental
health problems, or insomnia [38, 39]. Sentiment analysis is also being used in radiology to extract radiologists' opinions about the severity of a radiological finding and the urgency of its treatment [5].

References
1. Kochmar E. Getting started with natural language processing. New York: Simon and Schuster; 2022.
2. Turing AM. Computing machinery and intelligence. Mind. 1950;59(236):433–60.
3. Harnad S. Minds, machines and Searle. J Exp Theor Artif Intell. 1989;1(1):5–25.
4. Jones KS. Natural language processing: a historical review. In: Zampolli A, Calzolari N, Palmer M, editors. Current issues in computational linguistics: in honour of Don Walker. Dordrecht: Springer; 1994. p. 3–16. https://doi.org/10.1007/978-0-585-35958-8_1.
5. Mozayan A, Fabbri AR, Maneevese M, Tocino I, Chheang S. Practical guide to natural language processing for radiology. RadioGraphics. 2021;41(5):1446–53.
6. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform. 2017;73:14–29.
7. Chen P-H. Essential elements of natural language processing: what the radiologist should know. Acad Radiol. 2020;27(1):6–12.
8. Fanni SC, Gabelloni M, Alberich-Bayarri A, Neri E. Structured reporting and artificial intelligence. In: Fatehi M, Pinto dos Santos D, editors. Structured reporting in radiology. Imaging informatics for healthcare professionals. Cham: Springer; 2022. https://doi.org/10.1007/978-3-030-91349-6_8.
9. Pons E, Braun LM, Hunink MG, Kors JA. Natural language processing in radiology: a systematic review. Radiology. 2016;279(2):329–43. https://doi.org/10.1148/radiol.16142770.
10. Kao A, Poteet S. Overview. In: Kao A, Poteet S, editors. Natural language processing and text mining. New York: Springer; 2007. p. 1–7.
11. Goecks J, Jalili V, Heiser LM, Gray JW. How machine learning will transform biomedicine. Cell. 2020;181(1):92–101. https://doi.org/10.1016/j.cell.2020.03.022.
12. Cheng LT, Zheng J, Savova GK, Erickson BJ. Discerning tumor status from unstructured MRI reports—completeness of information in existing reports and utility of automated natural language processing. J Digit Imaging. 2010;23(2):119–32. https://doi.org/10.1007/s10278-009-9215-7.
13. Soffer S, Ben-Cohen A, Shimon O, Amitai MM, Greenspan H, Klang E. Convolutional neural networks for radiologic images: a radiologist's guide. Radiology. 2019;290:590–606.
14. Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. RadioGraphics. 2017;37:2113–31.
15. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301.
16. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.
17. Ruder S. NLP's ImageNet moment has arrived. 2018. https://thegradient.pub/nlp-imagenet/. Accessed Mar 2021.
18. Lindberg C. The unified medical language system (UMLS) of the national library of medicine. J Am Med Rec Assoc. 1990;61(5):40–2.
19. Millar J. The need for a global language – SNOMED CT introduction. Stud Health Technol Inform. 2016;225:683–5.
20. Khurana D, Koli A, Khatter K, Singh S. Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl. 2022. https://doi.org/10.1007/s11042-022-13428-4.
21. Garg P, Girdhar N. A systematic review on spam filtering techniques based on natural language processing framework. In: 2021 11th international conference on cloud computing, data science & engineering (confluence). 2021. p. 30–5. https://doi.org/10.1109/Confluence51648.2021.9377042.
22. Malmasi S, Hosomura N, Chang L-S, Brown CJ, Skentzos S, Turchin A. Extracting healthcare quality information from unstructured data. AMIA Annu Symp Proc. 2017;2017:1243–52.
23. Iroju OG, Olaleke JO. A systematic review of natural language processing in healthcare. Int J Inf Technol Comput Sci. 2015;7(8):44–50. https://doi.org/10.5815/ijitcs.2015.08.07.
24. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. In: Proceedings of the AMIA Annual Fall Symposium. 1996. p. 542–6.
25. Jain NL, Friedman C. Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. In: Proceedings of the AMIA Annual Fall Symposium. 1997. p. 829–33.
26. Savova GK, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–13. https://doi.org/10.1136/jamia.2009.001560.
27. Panny A, et al. A methodological approach to validate pneumonia encounters from radiology reports using natural language processing. Methods Inf Med. 2022;61(1–2):38–45. https://doi.org/10.1055/a-1817-7008.
28. Jackson R, et al. CogStack – experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak. 2018;18(1):1–13. https://doi.org/10.1186/s12911-018-0623-9.
29. Wang T, et al. Implementation of a real-time psychosis risk detection and alerting system based on electronic health records using CogStack. J Vis Exp. 2020;159:60794. https://doi.org/10.3791/60794.
30. Patel D, et al. An implementation framework and a feasibility evaluation of a clinical decision support system for diabetes management in secondary mental healthcare using CogStack. BMC Med Inform Decis Mak. 2022;22(1):100. https://doi.org/10.1186/s12911-022-01842-5.
31. Spandorfer A, Branch C, Sharma P, et al. Deep learning to convert unstructured CT pulmonary angiography reports into structured reports. Eur Radiol Exp. 2019;3:37. https://doi.org/10.1186/s41747-019-0118-1.
32. Fanni SC, Colligiani L, Spina N, Colasanti G, Gabelloni M, Cioni D, et al. Current knowledge of radiological structured reporting. J Radiol Rev. 2022;9:93–9. https://doi.org/10.23736/S2723-9284.22.00189-1.
33. Casey A, et al. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak. 2021;21(1):179. https://doi.org/10.1186/s12911-021-01533-7.
34. Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. 2011;44(5):728–37. https://doi.org/10.1016/j.jbi.2011.03.011.
35. Miao S, et al. Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches. Int J Med Inform. 2018;119:17–21. https://doi.org/10.1016/j.ijmedinf.2018.08.009.
36. Brown AD, Kachura JR. Natural language processing of radiology reports in patients with hepatocellular carcinoma to predict radiology resource utilization. J Am Coll Radiol. 2019;16(6):840–4. https://doi.org/10.1016/j.jacr.2018.12.004.
37. Sv P, et al. Twitter-based sentiment analysis and topic modeling of social media posts using natural language processing, to understand people's perspectives regarding COVID-19 booster vaccine shots in India: crucial to expanding vaccination coverage. Vaccines. 2022;10(11):1929. https://doi.org/10.3390/vaccines10111929.
38. Doan S, Yang EW, Tilak SS, Li PW, Zisook DS, Torii M. Extracting health-related causality from twitter messages using natural language processing. BMC Med Inform Decis Mak. 2019;19(3):79. https://doi.org/10.1186/s12911-019-0785-0.
39. Patel R, et al. Frequent discussion of insomnia and weight gain with glucocorticoid therapy: an analysis of Twitter posts. NPJ Digit Med. 2018;1:20177. https://doi.org/10.1038/s41746-017-0007-z.
6 Deep Learning Fundamentals

Eleftherios Trivizakis and Kostas Marias

Abbreviations
AE Autoencoder
AI Artificial intelligence
ANN Artificial neural networks
CAD/CADe Computer-aided diagnosis/detection
CLAHE Contrast-limited adaptive histogram equalization
CNN Convolutional neural networks
CT Computed tomography
DL Deep learning
DrCNN Denoising residual convolutional neural network
FL Federated learning

GAN Generative adversarial network
GPU Graphics processing unit
HE Histogram equalization
HU Hounsfield units
ML Machine learning
MRI Magnetic resonance imaging
PET Positron emission tomography
ROI Region of interest
TL Transfer learning
VAE Variational autoencoder
VGG Visual geometry group
ViT Visual transformer
XAI Explainable artificial intelligence

6.1 Deep Learning in Medical Imaging


6.1.1 Key Concepts

The field of medical image analysis has recently shifted its focus
from traditional “hand-crafted” image processing and simple
statistical models to the cutting-edge technique of deep learning
analysis. Medical professionals can potentially benefit from accurate lesion identification, segmentation of regions of interest, progression tracking, and categorization of pathological anatomical structures to aid them in clinical practice. Therefore,
it is crucial for healthcare to adopt DL-based applications for
the aforementioned tasks, since they can provide overburdened
doctors with more agency and facilitate swift decision-making in
the highly demanding clinical environment.
The data analysis in deep neural networks follows a hierar-
chical architecture that progressively identifies hidden patterns
inside the examined region of interest [1] and can potentially
correlate those with clinical outcomes. Like biological neurons,
artificial neurons receive a number of inputs, perform some sort
of computation, and then output the results. In each neuron, a
straightforward computation is performed, including a nonlinear
activation function and a mechanism for sparse feature coding.
Some typical nonlinear activation [2] functions of ANN include
the sigmoid transformation, the hyperbolic tangent, and the commonly used rectified linear unit. Subsequently, salient elements of the imaging examination can be isolated and enhanced for certain
tasks (such as classification or detection), while less significant
ones would be suppressed.
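As a small illustration, the activation functions named above can be written in a few lines of Python (a sketch using NumPy, not tied to any particular framework):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # sigmoid transformation

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.119 0.5   0.881]
print(np.tanh(z))   # hyperbolic tangent: [-0.964  0.     0.964]
print(relu(z))      # [0. 0. 2.]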
During the supervised learning process, each training image
is assigned a label, and the model’s parameters are tuned by
utilizing the prediction error of image-label pairings. A DL model
is able to uncover complicated patterns in specialized datasets
by utilizing a method called back-propagation [3] to demonstrate
how a machine should adjust the internal weights needed to calculate the representations in each layer based on the features in the preceding layer. In a similar way, in unsupervised learning,
deep models learn hidden features or underlying patterns even
in the absence of labels by reconstructing the input image into
the output. Contrastive learning [4] uses a loss that is based on
a similarity metric, aiming to maximize similar feature pairings
and eliminate dissimilar pairs. Therefore, the trained model can
identify discriminative features in the targeted domain.
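A minimal supervised training loop in PyTorch may clarify these ideas; the model, inputs, and labels below are toy placeholders, not a clinical example:

import torch
from torch import nn

# Toy setup: 10 image-derived features -> 2 classes.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 10)          # a batch of 64 inputs
y = torch.randint(0, 2, (64,))   # their (random) labels

for _ in range(5):
    logits = model(x)            # forward pass
    loss = loss_fn(logits, y)    # prediction error on the image-label pairings
    optimizer.zero_grad()
    loss.backward()              # back-propagation of the error
    optimizer.step()             # adjust the internal weights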
In this context, DL models are greatly adaptable and train-
able universal approximators of data distributions that allow
establishing a relationship between the input data, including
medical images, and clinical outcomes. In particular, these types of models are essentially a collection of complicated mathematical transformations consisting of millions of learnable parameters. Their performance accuracy is highly dependent on the quality of the dataset, the data analysis method, and the training approach [5].
The rapidly evolving topic of deep learning has shown remark-
able potential in several areas of medical image analysis. These
areas include disease classification, pixel-based segmentation,
detection, and image registration in anatomical regions such as the
nervous system [6], liver [7], lungs [8], breast [9], and bones [10].
Significant research has been made on this topic, however, there
are still numerous potential technological difficulties or pitfalls
to overcome before DL-based CAD schemes with high scientific
consistency can be widely adopted in a clinical setting.
One of the most significant drawbacks of using deep learning
for medical image analysis is that medical institutions do not always make adequate imaging examinations accessible [11] for model convergence, due to privacy concerns and challenges in data ownership and patient protection. Other
problems in DL for medical imaging include the lack of stan-
dardization of medical terminology or knowledge for clinical
endpoints, access to well-defined and descriptive information for
data, laborious image annotation processes, the lack of extensive
clinical trials for assessing the impact of DL in a clinical setting,
and the lack of established criteria for quality, quantity, specificity,
and labeling protocols for imaging data.

6.1.2 DL Architectures for Medical Image Analysis

Convolutional neural network (CNN) This type of network was first built by LeCun et al. [12] as an end-to-end image analysis
model for identifying handwritten digits. The core concept of this
deep model is to exhaustively process and integrate feature maps
to infer nonlinear correlations between the input data and the
known ground truth. A set of convolutions with adaptive filters is
incorporated hierarchically into the feature extraction part of the
model, while the feature selection, reduction, and classification
of these imaging features are performed on the neural part of the
model, as depicted in Fig. 6.1a. In medical image analysis, CNNs
are widely used for the fully-automated and end-to-end analy-
sis of high-dimensional imaging data, offering robust modeling
for challenging clinical endpoints. Popular architectures include
VGG [13], Inception [14], ResNet [15], and DenseNet [16] and
are mainly utilized as pre-trained models.
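The following small CNN sketch in PyTorch (an illustrative architecture, not one of the cited models) shows the two parts described above: a convolutional feature extractor followed by a neural classifier:

import torch
from torch import nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Feature extraction: hierarchical convolutions with adaptive filters.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Neural part: feature reduction and classification.
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
scan_slice = torch.randn(1, 1, 64, 64)   # one single-channel 64 x 64 input
print(model(scan_slice).shape)           # torch.Size([1, 2])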

CapsuleNet A capsule is a collection of neurons that represents a component of the input subject by activating a subspace of its
features. The CapsuleNet [17] is composed of separate groups
of capsules, as opposed to kernels or filters in CNNs, which
propagate information to successively higher-level capsules via
the routing by agreement procedure. This paradigm is one of the
most recent developments in the field of deep learning and has
not yet been thoroughly evaluated by data scientists and medical practitioners.

Fig. 6.1 Deep learning architectures: (a) convolutional neural network, (b) artificial neural network, (c) autoencoder

Autoencoders (AE) By recreating their input, AEs learn a compact representation of the examined distribution. After each
hidden layer, the encoder component of the AE contains fewer
neurons per layer, thereby decreasing the dimensionality of the
input image. The complexity of the architecture and the layout
of the neurons differentiate a basic ANN (Fig. 6.1b) from an
AE (Fig. 6.1c). Back-propagating the reconstruction error, the
decoder reconstructs from the learned latent space an estimation
of the original image. AEs are self-supervised deep models that
are widely used in medical imaging for pre-training of deep
models, feature extraction, and synthetic image generation.
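A minimal fully-connected AE sketch in PyTorch (illustrative layer sizes, assuming flattened 28 x 28 inputs) makes the encoder-decoder symmetry explicit:

import torch
from torch import nn

class DenseAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: fewer neurons per layer, shrinking toward the latent space.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # Decoder: mirrors the encoder back to the input dimensionality.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenseAE()
x = torch.rand(8, 784)                      # 8 flattened 28 x 28 images
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error to back-propagate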
Visual Transformers (ViT) Transformers have been effectively translated to a number of computer vision applications, delivering
state-of-the-art performance and challenging the dominance of
CNNs as the de facto choice for image analysis tasks [18]. Using
these advancements in computer vision, visual transformers have
been applied on medical imaging because they can better exploit
global dependencies in data as opposed to other architectures that
leverage local features from a narrow receptive field. Furthermore,
a lot of time and effort has been invested in integrating attention
mechanisms into CNN-based architectures. These hybrid CNN-
transformer models have emerged as an alternative to ViT but
promising option, mainly because of their capacity to both encode
global dependencies and obtain highly discriminative feature
distributions.
In particular, these attention mechanisms were inspired by
the biological primary cortex of mammals, which isolates only
the relevant sensory information for scene interpretation. The
transformer architecture uses self-attention to capture the global
interdependence of labeled data without the need for sequential
computing. These neural network layers (attention blocks) cap-
ture information from the full input sequence. Attention modules
can focus on adaptively learned localizations of interest points,
so model predictions are based on the most relevant visual
information and attributes. Soft and hard are two categories of
attention based on how image localizations are addressed. The
former learns a weighted average of all characteristics, whereas
the latter samples from a uniform subset.
A typical algorithm for training a ViT will initially extract
fixed-size patches from the original examination, transform these
patches into vectors that will be encoded into compact embed-
dings, and finally adapt the ViT encoder on the embedding
sequences for the downstream classification task.
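The patch-extraction step of that algorithm can be sketched in a few lines of PyTorch (an illustration with the common 16 x 16 patch size, not a full ViT):

import torch

def extract_patches(image, patch=16):
    """Split a (C, H, W) image into flattened, fixed-size patch vectors."""
    c, h, w = image.shape
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
    # One row per patch, ready for the linear projection into embeddings.
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

image = torch.randn(1, 224, 224)
print(extract_patches(image).shape)  # torch.Size([196, 256]): 14 x 14 patches of 16 x 16 pixels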

Generative Adversarial Networks (GAN) This novel deep model framework has attracted interest for its capacity to produce
synthetic data that matches the real data after learning its
distribution. Synthetic data can be quite beneficial in certain
cases, such as training deep models on size-limited datasets and
potentially alleviating biases related to distribution imbalances by augmenting the available samples of the minority class [19].
It is extremely difficult to generate synthetic data for the
varying medical imaging modalities in oncology due to the sub-
stantial biological diversity that leads to various genetic subtypes
and the physical composition of neoplasms [20]. Furthermore,
medical imaging has specific properties and intricacies, like the
high variability of measurable signals that depend on the scanner
manufacturers, acquisition settings, and many other confounding
variables. By increasing the heterogeneity and volume of the
analyzed sample distributions, a well-fitted generative model has
the potential to overcome some of these disadvantages.
In particular, GAN is a methodology for fitting generative
models that pulls samples from the targeted distribution without
defining a probability distribution. A pair of models, generator
G and discriminator D, is adapted in an adversarial manner. The
generative model G is fed a noise vector, sampled from a Gaussian
distribution, with the objective of matching it to the desired
distribution. The resulting new samples are supposed to mimic
data from an existing distribution. The discriminating network
D assigns a probability that a sample originates from a known
set instead of being a product of synthesis from G. Training the
pair of deep models is like a two-player minimax game. The
discriminator D is trained to maximize the probability of labeling
synthetic and genuine samples correctly, while the generator G
maximizes the error rate of the D model. G will eventually create realistic samples by capturing the underlying data distribution via the
adversarial method. The architecture of a typical GAN is shown in
Fig. 6.2. The deep generative modeling is a major methodology
for overcoming significant drawbacks that make collecting data
challenging, like the disease subtype rarity, privacy concerns at
clinical sites, and the supply of imaging data from different
scanner vendors.
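Formally, this two-player game is commonly written (following the original GAN formulation) as the minimax objective below, where D(x) is the probability assigned to a real sample and G(z) is a sample synthesized from the noise vector z:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]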
Fig. 6.2 A high-level illustration of a generative adversarial network. The generator G creates synthetic samples (red), and the discriminator D assigns a probability to the input image for being either real (green) or generated. The adversarial loss L estimates the distance between the two distributions

6.1.3 Cloud Computing for Deep Learning

Because of the sheer volume of imaging data, it is imperative to employ specialized analytics infrastructure for predictive mod-
eling and demanding computational tasks in order to process
the high-dimensional medical data. The rise of cloud computing
platforms from major tech giants such as Alphabet, Amazon,
and Microsoft, in addition to large data repositories [21], will
greatly simplify swift diagnosis and clinical outcome prediction
using advanced DL models. Success in this area would not have
been possible without the use of parallel computing (large GPU
server farms) and cloud-based infrastructure (large volumes of
storage, fast networking infrastructure, high-performance com-
puting), which have been crucial in resolving the difficulties
inherent in processing large amounts of data.
6.1.4 DL-Based Computer-Aided Diagnosis

Medical image analysis is crucial at all phases of patient treatment and for monitoring lesion progression. The state of the art
in image analysis has been vastly enhanced by deep learning,
which uses many processing layers to learn representations of
data with different degrees of abstraction. In particular, medical
image applications benefit from the use of DL architectures that
automatically learn intermediate and high-level abstractions from
raw input. Computer-aided diagnosis/detection (CADx/CADe)
might benefit from DL-based analysis because the latter has the
potential to provide unbiased assessments of medical imaging
data free from interobserver variability. Demystifying the DL’s
black-box functionality by uncovering associations and underly-
ing connections with patient data will be crucial to the adoption
of this technology in clinical practice. Therefore, a CADx/CADe
system should intelligently convey suggestions based on AI
estimations to physicians, link the findings with other patients’
data and their overall health state, and provide more justification
whenever clinicians have doubts regarding the proposed recom-
mendations. Using explainability techniques for DL will allow
the diagnosis system to deliver interpretable smart recommenda-
tions to clinicians, establishing a trustworthy framework. Future
CAD/CADe systems should also be able to interpret multimodal
data simultaneously, therefore mirroring clinicians’ reasoning,
who likewise consider a variety of sources prior to treating
patients [22].
The conventional function of medical practitioners will be
bolstered by the advent of DL-based applications in terms of
precision, repeatability, and scalability, all of which contribute to
the delivery of effective care across a wide range of geographical
areas. Moreover, medical imaging is anticipated to make signif-
icant strides forward in the new areas of deep learning model
development and deployment. Artificial intelligence (AI) can
also streamline the diagnostics and treatment decision-making
processes in the near future. In contrast, medical professionals
will have more time and energy to devote to doing what they were
trained to do: treating patients and preventing illness, as opposed to staring at an endless amount of raw data.

6.2 Quality and Biases of Medical Databases

It is extremely challenging for modern machine learning to properly converge to a useful model without access to high-
dimensional data. One factor that has contributed to the rapid
development and widespread adoption of conventional DL sys-
tems is the ease with which vast quantities of human-annotated
data can be shared nowadays. However, significant difficulties
still exist in imaging data gathering, expert annotation pro-
cedures, accessibility, and availability of infrastructure; all of
which contribute to data scarcity for specialized medical cases
and, consequently, limit the efficacy of DL-based models. These
constraints during data collection frequently result in selection
bias.
A typical pattern of selection bias might occur when a sin-
gle center is providing data used for model convergence and
development, thereby leading to biases toward patient groups
that are included in the training cohort. An AI system may perform poorly and discriminate against patients with characteristics that are underrepresented in the original medical center's dataset; when adopted in a different institution with a different acquisition protocol, such a system may prove problematic and lack the generalization ability expected of a medical device.
Data shift is a type of selection bias and is one of the
greatest challenges to DL systems’ generalization and usefulness.
Data shift occurs when the distribution of the data used to train a DL-based model does not precisely represent the features of the forthcoming data encountered in future deployments of the AI. These distribution shifts and biases
can be detrimental to the generalizability of AI. For example,
when attempting to redistribute DL-based systems developed in
advanced industrialized countries with an underrepresented rural
population to regions with different population characteristics,
the AI model will most certainly have reduced efficiency and prediction performance [5].
prediction performance [5].
Due to differences across equipment or acquisition protocols,
imaging acquisition might suffer from technical bias, which is
particularly common in the field of radiology. Critical issues
including a suitable study design, a well-defined data collection
protocol, realistic goals for data availability, proper infrastruc-
ture for collecting large amounts of imaging examinations, and
outlining a quality control process are usually overlooked by
researchers while collecting data. Researchers must be committed
to mitigating the causes of poor data quality because applying DL
to such data distribution has been demonstrated to systematize
or exacerbate biases [23]. Evidence shows [24] that addressing the aforementioned problems might assist in decreasing human bias in the clinical setting and, therefore, enhance best practices
in healthcare.
Although radiologists frequently evaluate and consider tech-
nical differences during image acquisition, like voxel spacing,
scanner model or vendor, the resulting AI is not aware of these
properties of the data distribution unless such differences were
included in the model’s convergence process. Therefore, to facil-
itate robust development of models trained with a single insti-
tution’s data and to make multi-institutional model deployments
feasible, it is necessary to establish standardized protocols and
pre-processing methodologies that allow the harmonization of
data distributions across medical centers and different scanners.
Annotating and labeling imaging data to train DL models
necessitates specialist knowledge. In spite of the importance of
this, high-quality data can be difficult to collect since human
annotations are regularly plagued by unclear boundaries or noisy
information [25]. Because of the different interpretations of med-
ical examinations, the performance of the DL analysis is deter-
mined by the quality of the annotated data. The human element in
the process of data annotation is always subjective, which can be a
significant obstacle to the success of a research project. Therefore,
annotations from different experts are likely to vary due to the
disagreements on the boundaries of the examined regions of
interest. The underlying class distribution could potentially be
approximated or modeled more accurately because of the use of a consensus from numerous annotations.
Data quality can be significantly improved by stratifying patients according to specific features or case rarity, isolating data with limited availability, and curating image annotations; the latter are time-consuming and laborious tasks that also require multiple expert clinicians, in order to avoid interobserver variability in the delineation of important regions, as well as explicitly defined criteria for selecting a certain region of interest.

6.3 Pre-processing for Deep Learning


6.3.1 CT Radiation Absorption Map to Grayscale

Windowing, often referred to as gray-level mapping, is the method of manipulating the computed tomography (CT) grayscale component using CT Hounsfield units (HU) and other relevant parameters of the CT. This alters the look of the image to emphasize certain anatomical structures. The image's luminosity is modified by adjusting the window level, and its contrast by adjusting the window width. DL image analysis tasks are greatly affected by HU windowing, especially when converting DICOM to other image formats (e.g., png) [26].
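A minimal sketch of HU windowing with NumPy follows; the window level/width values correspond to a commonly used soft-tissue window and are given only as an example:

import numpy as np

def apply_window(hu, level, width):
    """Map HU values inside [level - width/2, level + width/2] to 0-255 grayscale."""
    low, high = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu, low, high)
    return ((clipped - low) / (high - low) * 255.0).astype(np.uint8)

ct_slice = np.random.randint(-1000, 1000, size=(512, 512))  # placeholder HU map
grayscale = apply_window(ct_slice, level=40, width=400)     # soft-tissue window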

6.3.2 MRI Bias Field Correction

The bias field deformity in MR imaging results in additional voxel intensity variation across scans acquired using the same device and patient. In cases where the bias field and the MR image have different spatial frequencies, the bias field can be easily removed by filtering out the spatial frequencies that represent the magnetic field; the scan is also commonly corrected with distance minimization algorithms such as the Manhattan distance and the squared Euclidean distance. A few other prevalent bias field correction approaches include N4ITK [27] and joint removal of bias [28].
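As a sketch, N4ITK bias field correction is available through the SimpleITK library; the file paths below are placeholders, and the Otsu mask is only a rough tissue estimate:

import SimpleITK as sitk

image = sitk.ReadImage("t1.nii.gz", sitk.sitkFloat32)  # placeholder path
mask = sitk.OtsuThreshold(image, 0, 1, 200)            # rough foreground/tissue mask

corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(image, mask)             # estimate and remove the bias field
sitk.WriteImage(corrected, "t1_n4.nii.gz")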
6 Deep Learning Fundamentals 113

6.3.3 Tissue-Based Standardization

Standardization based on a reference tissue has been used extensively in brain MRI tasks [29]. This type of standardization can
be applied to get a uniform pixel distribution in MR examina-
tions, which will allow tissue-specific signatures across scans and
enable improved quantification. This can be observed in Fig. 6.3,
where the original pixel intensities of each prostate MRI scan
(Fig. 6.3a) were standardized (Fig. 6.3b) using the distribution
of the fat tissue near the prostate gland as a constant.

6.3.4 Pixel Intensities Normalization

Normalization is an essential pre-processing procedure that translates the spectrum of pixel intensity data to a standard scale;
usually the new minimum will be close to zero and maximum
near one. Standardization or z-score normalization is a method of
normalization that employs the statistical features of mean and
standard deviation. DL models converge better when the input
data are standardized to have a zero mean and unit variance [30].
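Both variants can be sketched in two short NumPy functions (illustrative helper names):

import numpy as np

def minmax_normalize(volume):
    """Rescale pixel intensities so the minimum is 0 and the maximum is 1."""
    return (volume - volume.min()) / (volume.max() - volume.min())

def zscore_standardize(volume):
    """Standardize a scan to zero mean and unit variance."""
    return (volume - volume.mean()) / volume.std()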

6.3.5 Harmonization

Harmonization strategies have been recommended as a way of minimizing the inherent variability of medical imaging [31].
Harmonization generally aims to address the lack of unifor-
mity across medical scans and the loss of stability in imaging
characteristics. Several pre-processing techniques can be applied
for harmonization on the image level, like applying a uniform
voxel spacing across scans, filtering techniques to produce sim-
ilar spatial characteristics and noise patterns, quantitative-based
standardization, and using generative models to standardize mul-
ticenter images [32].
Fig. 6.3 A comparison of histograms of pixel intensities calculated from the: (a) original MRI scans, and (b) tissue-based normalized data
6.3.6 Spacing Resampling

Through upsampling and downsampling, matrix interpolation resizes an image from its initial pixel grid to an interpolated
grid. Several resampling techniques have been proposed and
are widely used in literature, such as the nearest neighbor,
trilinear, tricubic convolution, and tricubic spline interpolation.
This technique allows for the extraction of texture characteristics
from rotationally invariant tomographic scans. Additionally, an
isotropic voxel space across scans can potentially increase the
reproducibility of radiomics in multicenter studies with different
spatial properties [33–35].
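An isotropic resampling sketch with SimpleITK could look as follows; the 1 mm spacing and linear interpolator are example choices:

import SimpleITK as sitk

def resample_isotropic(image, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a scan onto an isotropic voxel grid with linear interpolation."""
    old_spacing, old_size = image.GetSpacing(), image.GetSize()
    new_size = [int(round(sz * osp / nsp))
                for sz, osp, nsp in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())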

6.3.7 Image Enhancement

There are several image enhancement methods, including histogram equalization and its many variants. The typical histogram
equalization (HE) method shifts the distribution of pixel intensi-
ties across a broader histogram, increasing the perceived contrast
in an imaging examination. The alternative contrast-limited adap-
tive histogram equalization (CLAHE) restricts the histogram
range of the processed image, preventing or clipping outlier pixel
values that may reduce the benefits of HE.
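Both HE and CLAHE are available in OpenCV, as in the sketch below; the input array is a placeholder, and the clip limit and tile grid are example settings:

import cv2
import numpy as np

gray = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)  # placeholder image

equalized = cv2.equalizeHist(gray)  # plain histogram equalization (HE)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # contrast-limited variant
limited = clahe.apply(gray)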

6.3.8 Image Denoising

To maintain reliable quantification and high diagnostic value in medical imaging, it is necessary to minimize the impacts
of medical equipment and imaging distortions such as light
oscillations and signal loss, while retaining the texture quality of
key tissues or regions of interest.
Gaussian noise is a form of electronic noise that originates
from the amplifier and detector components of the device and
distorts pixel distributions. Using median and other statistical-
based filters is the standard method for reducing different types of
noise from images without compromising textural information.
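A minimal statistical-filter sketch with SciPy (the noisy array is simulated for the example):

import numpy as np
from scipy import ndimage

noisy = np.random.normal(loc=100.0, scale=15.0, size=(256, 256))  # simulated Gaussian noise
denoised = ndimage.median_filter(noisy, size=3)                   # 3 x 3 median filter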
Fig. 6.4 The architecture of a DrCNN model used for denoising. This model
architecture estimates the noise distribution patterns of the input dataset.
DrCNN, denoising residual convolutional neural network

Utilizing several filtering techniques [36] such as the Gaussian, averaging, and Wiener filters can reduce noise but also introduce
a soft appearance to the image that results in losing edge informa-
tion in landmarks [37].
Deep learning-based models have significantly improved
image denoising over traditional methods like the aforementioned
average or median filtering. In particular, the internal represen-
tation of DL models is adapted to the specific data distribution
during training, whereas traditional denoising uses predetermined
methods that cannot be customized for the needs of specific
datasets. Deep learning denoising maintains texture edges and
granular details in the image. Additionally, the utilization of DL
models in medical imaging, like the DrCNN in Fig. 6.4, has
led to significant image quality enhancements while preserving
the high-frequency information and has been applied to several
anatomical areas, including scans of the liver [37], abdomen [38], lung [39], and pelvis [40].

6.3.9 Lowering Dimensionality at the Imaging Level for Deep Learning

Fig. 6.5 Two patch extraction methods are presented: (a) exhaustive with no overlapping patches from a high resolution pathology image and (b) based on regions of interest

DL models require a constant image dimensionality for their input, which can be challenging in oncological tasks since tumors appear in a variety of shapes and forms, but also because the imaging data themselves might have extremely high resolution
like a typical pathology image (Fig. 6.5a). Trimming unwanted
tissue in radiology data based on a segmentation mask will result
in pixel arrays with different dimensions, as shown in Fig. 6.5b.
To mitigate this problem, zero padding can be employed as the preferred method for DL applications, since zero is the neutral element of the convolutional operator. However, when standardization is applied, an appropriate padding constant should be selected. Normalization should be performed prior to padding, because padding can alter key statistical features of the dataset, such as the standard deviation, maximum, and mean values, that are used to perform this pre-processing step.
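A sketch of this order of operations (normalize first, then zero-pad to a fixed target shape; the target is assumed to be at least as large as the input):

```python
import numpy as np

def normalize_then_pad(img: np.ndarray, target: tuple) -> np.ndarray:
    """Z-score normalize a 2D array, then zero-pad it to a fixed shape."""
    img = (img - img.mean()) / (img.std() + 1e-8)
    # After normalization, zero coincides with the mean, so zero
    # padding remains a neutral value for the convolutions
    pad = [(0, t - s) for s, t in zip(img.shape, target)]
    return np.pad(img, pad, mode="constant", constant_values=0)
```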
Patch extraction is an important pre-processing step for ViTs. A few parameters affect the extraction process, such as the stride of the sliding window (overlapping), localization based on a segmented mask of a region of interest (ROI), and patch size. These parameters can have a negative effect on the training process of a neural network, since they can introduce overfitting or underrepresentation of smaller regions of interest (e.g., strides larger than the ROI, or more patches extracted from large tumors).
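For illustration, a sliding-window patch extractor; setting stride equal to patch_size yields exhaustive, non-overlapping tiling as in Fig. 6.5a:

```python
import numpy as np

def extract_patches(img: np.ndarray, patch_size: int,
                    stride: int) -> np.ndarray:
    """Extract square patches from a 2D image with a given stride."""
    h, w = img.shape
    patches = [img[y:y + patch_size, x:x + patch_size]
               for y in range(0, h - patch_size + 1, stride)
               for x in range(0, w - patch_size + 1, stride)]
    return np.stack(patches)
```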
When using TL with ImageNet weights, the medical images must match the pixel matrix of the original training samples. Therefore, 3D scans should be fed to the deep model as 2D slices. Patient-based stratification is required when these types of data handling are implemented because, for example, samples (patches, 2D slices, etc.) from the same patient might be present
in both training and testing sets, introducing sample selection bias
that may result in overfitting on the model level.
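Patient-based stratification can be enforced, for example, with scikit-learn's GroupShuffleSplit, using the patient identifier as the grouping variable (placeholder data below):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder: 100 samples (e.g., 2D slices) from 20 patients
X = np.random.rand(100, 32)
y = np.random.randint(0, 2, size=100)
patient_ids = np.repeat(np.arange(20), 5)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))
# No patient now contributes samples to both training and test sets
```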

6.4 Learning Strategies


6.4.1 Transfer Learning

Under ideal conditions, deep learning methods require a large amount of training data, and the level of generalizability of deep models has a significant impact on the performance of the application.
The restricted population of the available patient cohorts and the
human resources, like expert clinicians, required for annotating
these sets, as mentioned in previous sections, are well-known
hurdles in developing DL models. Transfer learning (TL) strate-
gies have been utilized in several studies to circumvent these
issues. It is common knowledge that people can accomplish
tasks that share some similarities by utilizing prior knowledge.
Deep models are also transferable between similar tasks and
can enhance performance on targeted tasks, a domain that may
have lacked the required amount of data. This has been the case
for medical imaging applications, where the efficacy of TL can
reduce computing costs and save time without compromising
prediction accuracy. Additionally, TL allows the introduction of
deeper models for medical analysis tasks. Two primary forms of domain adaptation have been proposed in the literature: (1) fine-tuning TL (Fig. 6.6a), where the weights of the pre-trained model are updated for the target task via a new training procedure; and (2) "off-the-shelf" TL (Fig. 6.6b), where the feature extraction component of a trained model is used to produce imaging descriptors for use in a separate downstream task.
In particular, “off-the-shelf” TL keeps the original convolu-
tional weights the same while discarding the fully-connected
layers, and a machine learning algorithm like support vector
machines or a Gaussian process classifier is used for the down-
stream task. Because the fine-tuning TL updates a subset of the
convolutional layers with new parameters, fine-tuning is similar to training from scratch in terms of being time-consuming and requiring a modest volume of data. TL has been successfully integrated into a variety of medical image classification tasks, such as lung lesions [41], colonic polyps [42], breast cancer [43], tissue density calculation [44], and brain tumors [45], as well as evaluated across a variety of other pathology imaging datasets [46].

Fig. 6.6 The two types of transfer learning methods that have been proposed in the literature: (a) fine-tuning TL, where the transferred weights are adapted for the new data distribution, and (b) "off-the-shelf" TL, where only the convolutional weights are transferred to the target model for feature extraction
In a literature review of TL for medical image classification [47], models with a single fine-tuned layer for object detection and two fine-tuned layers for classification tasks achieved the highest performance. Fine-tuning on every layer is the most
popular TL strategy in the literature. However, this strategy does
not significantly increase the model’s performance, and it has a
higher computational cost compared to the abovementioned fine-
tuning strategies since it adapts all the layers. Therefore, gradually
updating the convolutional layers, usually by starting from the last
convolutional layer, is highly recommended.
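A minimal PyTorch sketch of the two strategies, using torchvision's ResNet-18 as an example backbone (the two-class head and the choice of unfreezing only the last block are illustrative assumptions):

```python
import torch.nn as nn
from torchvision import models

# (a) Fine-tuning TL: replace the head and retrain only the last
# convolutional block, gradually unfreezing more layers if needed
model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False            # freeze all layers
for param in model.layer4.parameters():
    param.requires_grad = True             # unfreeze last block
model.fc = nn.Linear(model.fc.in_features, 2)  # new task head

# (b) "Off-the-shelf" TL: keep the convolutional weights fixed and
# use the network purely as a feature extractor for, e.g., an SVM
extractor = models.resnet18(weights="IMAGENET1K_V1")
extractor.fc = nn.Identity()               # discard the classifier
```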

6.4.2 Multi-task Learning

Enhancing generalization is the goal of multi-task learning, which does so by mixing information from many tasks (this can be considered as enforcing constraints on the model's weights). Multi-task
learning is an effective method in situations where large amounts
of labeled input data for one task are available and can be
transferred to another task with significantly less labeled data
[48]. For instance, multi-task learning can be used in applications
where the same features might be utilized for other supervised
learning problems to predict different outcomes [49]. In this case,
the model’s feature extraction part may generalize the identical
inputs for different tasks since each output is predicted by a
separate portion of the model.
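A minimal PyTorch sketch of this shared-encoder, separate-heads design (layer sizes and task heads are arbitrary placeholders):

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """One shared feature extractor, one output head per task."""
    def __init__(self, in_features: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.head_a = nn.Linear(32, 2)  # e.g., diagnosis class
        self.head_b = nn.Linear(32, 1)  # e.g., outcome score

    def forward(self, x):
        z = self.encoder(x)             # shared representation
        return self.head_a(z), self.head_b(z)

# Training minimizes a weighted sum of the per-task losses.
```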

6.4.3 Ensemble Learning

Ensemble learning is a strategy in which multiple models are trained on the same data and their predictions are combined [50]. With ensemble learning, many models process different data sources in parallel to provide enhanced predictions that would be unattainable by a single and simpler model. This involves the merging of data perspectives across the different model types used in the group, as well as the fusion of pre-trained individual models at the prediction level. This strategy may increase the robustness of stochastic learning algorithms such as CNNs. Common examples of ensemble computations include the bootstrap [51], weighted averaging [52], and stacking [53].
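At the prediction level, fusion can be as simple as soft voting, i.e., averaging class probabilities across the group (a sketch; `models` stands for any list of fitted classifiers exposing predict_proba):

```python
import numpy as np

def soft_voting(models, X):
    """Average class probabilities across an ensemble of classifiers."""
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    return probs.argmax(axis=1)  # fused class prediction
```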

6.4.4 Multimodal Learning

The use of artificial intelligence has evolved to the point where
it is now a necessary approach for deducing information from
a high-dimensional space using a data-driven point of view in
a variety of fields. Increasing volumes of information in the
field of medicine, particularly in oncology, might provide an
overview of the intricacies of the underlying biology of certain
lesions. Multimodal machine learning is inspired by the way that
people learn best when exposed to a variety of stimuli at once.
Currently, the majority of healthcare machine learning approaches
only consider data from one modality. Modern computer-aided
diagnosis systems should be able to handle several types of data
at once, just as human clinicians do when diagnosing patients.
Typically, a radiologist is responsible for summarizing the
results of scans to support the physician in reaching a clinical
decision. A physician’s decision to select appropriate treatment
for a patient is based on inputs from a variety of data sources
that may include laboratory results, pathology images, and multiple
modalities of radiographic scans. Therefore, it is apparent that
multimodality is an intrinsic property of healthcare data. It is
reasonable to assume that the vast majority of data produced and
amassed over the course of a patient’s lifetime will contain at least
some information useful for delivering precise and individualized
treatment.
In a clinical setting, an AI-based support system should be able
to reason with and interpret high-throughput data from multiple
imaging modalities and other sources, as illustrated in Fig. 6.7,
in order to make a clinically rational decision, just like a human
medical expert would. The utilization of high-dimensional and
high-throughput data (semantic, radiomics from varying modal-
ities, laboratory, clinical, and transcriptomics) can lead to the
discovery of composite markers with predictive properties for
assessing treatment outcomes in oncology [22].

Modality fusion The most intuitive multimodal technique is the
combination of multiple sensory inputs prior to a supervised
classification process. This is accomplished by merging different
data sources on a feature level with vector concatenation, addi-
tion, mean or maximum pooling. Fusion techniques are further
classified as early (feature level) and late (model level) fusion,
based on the stage in the analysis pipeline at which the merger is
performed.
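A sketch of early (feature-level) fusion by vector concatenation, assuming pre-computed radiomic and genomic feature matrices with matching patient order:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder matrices: 50 patients, two feature modalities
radiomic_X = np.random.rand(50, 100)
genomic_X = np.random.rand(50, 200)
y = np.random.randint(0, 2, size=50)

# Early fusion: concatenate the modalities at the feature level
fused_X = np.concatenate([radiomic_X, genomic_X], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused_X, y)
```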
Fig. 6.7 An example of multimodal analysis with imaging (top) and genomic (bottom) data in a common feature space by utilizing early fusion

Representation learning This approach is focused on acquiring
enhanced feature representations by using data from many
modalities. This process includes techniques that deal with self-
supervised learning (GAN, AE), weakly-supervised learning
(CNN), and contrastive learning. Raw imaging sequences from multiple modalities can be integrated into a single multi-channel data structure, provided that their spatial properties are preserved (image processing tasks such as registration or interpolation are required) and that clinically meaningful combinations are chosen (e.g., pairing high b-value diffusion MRI with T2-weighted sequences), exploiting extensive unlabeled data and sparse representations.

Modality translation A subcategory of multimodal learning
includes processes that translate data across modalities, such
as CT to PET [54]. This is particularly interesting since deep
generative networks are designed to learn nonlinear correlations, generally between an input image and the corresponding output data. This is a promising technology for datasets with incomplete
or less relevant imaging sequences but with clinical data that
might provide solutions to unmet clinical needs.

6.4.5 Federated Learning

Modern DL models may learn tens of millions of parameters
through a training process that requires a large population to
achieve high performance in clinical scenarios and generaliz-
ability to unseen data distributions. These large quantities of
data, especially in the medical field, are quite challenging to
collect since they are sensitive in terms of privacy and are
carefully regulated in many jurisdictions to safeguard the confidentiality of medical records. Moreover, the deployment of a central-
ized infrastructure requires substantial effort that includes the
establishment of secure connections, the provision of efficient
and dependable communications among the different parties of
a centralized architecture, and the negotiation of complex data
sharing agreements and governance among multiple institutions
with varying jurisdictions. Additionally, maintaining and scaling
this type of infrastructure is also challenging. Data anonymization
may help to overcome some of these challenges; however, remov-
ing critical data information reduces the database’s usefulness for
future research. Federated learning (FL) is a strategy or platform
that allows learning across remote and separate data centers
without requiring their private data to be shared with an external
institution [55, 56].
The FL framework offers a robust environment for AI devel-
opment in the medical imaging area by exploiting existing com-
puting infrastructure and avoiding bureaucratic procedures. When
trained on limited data from a single institution, DL models
are susceptible to overfitting. Consequently, data distributions
for developing AI models must incorporate a diverse set of
cases, preferably originating from a variety of acquisition sites,
backgrounds, and demographics. Therefore, multi-institutional
patient cohorts are key to training reliable DL models.

The contribution of remote agents, such as participating clin-
ical sites, is necessary for the development of a distributed
global predictive model that addresses unmet clinical needs [57].
The FL strategy has the potential to increase the robustness
and generalizability of a global predictive model by allowing
scalability via external agents that can provide data distributions
from different scanners and populations from geographically
distant regions with varied socioeconomic and genetic origins.
The aggregator server, acting as the orchestrator, is central to the
FL architecture because it provides all the necessary constraints
and functionality to perform vital tasks in a uniform manner
such as data pre-processing, selection of the DL architectures
and hyperparameters, evaluation protocols, and gradient distribu-
tion methods. The remote agents perform analysis based on the
distributed protocol on the locally accessible private databases.
Finally, the aforementioned agents contribute to the aggregator
server with all the parameters required to construct or refine the
global predictive model for the desired clinical outcome.
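As a toy illustration of the aggregation step, federated averaging combines the locally trained parameters, weighted here by each site's sample count (real FL frameworks add secure communication, scheduling, and privacy safeguards):

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """Aggregate per-site model parameters into a global model.

    site_weights: one list of np.ndarray layer weights per site
    site_sizes: number of local training samples per site
    """
    total = sum(site_sizes)
    n_layers = len(site_weights[0])
    return [sum(w[k] * (n / total)
                for w, n in zip(site_weights, site_sizes))
            for k in range(n_layers)]
```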

6.5 Interpretability and Trustworthiness of Artificial Intelligence

It is a fundamental need in human nature [58] to want to compre-
hend how decisions are formed and what components motivate
them, especially when medical issues are involved. Transparency,
interpretability, and explainability are principles that are inti-
mately connected to ethics in data science and required for
establishing confidence in DL models that will be safe to deploy
and use for the benefit of patients. Interpretability is the capacity
to comprehend how an AI model generates decisions. Trans-
parency has a twofold meaning: (1) openness about the way a model is produced, and (2) a decision-making process that is both observable and follows a meaningful path comprehensible to an external observer.
The FUTURE-AI guidelines [59] were developed to provide
actionable directions for the creation, evaluation, and integration
of AI solutions for healthcare that are fair, universal, traceable,
usable, robust, and explainable (FUTURE). The "black-box"
aspect of AI can be rather overwhelming for experts in radiology
and healthcare in general. Typically, a clinician can elaborate
on the reasoning behind a diagnosis. Likewise, procedures that
enable a certain level of traceability and explainability for DL-
based diagnostic assessments are necessary.

6.5.1 Reproducibility

The rapid advancement of computer vision is intimately tied
to the culture of research that promotes the repeatability of
experiments. In medical image analysis, an increasing num-
ber of researchers prefer to make their code accessible to the
public, which considerably aids in building strong foundations
for more complex projects and gaining the trust of a wider
community of data scientists and clinicians. A well-documented
and detailed data selection protocol is a strong indicator for
increased reproducibility of experiments and results. Therefore,
to ensure a reproducible AI system, the systematic accumulation of metadata during the development phase should include extensive data descriptions, the impact of various experimental settings
and hyperparameters, monitoring the performance metrics used to
evaluate the models, and detailed documentation of the complete
development cycle.

6.5.2 Traceability

The traceability aspects of an AI provide end-users (clinical
sites, clinicians, patients) with clarity of actions throughout the
development and deployment phases of a model. It is well-known
that DL models are susceptible to overfitting and memorization
of data distribution, particularly with size-limited datasets or due
to choices in specific parameters of the employed experimental
protocol. Additionally, data stratification on a subject or sam-
ple basis is crucial for fairly dividing the original dataset into
training, validation, and testing sets and is also indicative of the
model's validity. A platform that documents metadata records
for traceability must contain the most crucial parameters of an
experimental protocol, such as the data pre-processing protocol,
the convergence strategy of a DL model, the overall design of
a deep learning analysis, the patient cohorts used during the
different development phases (training, validation, evaluation),
and the optimal hyperparameters for ensuring repeatability or for
future reference.

6.5.3 Explainability

Explainable artificial intelligence (XAI) is interconnected with the deployment of transparency and trace-
ability in the so-called black-box DL systems. Despite the fact
that attempts to address issues related to the XAI have existed
for a number of years, there has been an extraordinary increase
in research studies over the past few years [60]. With regard to DL model interpretability, deep saliency maps have been introduced
[61] to indicate which elements of an image the model has iden-
tified as the most relevant to the analysis of a particular clinical
outcome. This method is based on perceptual interpretability and
aims to reconstruct maps of causal dependencies among clinical
outcomes in the examined data distributions.
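As a minimal illustration of a gradient-based saliency map in PyTorch (the tiny untrained network below stands in for a trained model; Grad-CAM [61] refines this idea using convolutional feature maps):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a trained diagnostic network
model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(),
                      nn.Flatten(), nn.LazyLinear(2))
model.eval()

image = torch.rand(1, 64, 64, requires_grad=True)  # placeholder scan
scores = model(image.unsqueeze(0))        # shape (1, n_classes)
scores[0, scores.argmax()].backward()     # gradient of the top score
saliency = image.grad.abs().max(dim=0).values  # (H, W) relevance map
```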

6.5.4 Trustworthiness

Low-quality data or poorly curated databases in radiology might
integrate and prolong socioeconomic biases that are currently
the cause of inequities in health services. A poorly fitted AI
system might produce predictions that disadvantage patients with lower income, as the model was adapted from biased
data distributions with limited representation of specific patient
groups. Consequently, the contribution of societal prejudices has
the potential to widen the inequalities in healthcare [62] when
they are not thoroughly evaluated during the system’s designing
phase. Moreover, whenever an AI system leads to unfavorable
incidents, the developers responsible for its design must be able
to decipher and specify why and how the system reached that
decision. Finally, an AI-based CADx/CADe must fulfill some key
prerequisites in order to be consistent with trustworthy principles.
These requirements include complying with the legal system,
conforming to agreed-upon ethical norms (privacy protection,
fairness, and respect for individual rights), maintaining human oversight, transparency in the behavior of the AI, and explainability in the decisions of the system.

Acknowledgments We thank Aikaterini Dovrou (FORTH) for providing the
original histograms of tissue-based normalization.

References
1. Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, Mannel RS,
Liu H, Zheng B, Qiu Y. Recent advances and clinical applications of deep
learning in medical image analysis. Med Image Anal. 2022;79:102444.
2. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing
human-level performance on ImageNet classification. 2015.
3. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–6.
4. Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F. A survey on contrastive self-supervised learning. Technologies. 2021;9(1):2.
5. Luca AR, Ursuleanu TF, Gheorghe L, Grigorovici R, Iancu S, Hlusneac
M, Grigorovici A. Impact of quality, type and volume of data used by
deep learning models in the analysis of medical images. Informatics Med
Unlocked. 2022;29:100911.
6. Xia W, Hu B, Li H, et al. Deep Learning for automatic differential
diagnosis of primary central nervous system lymphoma and glioblastoma:
multi-parametric magnetic resonance imaging based convolutional neural
network model. J Magn Reson Imaging. 2021;54:880–7.
7. Trivizakis E, Manikis GC, Nikiforaki K, Drevelegas K, Constantinides M,
Drevelegas A, Marias K. Extending 2D convolutional neural networks to
3D for advancing deep learning cancer classification with application to
MRI liver tumor differentiation. IEEE J Biomed Health Inform. 2018:1–1.
8. Asuntha A, Srinivasan A. Deep learning for lung Cancer detection and
classification. Multimed Tools Appl. 2020;79:7731–62.
9. Trivizakis E, Ioannidis G, Melissianos V, Papadakis G, Tsatsakis A,
Spandidos D, Marias K. A novel deep learning architecture outper-
forming ‘off-the-shelf’ transfer learning and feature-based methods in
the automated assessment of mammographic breast density. Oncol Rep. 2019;42:2009–15.
10. Allegra A, Tonacci A, Sciaccotta R, Genovese S, Musolino C, Pioggia G,
Gangemi S. Machine learning and deep learning applications in multiple
myeloma diagnosis, prognosis, and treatment selection. Cancers. 2022;14:606.
11. Trivizakis E, Papadakis GZ, Souglakos I, Papanikolaou N, Koumakis
L, Spandidos DA, Tsatsakis A, Karantanas AH, Marias K. Artificial
intelligence radiogenomics for advancing precision and effectiveness in
oncologic care (Review). Int J Oncol. 2020;57:43–53.
12. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W,
Jackel LD. Backpropagation applied to handwritten zip code recognition.
Neural Comput. 1989;1:541–51.
13. Simonyan K, Zisserman A. Very deep convolutional networks for large-
scale image recognition. 2014. arXiv Prepr. arXiv1409.1556.
14. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking
the inception architecture for computer vision. In: Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern
Recognition. IEEE Computer Society; 2016. p. 2818–26.
15. He K, Zhang X, Ren S, Sun J. Deep residual learning for image
recognition. In: Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition. IEEE Computer Society;
2016. p. 770–8.
16. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected
convolutional networks. 2016. arXiv Prepr. arXiv1608.06993.
17. Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. Adv
Neural Inf Process Syst. 2017:3856–66.
18. Shamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, Fu
H. Transformers in medical imaging: a survey. 2022. https://doi.org/10.48550/arxiv.2201.09873.
19. Osuala R, Kushibar K, Garrucho L, Linardos A, Szafranowska Z, Klein
S, Glocker B, Diaz O, Lekadir K. Data synthesis and adversarial net-
works: a review and meta-analysis in cancer imaging. Med Image Anal.
2021;102704
20. Dimitriadis A, Trivizakis E, Papanikolaou N, Tsiknakis M, Marias K.
Enhancing cancer differentiation with synthetic MRI examinations via
generative models: a systematic review. Insights Imaging. [Accepted].
21. National Institutes of Health – National Cancer Institute
(NIH – NCI) Imaging Data Commons (IDC). https://portal.imaging.datacommons.cancer.gov/. Accessed 30 Nov 2022.
22. Trivizakis E, Souglakos I, Karantanas AH, Marias K. Deep radiotran-
scriptomics of non-small cell lung carcinoma for assessing molecular and
histology subtypes with a data-driven analysis. Diagnostics. 2021;11:1–
15.
23. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible
machine learning for health care. Nat Med. 2019;25(10):1337–40.
24. Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial
intelligence. Nat Med. 2020;26:16–7.
25. Schmarje L, Grossmann V, Zelenka C, et al. Is one annotation enough?
A data-centric image classification benchmark for noisy and ambiguous
label estimation. 2022. https://doi.org/10.48550/arxiv.2207.06214.
26. Gul S, Khan MS, Bibi A, Khandakar A, Ayari MA, Chowdhury MEH.
Deep learning techniques for liver and liver tumor segmentation: a review.
Comput Biol Med. 2022;147:105620.
27. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA,
Gee JC. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging.
2010;29:1310–20.
28. Learned-Miller EG, Jain V. Many heads are better than one: jointly
removing bias from multiple MRIs using nonparametric maximum like-
lihood. Lect Notes Comput Sci. 2005;3565:615–26.
29. Haur Ong K. White matter lesion intensity standardization using adaptive landmark based brain tissue analysis on FLAIR MR image. Int J Adv Soft Comput Its Appl. 2018.
30. Hinton GE. Learning multiple layers of representation. Trends Cogn Sci. 2007;11(10):428–34. https://doi.org/10.1016/j.tics.2007.09.004.
31. Ugga L, Romeo V, Stamoulou E, et al. Harmonization strategies in
multicenter MRI-based radiomics. J Imaging. 2022;8:303.
32. Da-Ano R, Visvikis D, Hatt M. Harmonization strategies for multicenter
radiomics investigations. Phys Med Biol. 2020;65:24TR02.
33. Park JE, Park SY, Kim HJ, Kim HS. Reproducibility and generalizability
in radiomics modeling: possible strategies in radiologic and statistical
perspectives. Korean J Radiol. 2019;20:1124.
34. Larue RTHM, van Timmeren JE, de Jong EEC, et al. Influence of gray
level discretization on radiomic feature stability for different CT scanners,
tube currents and slice thicknesses: a comprehensive phantom study. Acta
Oncol (Madr). 2017;56:1544–53.
35. Loi S, Mori M, Benedetti G, et al. Robustness of CT radiomic features
against image discretization and interpolation in characterizing pancreatic
neuroendocrine neoplasms. Phys Medica. 2020;76:125–33.
36. Das KP, Chandra J. A review on preprocessing techniques for noise
reduction in PET-CT images for lung cancer. Lect Notes Data Eng
Commun Technol. 2022;111:455–75.
37. Park S, Yoon JH, Joo I, et al. Image quality in liver CT: low-dose
deep learning vs standard-dose model-based iterative reconstructions. Eur
Radiol. 2022;32:2865–74.
38. Akagi M, Nakamura Y, Higaki T, Narita K, Honda Y, Zhou J, Yu Z,
Akino N, Awai K. Deep learning reconstruction improves image quality
of abdominal ultra-high-resolution CT. Eur Radiol. 2019;29:6163–71.
39. Hata A, Yanagawa M, Yoshida Y, Miyata T, Tsubamoto M, Honda
O, Tomiyama N. Combination of deep learning–based denoising and
iterative reconstruction for ultra-low-dose CT of the chest: image quality
and lung-rads evaluation. Am J Roentgenol. 2020;215:1321–8.
40. Feng TS, Lian LA, Hong LJ, Jun LY, Dong PJ. Potential value of
the PixelShine deep learning algorithm for increasing quality of 70
kVp+ASiR-V reconstruction pelvic arterial phase CT images. Jpn J
Radiol. 2019;37:186–90.
41. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura
D, Summers RM. Deep convolutional neural networks for computer-
aided detection: CNN architectures, dataset characteristics and transfer
learning. IEEE Trans Med Imaging. 2016;35:1285–98.
42. Ribeiro E, Uhl A, Wimmer G, Häfner M. Transfer learning for colonic
polyp classification using off-the-shelf CNN features. Lect Notes Comput
Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics).
2016;10170 LNCS:1–13.
43. Zhi W, Wing H, Yueng F, Chen Z, Zandavi SM, Lu Z, Chung YY.
Using transfer learning with convolutional neural networks to diagnose
breast cancer from histopathological images. https://doi.org/10.1007/978-3-319-70093-9_71.
44. Trivizakis E, Ioannidis GS, Melissianos VD, Papadakis GZ, Tsatsakis
A, Spandidos DA, Marias K. A novel deep learning architecture out-
performing ‘off-the-shelf’ transfer learning and feature-based methods in
the automated assessment of mammographic breast density. Oncol Rep.
2019. https://doi.org/10.3892/or.2019.7312.
45. Ioannidis GS, Trivizakis E, Metzakis I, Papagiannakis S, Lagoudaki E,
Marias K. Pathomics and deep learning classification of a heterogeneous
fluorescence histology image dataset. Appl Sci. 2021;11:3796.
46. Mormont R, Geurts P, Maree R. Comparison of deep transfer learning
strategies for digital pathology. In: 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition Work. IEEE; 2018. p. 2343.
47. Kim HE, Cosa-Linan A, Santhanam N, Jannesari M, Maros ME, Gans-
landt T. Transfer learning for medical image classification: a literature
review. BMC Med Imaging. 2022;22:1–13.
48. Amyar A, Modzelewski R, Vera P, Morard V, Ruan S. Multi-task multi-
scale learning for outcome prediction in 3D PET images. Comput Biol
Med. 2022;151:106208.
49. Kainz P, Pfeiffer M, Urschler M. Semantic segmentation of colon glands
with deep convolutional neural networks and total variation segmentation.
2015. https://doi.org/10.48550/arxiv.1511.06919.
50. Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning
in medical image analysis. Int J Multimed Inf Retr. 2021;11:19–38.
51. Bovis K. Classification of mammographic breast density using a com-
bined classifier paradigm. Med Image Underst Anal. 2002:1–4.
52. Rao T. Performance analysis of deep learning models using bagging
ensemble.
53. Yurttakal AH, Erbay H, İkizceli T, Karaçavuş S, Biçer C. Classification of
breast DCE-MRI images via boosting and deep learning based stacking
ensemble approach. Adv Intell Syst Comput. 2021;1197 AISC:1125–32.
54. Armanious K, Jiang C, Fischer M, Küstner T, Hepp T, Nikolaou K,
Gatidis S, Yang B. MedGAN: medical image translation using GANs.
Comput Med Imaging Graph. 2020;79:101684.
55. Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated learning
in medical imaging: Part I: toward multicentral health care ecosystems. J
Am Coll Radiol. 2022;19:969–74.
56. Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated learning
in medical imaging: Part II: Methods, challenges and considerations. J
Am Coll Radiol. 2022;19:975–82.
57. Hoffman RR, Mueller ST, Klein G, Litman J. Metrics for explain-
able AI: challenges and prospects. 2018. https://doi.org/10.48550/arxiv.1812.04608.
58. Doshi-Velez F, Kim B. Towards a rigorous science of interpretable
machine learning. 2017. https://doi.org/10.48550/arxiv.1702.08608.
59. Lekadir K, Osuala R, Gallin C, et al. FUTURE-AI: guiding principles
and consensus recommendations for trustworthy artificial intelligence in
medical imaging. 2021. https://doi.org/10.48550/arxiv.2109.09658.
60. Angelov PP, Soares EA, Jiang R, Arnold NI, Atkinson PM. Explainable
artificial intelligence: an analytical review. Wiley Interdiscip Rev Data
Min Knowl Discov. 2021;11:e1424.
61. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D.
Grad-CAM: visual explanations from deep networks via gradient-based
localization. In: Proceedings of the IEEE International Conference on
Computer Vision. IEEE; 2017. p. 618–26.
62. High-Level Expert Group on Artificial Intelligence (AI HLEG), European Commission. Ethics guidelines for trustworthy AI. 2019.
7 Data Preparation for AI Analysis
Andrea Barucci, Stefano Diciotti, Marco Giannelli, and Chiara Marzi

7.1 Introduction

In the past decade, artificial intelligence (AI) has definitely pre-
vailed as a “disruptive technology” with widespread applications
in every field of human knowledge, from space travel to cultural
heritage [1–3], through medicine and biology [4, 5]. Machine
learning [6] and deep learning [7] techniques are at the heart of the
current AI success in medicine, and they have proven that they can

A. Barucci · C. Marzi
“Nello Carrara” Institute of Applied Physics, National Research Council of
Italy (IFAC-CNR), Florence, Italy
e-mail: [email protected]; [email protected]
S. Diciotti
Department of Electrical, Electronic, and Information Engineering
“Guglielmo Marconi”, University of Bologna, Cesena, Italy
Alma Mater Research Institute for Human-Centered Artificial Intelligence,
University of Bologna, Bologna, Italy
e-mail: [email protected]
M. Giannelli
Unit of Medical Physics, Pisa University Hospital “Azienda
Ospedaliero-Universitaria Pisana”, Pisa, Italy
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. E. Klontzas et al. (eds.), Introduction to Artificial
Intelligence, Imaging Informatics for Healthcare
Professionals, https://doi.org/10.1007/978-3-031-25928-9_7

adapt to seemingly extremely distant tasks, such as recognising a
hieroglyph in a photograph [1] or finding a cancerous lesion in
medical images.
Historically, thanks to the large amount of available data, the first AI applications in medicine were focused on radiological imaging, specifically computed tomography (CT) and magnetic resonance (MR). Nowadays, AI involves most imaging techniques, also
including X-ray [8], positron emission tomography [9], ultra-
sound imaging [10], and digital pathology [11]. Initially, radio-
logically derived hand-crafted features, such as radiomic features
[12], were analysed through machine learning techniques. More
recently, deep learning networks that operate directly on images
are also being used. Some AI applications are now widely used
at the research level [13] or being validated through clinical
trials. Others are already being used in clinical practice [14–
17].
The potential of AI in clinical imaging is supported by a
large number of studies (see, e.g., [18–22]), but at the same
time some current limitations become apparent, such as the lack
of generalisation of the results obtained. Many studies have
led to results that are only valid in the conditions in which
they were performed and therefore cannot be directly exported
to other clinical contexts. These limitations were immediately
attributed to data quality, preprocessing, and algorithm architecture.
It is worth noting that in this scenario, the interaction between
different experts, such as clinicians, engineers, physicists, and
data scientists, becomes fundamental in defining a system consist-
ing of data and algorithms of the highest quality and performance.
Quality should become a feature of the entire system, taking
into account the quality of the individual image, the quality
of the image dataset, and the high potential of the AI algo-
rithms.
In this chapter, we will focus on the first two aspects. More
specifically, we will discuss how data quality affects the results
of AI, from the acquisition of the image and the information
contained therein to the methods for preprocessing the data
to create numerous specific datasets for AI applications (Fig. 7.1).

Fig. 7.1 Overview of the impact of data quality and numerosity in clinical image analysis using AI

7.2 Data Quality and Numerosity


7.2.1 Intrinsic Image Quality

If we want to investigate how the quality of a medical image
is related to the result of the subsequent analysis with AI, we
must first define what exactly we observe in an image. The latter
is basically the result of a measurement of the object under
examination, such as X-rays for a CT-scan or electromagnetic
waves for an MRI examination, reconstructed through various
steps.
First, there is the instrumentation (the scanner) that introduces
its signature—the so-called hardware-fingerprint, related to the
hardware components (such as detectors for CT or coils for
MRI). At the same time, the acquisition protocol introduces its
idiosyncrasies (protocol signature) in the image as a result of that
particular procedure’s ability to acquire and highlight informa-
tion from the tissue under investigation. Moreover, the protocol
signature has its own intrinsic specificity related to the actual
software and hardware implementation of the protocol itself. The
result of this measurement can then be processed by mathematical
algorithms (reconstruction, filtering, noise suppression, etc.) to
produce the graphical image we are used to, which in turn contains "traces" of the processing (software-fingerprint) [23].

Fig. 7.2 The medical imaging formation process. The final image is the result of different processes related to image acquisition and processing
Although this final image makes it possible to analyse the
physical, chemical, biochemical, biological processes, etc. taking
place inside the human body, the whole chain that constitutes the
observation system is deeply involved in the obtained image. Each
measurement is thus able to highlight the processes occurring
within the observed object (in this case, a patient) but imprints
its fingerprint in the images, which must be taken into account
(Fig. 7.2).

7.2.2 Image Diagnostic Quality

As discussed earlier, the quality of a medical image depends
on various factors (Fig. 7.2), including scanner hardware and
acquisition protocol, as well as preprocessing (interpolation, nor-
malisation, artefact removal, filtering, etc.). All these factors
directly affect the “intrinsic” image quality [24], and several
measurements have been used in the literature for this assessment
[25], including signal-to-noise ratio, contrast-to-noise ratio, signal
uniformity, etc. While these indices are extremely useful and valid
for describing the intrinsic characteristics of the image, they need
to be supported by other information that takes into account the
“diagnostic” quality of a medical image.
The diagnostic quality of a medical image is defined as “the
quality of decisions taken as a result of the image evaluation in
terms of benefit for the patient (does the therapeutic treatment
change? What is the effect on mortality? etc.)” [24]. Clearly, a
radiologist does not necessarily need a “beautiful” image (assum-
ing he can define what the term “beautiful” means), but rather the
image that provides the most useful and effective representation
for his purposes, such as the identification and classification of a
lesion, or the pathological condition, etc. Moreover, with the same
diagnostic quality, thanks to his experience, the clinical context,
and the capabilities of the human mind, a radiologist is able to
respond to the specific task even with two images of different
intrinsic quality. Of course, it remains clear that diagnostic quality
is more difficult to achieve in highly complex clinical questions,
looking for small structural or functional changes, for example.
Therefore, in radiology, since human observers are the final
recipients of visual information, subjective evaluation of image
quality is currently considered the most reliable approach [26].

7.2.3 Image Quality for AI Analyses

Machine learning and deep learning algorithms are task-agnostic, i.e. they can be applied to questions that are very far apart.
This does not mean that the same model trained to recognise
hieroglyphs, for example, can be applied to recognise pathology
in an MR image, but that the structure of the algorithm (e.g.
the architecture of a neural network) can respond to different
problems through specific training. Their versatility makes them
powerful tools, but at the same time, makes their optimisation and
adaptation essential to work efficiently on specific questions.
Achieving this goal requires a training phase in which the AI
algorithms learn from data to model the problem. Essentially, it
is about finding a function that describes the data based upon
the data itself, as Vladimir Vapnik said, "learning is a problem
of function estimation based upon empirical data.” In general,
machine learning algorithms, and deep learning networks in
particular, require extensive datasets. The amount of data needed
to train and test a machine learning algorithm depends on many
factors, including the complexity of the problem (i.e. the unknown
function that relates the input variables to the output variable) and
the complexity of the learning algorithm (i.e. the algorithm used
to inductively learn the unknown mapping function). Tasks with
strongly predictive variables require fewer samples to obtain well-
performing models than tasks with weakly informative variables
or variables mixed with noise. Moreover, a large dataset is usually more
statistically representative of the problem in terms of population,
and algorithms trained on it are more robust to errors in the
dataset. In contrast, small datasets are highly sensitive to poor
homogeneity, typically suffer from overfitting problems and low
generalisability [27], and often report unrealistic performance.
When the number of “available” images is limited, as in some
clinical applications, the requirement of a homogeneous dataset
becomes stringent to reduce the impact of confounding factors
such as acquisition protocols, preprocessing, etc. An example of
the impact of data quality and data volume on the performance of
a machine learning algorithm for a classification task is reported
in Fig. 7.3.
In the case of deep learning, it is possible to mitigate the
need for an extensive dataset by using strategies such as transfer
learning, where the neural network exploits the knowledge gained
by solving a different problem (with greater data availability),
but related to the problem being investigated. For example, the
knowledge acquired by learning to recognise images of pathology
might be applied, with the proper adaptation, to investigate a
different problem.
From an AI point of view, therefore, not only is the diagnostic
quality of the single image important, but it is essential that all
the images making up the entire dataset also have similar intrinsic
quality. Not fulfilling this requirement leads at best to worse
performance or, at worst, to chance-level predictions [27]. In this
way, the AI algorithms are trained in the best possible conditions
since, on the one hand, each image of the dataset has sufficient diagnostic quality to solve the problem and, on the other hand, there are no sources of unwanted variability between the different images, due to differences in intrinsic quality, that could confuse the AI algorithms.

Fig. 7.3 An example of the impact of data quality and numerosity on the performance of a machine learning algorithm for a classification task. Each point represents a patient with a colour referring to three different pathologies (yellow, green, and purple). The sketch shows that, in the presence of few data with low quality, the algorithm is unable to separate the pathologies. By simultaneously improving the quality and number of data, the algorithm's performance increases
In medical imaging, acquiring a set of images with intrinsic
“constant” quality means examining all the subjects in the same
setting, with the same scanner, the same acquisition protocol,
carrying out the same preprocessing, etc. In this scenario, it is
not easy to obtain a large dataset. Therefore, in recent years,
many studies have combined data and images collected in dif-
ferent ways (different acquisition institutes, scanners, acquisition
protocols, etc.) to obtain a multicentre dataset that is clinically
representative of the population to be analysed. Thereby, each
image showed sufficient diagnostic quality but different intrinsic
quality as a function of the different acquisition protocols and
processing parameters. Therefore, the process of appropriately
combining data from different sources, known as data pooling,
is becoming fundamental to the success of AI in radiology and
140 A. Barucci et al.

is currently one of the steps that healthcare professionals and


researchers need to handle carefully.
In summary, the image quality for AI analyses focuses not
only on the single image, but on the quality of the entire dataset.
The whole dataset should be informative and contain biological,
medical, and clinical information. There is a principle in infor-
mation theory known as GIGO (garbage in–garbage out), which
states that the outgoing information quality from an algorithm
cannot exceed that of incoming information, meaning that it is not
possible to extract information where there is none. This principle
outlines the importance of data quality and the risk of obtaining
random results using inappropriate AI tools. This is particularly
true for deep learning, which, due to its power and an intrinsic
complex interpretation of results, can lead to incorrect results
beyond the control of experts [28].

7.3 Data Preprocessing for Machine Learning Analyses

In machine learning, data pre-processing is a crucial step because
the quality of the data directly affects the learning ability of
the model [29]. In the following, we briefly describe the most
common preprocessing steps for tabular data.

Missing Values Imputation


Many clinical and image datasets contain missing values, fre-
quently coded as blanks, NaNs, or other placeholders. However,
many machine learning models cannot use these datasets directly
because they assume the presence of all values. Discarding entire
rows and/or columns with missing values is a possible solution for
working with incomplete datasets. However, this approach further
reduces the size of the dataset. Imputation of the missing values or
their inference from the known portion of the data is a preferable
approach [30]. One type of imputation algorithm is univariate,
where values in a specific feature are imputed using the statistics
(mean, median, or most frequent) of the non-missing values of
that feature. In contrast, multivariate imputation procedures use
the entire set of available features to estimate the missing values,
modelling each feature with missing values as a function of the
other features and using that estimate for imputation [31].
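A brief scikit-learn sketch of both approaches (note that IterativeImputer is experimental and requires the explicit enabling import):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

# Univariate: replace each NaN with the column mean
X_uni = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate: model each feature from the remaining features
X_multi = IterativeImputer(random_state=0).fit_transform(X)
```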

Encoding Categorical Features


Encoding categorical data (e.g., gender, education, drug level,
etc.) is a process that transforms categorical data into numerical
data that can be provided to the machine learning models [32].
Label or ordinal encoding is used when the categorical variables
are ordinal; ordinal encoding converts each categorical label into
integer values, respecting the sequence of labels. By contrast,
the one-hot encoding strategy converts each category into binary
numbers (0 or 1). This type of encoding is used when the data
is nominal. Newly created binary features can be considered
dummy variables. After one-hot encoding, the number of dummy
variables depends on the number of categories present in the data
[32].
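A minimal pandas sketch of the two strategies (the column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"drug_level": ["low", "high", "medium"],
                   "gender": ["F", "M", "F"]})

# Ordinal encoding: integer codes that respect the label order
order = {"low": 0, "medium": 1, "high": 2}
df["drug_level_enc"] = df["drug_level"].map(order)

# One-hot encoding: one binary dummy variable per category
df = pd.get_dummies(df, columns=["gender"])
```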

Standardisation
Standardisation of datasets is a common requirement for many
machine learning models that might behave badly if individual
features show different ranges. One way to standardise the data
is to remove the mean value from each feature and divide it by
the standard deviation. The mean value and standard deviation are
calculated across samples. Other examples of data standardisation
are detailed in reference [33].

Multicentre Data Harmonisation


Pooling data from multiple institutions and hospitals provides
an opportunity to assemble more extensive and diverse groups
of subjects [34–37], increases statistical power [35, 38–41], and
allows for the study of rare disorders and subtle effects [42,
43]. However, a major drawback of combining data across sites
is the introduction of confounding effects due to non-biological
variability in the data, usually related to the hardware and data
acquisition protocol. For example, in an MRI study, the prop-
erties of MRI, such as scanner field strength, radiofrequency
coil type, gradient coil characteristics, hardware, image recon-
struction algorithm, and non-standardised acquisition protocol
parameters can introduce unwanted technical variability, which is
also reflected in MRI-derived features [44–46]. Harmonisation of
multicentre data, defined as applying mathematical and statistical
concepts to reduce unwanted site variability while preserving
biological content, is therefore necessary to ensure the success
of cooperative analyses. Several harmonisation techniques exist,
including functional normalisation [47], Removal of Artificial
Voxel Effect by Linear regression (RAVEL) [48], global scaling
[49, 50], and ComBat [51, 52].

Dimensionality Reduction
Dimensionality reduction refers to the process of reducing the
number of features in a dataset while keeping as much variation
in the original dataset as possible. Dimensionality reduction can mitigate multicollinearity among features and remove noise from the data. From a machine learning analysis perspective, a lower num-
ber of features means less training time and less computational
power. It also avoids the potential problem of overfitting, lead-
ing to an increase in overall performance. Principal component
analysis (PCA) is a linear dimensionality reduction technique that
transforms a set of correlated variables into a smaller number
of uncorrelated variables, called principal components, while
retaining as much variation in the original dataset as possible [53].
Other linear dimensionality reduction methods are factor analysis
(FA) [54] and linear discriminant analysis (LDA) [55].
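A short scikit-learn sketch of PCA, here keeping enough components to retain 95% of the variance (placeholder data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 50)        # placeholder standardised features
pca = PCA(n_components=0.95)       # keep 95% of the total variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```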

Feature Selection
In machine learning and statistics, feature selection is the pro-
cess of selecting a subset of relevant features for use in model
construction. In medicine and health care, feature selection is
advantageous because it enables the interpretation of the machine
learning model and the discovery of new potential biomarkers
related to a specific disorder or condition [56]. Feature selection
methods can be grouped into three categories: filter method,
wrapper method, and embedded method [57, 58]. In the filter
method, features are selected based on the general characteristics
of the dataset without using any predictive model. In the wrapper
method, the feature selection algorithm is wrapped around the
predictive model algorithm as a “wrapper” and the same model
is used to select the best features [59]. In embedded methods,
the feature selection process is integrated into the model learning
phase by using algorithms that have their own feature selection
methods (e.g., classification and regression tree (CART) and least
absolute shrinkage and selection operator (LASSO) algorithms).
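For illustration, one scikit-learn example per category (a univariate filter, a recursive-elimination wrapper, and an embedded LASSO; placeholder data):

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression

X = np.random.rand(100, 30)
y = np.random.randint(0, 2, size=100)

# Filter: rank features by ANOVA F-score, keep the top 10
X_filter = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Wrapper: recursive feature elimination around a predictive model
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_wrap = rfe.fit_transform(X, y)

# Embedded: LASSO drives irrelevant coefficients to exactly zero
selected = Lasso(alpha=0.1).fit(X, y).coef_ != 0
```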

7.3.1 The Machine Learning Pipeline

Training and testing machine learning models requires choosing
a proper validation scheme that handles data splitting (e.g., hold-
out, cross-validation (CV), bootstrap, etc.). This choice is crucial
to avoid data leakage by ensuring that the model is built on
training data and evaluated on test data that was never seen
during the learning phase. Indeed, data leakage, which occurs
when information from outside the training set is used to create
the model, can lead to falsely overestimated performance in the
test set (see, e.g., [60, 61]). In this view, all preprocessing steps
involving more than one sample (e.g. some types of imputation of
missing values, standardisation, multicentre data harmonisation,
dimensionality reduction, feature selection, etc.) should be per-
formed only on the training data and subsequently applied to the
test data.
In medicine and health care, where relatively small datasets are
usually available, the straightforward hold-out validation scheme
is rarely applied. In contrast, the CV and its nested version (nested
CV) for hyperparameter optimisation of the entire workflow [62–
64] are frequently preferred. Repeated CVs or repeated nested
CVs are also suggested for improving the reproducibility of
the entire machine learning system [63]. In all these validation
schemes, several training and test data procedures are performed
on different data splits, recalling the need for a compact code
structure to avoid errors that may lead to data leakage. All things
considered, machine learning pipelines are a solution because
they orchestrate all processing steps and the actual model in
a short, easier-to-read, and easier-to-maintain code structure.
Fig. 7.4 Scheme of a machine learning pipeline consisting of two prepro-
cessing steps (i.e., transformers #1 and #2) and one prediction step (i.e.,
estimator). Using the pipeline, preprocessing is performed on the training
data only, regardless of the validation scheme selected (e.g., hold-out, nested
hold-out, cross-validation (CV), nested CV). Reprinted from [52]

A pipeline represents the entire data workflow, combining all
preprocessing steps and training of the machine learning model.
It is essential to automate an end-to-end training/test process
without any form of data leakage and improve reproducibility,
ease of deployment, and code reuse, especially when complex
validation schemes are needed (Fig. 7.4).
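A compact scikit-learn sketch of the pipeline in Fig. 7.4 under cross-validation; because the transformers are refitted inside each training fold only, no preprocessing statistics leak from the test folds:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(100, 20)
X[X < 0.05] = np.nan                   # sprinkle missing values
y = np.random.randint(0, 2, size=100)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # transformer #1
    ("scaler", StandardScaler()),                   # transformer #2
    ("clf", SVC()),                                 # estimator
])
scores = cross_val_score(pipe, X, y, cv=5)  # leakage-free evaluation
```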

7.3.2 The Machine Learning Pipeline: A Case Study

To highlight the importance of performing all preprocessing steps on the training data only, which is needed in order to avoid data leakage and overestimation of workflow performance on the test data, we present the following case study.
From the MR T1-weighted scans of 86 healthy subjects
belonging to the International Consortium for Brain Mapping
(ICBM) dataset [65], we estimated radiomic [66] and fractal
descriptors [67–71]—for a total of 140 MRI-derived features—
of the brain cortical grey matter. By setting an arbitrary age
threshold (in this case, 45 years), each subject was labelled 0 or 1
depending on their age. With a hold-out validation scheme (80%
of subjects in the training set and 20% in the test set), we predicted
the age class using a Support Vector classifier (using the Scikit-
learn version 1.0.2 default hyperparameters) trained on radiomic
and fractal features. We performed several data preprocessing configurations: data standardisation alone, feature selection alone, and data standardisation followed by feature selection, each applied either on the entire dataset or in the training set only.
the following classification scores estimated in the test set: area
under the receiver operating characteristic (AUROC), accuracy,
sensitivity, and specificity. All the scores obtained by performing
the preprocessing steps on the entire dataset are higher than those
estimated by running the preprocessing only on the training data
and then applying it to the test data. This is clear evidence of
how incorrect application of preprocessing leads to data leakage,
falsely inflating machine learning model performance.

Table 7.1 Age class prediction scores in the test set
AUROC Accuracy Sensitivity Specificity
Standardisation
Entire dataset 0.74 0.78 0.77 0.80
Training set 0.71 0.72 0.69 0.80
Feature selection
Entire dataset 0.86 0.67 0.86 0.55
Training set 0.58 0.50 0.71 0.37
Standardisation and feature selection
Entire dataset 0.92 0.83 0.86 0.82
Training set 0.81 0.72 0.71 0.73
AUROC area under the receiver operating characteristic

References
1. Barucci A, Cucci C, Franci M, Loschiavo M, Argenti F. A deep learning
approach to ancient Egyptian hieroglyphs classification. IEEE Access.
2021;9:123438–47.
2. Cucci C, Barucci A, Stefani L, Picollo M, Jiménez-Garnica R, Fuster-
Lopez L. Reflectance hyperspectral data processing on a set of Picasso
paintings: which algorithm provides what? A comparative analysis of
multivariate, statistical and artificial intelligence methods. In: Groves R,
Liang H, editors. Optics for arts, architecture, and archaeology VIII.
Bellingham: SPIE; 2021. p. 1.
3. Li Z, Shen H, Cheng Q, Liu Y, You S, He Z. Deep learning based
cloud detection for medium and high resolution remote sensing images
of different sensors. ISPRS J Photogramm Remote Sens. 2019;150:197–
212.
4. Scapicchio C, Gabelloni M, Barucci A, Cioni D, Saba L, Neri E. A deep
look into radiomics. Radiol Med. 2021;126(10):1296–311.
5. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism.
2017;69:36–40.
6. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and
prospects. Science. 2015;349(6245):255–60.
7. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature.
2015;521(7553):436–44.
8. Ismael AM, Şengür A. Deep learning approaches for COVID-19 detec-
tion based on chest X-ray images. Expert Syst Appl. 2021;164:114054.
9. Ding Y, Sohn JH, Kawczynski MG, Trivedi H, Harnish R, Jenkins NW,
et al. A deep learning model to predict a diagnosis of Alzheimer disease
by using 18F-FDG PET of the brain. Radiology. 2019;290(2):456–64.
10. Van Sloun RJ, Cohen R, Eldar YC. Deep learning in ultrasound imaging.
Proc IEEE. 2019;108(1):11–29.
11. Deng S, Zhang X, Yan W, Chang EI, Fan Y, Lai M, et al. Deep learning in
digital pathology image analysis: a survey. Front Med. 2020;14(4):470–
87.
12. Guiot J, Vaidyanathan A, Deprez L, Zerka F, Danthine D, Frix A, et al. A
review in radiomics: making personalized medicine a reality via routine
imaging. Med Res Rev. 2022;42(1):426–40.
13. The MONAI Consortium. Project MONAI. Zenodo; 2020. https://zenodo.org/record/4323059.
14. van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de
Rooij M. Artificial intelligence in radiology: 100 commercially available
products and their scientific evidence. Eur Radiol. 2021;31(6):3797–804.
15. Imbio. https://www.imbio.com.
16. BRAINOMIX. https://www.brainomix.com.
17. Goebel J, Stenzel E, Guberina N, Wanke I, Koehrmann M, Kleinschnitz
C, et al. Automated ASPECT rating: comparison between the Frontier
ASPECT Score software and the Brainomix software. Neuroradiology.
2018;60(12):1267–72.
18. Ciulli S, Citi L, Salvadori E, Valenti R, Poggesi A, Inzitari D, et al.
Prediction of impaired performance in trail making test in MCI patients
with small vessel disease using DTI data. IEEE J Biomed Health Inform.
2016;20(4):1026–33.
19. Yagis E, De Herrera AGS, Citi L. Generalization performance
of deep learning models in neurodegenerative disease classifica-
tion. In: 2019 IEEE international conference on bioinformatics and
biomedicine (BIBM), vol. 2019. San Diego: IEEE. p. 1692–8. https://ieeexplore.ieee.org/document/8983088/.
20. Bertelli E, Mercatelli L, Marzi C, Pachetti E, Baccini M, Barucci A, et al.
Machine and deep learning prediction of prostate cancer aggressiveness
using multiparametric MRI. Front Oncol. 2022;11:802964.
21. Trajkovic J, Di Gregorio F, Ferri F, Marzi C, Diciotti S, Romei V. Resting
state alpha oscillatory activity is a valid and reliable marker of schizotypy.
Sci Rep. 2021;11(1):10379.
22. Marzi C, d’Ambrosio A, Diciotti S, Bisecco A, Altieri M, Filippi M, et al.
Prediction of the information processing speed performance in multiple
sclerosis using a machine learning approach in a large multicenter mag-
netic resonance imaging data set. Hum Brain Mapp. 2022;2022:26106.
23. Barca P, Marfisi D, Marzi C, Cozza S, Diciotti S, Traino AC, et al.
A voxel-based assessment of noise properties in computed tomography
imaging with the ASiR-V and ASiR iterative reconstruction algorithms.
Appl Sci. 2021;11(14):6561.
24. Coppini G, Diciotti S, Valli G. Bioimmagini. 3rd ed. Bologna: Pàtron;
2012.
25. Ding Y. Visual quality assessment for natural and medical image. Cham:
Springer; 2018.
26. Lévêque L, Outtas M, Liu H, Zhang L. Comparative study of the
methodologies used for subjective medical image quality assessment.
Phys Med Biol. 2021;66(15):15TR02.
27. Geirhos R, Temme CR, Rauber J, Schütt HH, Bethge M, Wichmann
FA. Generalisation in humans and deep neural networks. Adv Neural Inf
Proces Syst. 2018;31:7549–61.
28. Barucci A, Neri E. Adversarial radiomics: the rising of potential risks in
medical imaging from adversarial learning. Eur J Nucl Med Mol Imaging.
2020;47(13):2941–3.
29. Marfisi D, Tessa C, Marzi C, Del Meglio J, Linsalata S, Borgheresi
R, et al. Image resampling and discretization effect on the estimate of
myocardial radiomic features from T1 and T2 mapping in hypertrophic
cardiomyopathy. Sci Rep. 2022;12(1):10186.
30. Little RJA, Rubin DB. Statistical analysis with missing data. 3rd ed.
Hoboken: Wiley; 2020. p. 1.
31. Rubin DB, editor. Multiple imputation for nonresponse in surveys.
Hoboken: Wiley; 1987.
32. Cohen P, West SG, Aiken LS. Applied multiple regression/correlation
analysis for the behavioral sciences. London: Psychology Press; 2014.
33. Raju VNG, Lakshmi KP, Jain VM, Kalidindi A, Padma V. Study the
influence of normalization/transformation process on the accuracy of
supervised classification. In: 2020 third international conference on smart
systems and inventive technology (ICSSIT). Tirunelveli: IEEE; 2020. p.
729–35.
34. Pomponio R, Erus G, Habes M, Doshi J, Srinivasan D, Mamourian E, et
al. Harmonization of large MRI datasets for the analysis of brain imaging
patterns throughout the lifespan. NeuroImage. 2020;208:116450.
35. Radua J, Vieta E, Shinohara R, Kochunov P, Quidé Y, Green MJ,
et al. Increased power by harmonizing structural MRI site differences
with the ComBat batch adjustment method in ENIGMA. NeuroImage.
2020;218:116956.
36. Fortin JP, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, et al.
Harmonization of cortical thickness measurements across scanners and
sites. NeuroImage. 2018;167:104–20.
37. Fortin JP, Parker D, Tunç B, Watanabe T, Elliott MA, Ruparel K, et al.
Harmonization of multi-site diffusion tensor imaging data. NeuroImage.
2017;161:149–70.
38. Beer JC, Tustison NJ, Cook PA, Davatzikos C, Sheline YI, Shinohara
RT, et al. Longitudinal ComBat: a method for harmonizing longitudinal
multi-scanner imaging data. NeuroImage. 2020;220:117129.
39. Keshavan A, Paul F, Beyer MK, Zhu AH, Papinutto N, Shinohara RT, et
al. Power estimation for non-standardized multisite studies. NeuroImage.
2016;134:281–94.
40. Pinto MS, Paolella R, Billiet T, Van Dyck P, Guns PJ, Jeurissen B, et
al. Harmonization of brain diffusion MRI: concepts and methods. Front
Neurosci. 2020;14:396.
41. Suckling J, Ohlssen D, Andrew C, Johnson G, Williams SCR, Graves
M, et al. Components of variance in a multicentre functional MRI study
and implications for calculation of statistical power. Hum Brain Mapp.
2008;29(10):1111–22.
42. Dansereau C, Benhajali Y, Risterucci C, Pich EM, Orban P, Arnold D,
et al. Statistical power and prediction accuracy in multisite resting-state
fMRI connectivity. NeuroImage. 2017;149:220–32.
43. Yu M, Linn KA, Cook PA, Phillips ML, McInnis M, Fava M, et
al. Statistical harmonization corrects site effects in functional connec-
tivity measurements from multi-site fMRI data. Hum Brain Mapp.
2018;39(11):4213–27.
44. Han X, Jovicich J, Salat D, van der Kouwe A, Quinn B, Czanner S, et
al. Reliability of MRI-derived measurements of human cerebral cortical
thickness: the effects of field strength, scanner upgrade and manufacturer.
NeuroImage. 2006;32(1):180–94.
45. Jovicich J, Czanner S, Greve D, Haley E, van der Kouwe A, Gollub R,
et al. Reliability in multi-site structural MRI studies: effects of gradi-
ent non-linearity correction on phantom and human data. NeuroImage.
2006;30(2):436–43.
46. Takao H, Hayashi N, Ohtomo K. Effect of scanner in longitudinal studies
of brain volume changes. J Magn Reson Imaging. 2011;34(2):438–44.
47. Fortin JP, Triche TJ, Hansen KD. Preprocessing, normalization and
integration of the Illumina human methylation EPIC array with minfi.
Bioinformatics. 2016;33(4):558–60.
48. Fortin JP, Sweeney EM, Muschelli J, Crainiceanu CM, Shinohara RT.
Removing inter-subject technical variability in magnetic resonance imag-
ing studies. NeuroImage. 2016;132:198–212.
49. Cleveland WS. LOWESS: a program for smoothing scatterplots by robust
locally weighted regression. Am Stat. 1981;35(1):54.
50. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of
normalization methods for high density oligonucleotide array data based
on variance and bias. Bioinformatics. 2003;19(2):185–93.
51. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microar-
ray expression data using empirical Bayes methods. Biostat Oxf Engl.
2007;8(1):118–27.
52. Marzi C, Giannelli M, Barucci A, Tessa C, Mascalchi M, Diciotti S.
Efficacy of MRI data harmonization in the age of machine learning. A
multicenter study across 36 datasets. 2022.
53. Jolliffe IT, Cadima J. Principal component analysis: a review and
recent developments. Philos Trans R Soc A Math Phys Eng Sci.
2016;374(2065):20150202.
54. Lord FM, Wainer H, Messick S, editors. Principals of modern psycho-
logical measurement: a Festschrift for Frederic M[ather] Lord. Hillsdale:
Erlbaum; 1983. p. 377.
55. Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York:
Wiley; 2001. p. 654.
56. Remeseiro B, Bolon-Canedo V. A review of feature selection methods in
medical applications. Comput Biol Med. 2019;112:103375.
57. Guyon I, Elisseeff A. An introduction to variable and feature selection. J
Mach Learn Res. 2003;3(Mar):1157–82.
58. Stańczyk U. Feature evaluation by filter, wrapper, and embedded
approaches. In: Stańczyk U, Jain LC, editors. Feature selection for data
and pattern recognition. Berlin: Springer; 2015. p. 29–44.
59. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell.
1997;97(1–2):273–324.
60. Yagis E, Atnafu SW, Seco de Herrera AG, Marzi C, Scheda R, Giannelli
M, et al. Effect of data leakage in brain MRI classification using 2D
convolutional neural networks. Sci Rep. 2021;11(1):22544.
61. Tampu IE, Eklund A, Haj-Hosseini N. Inflation of test accuracy due to
data leakage in deep learning-based classification of OCT images. Sci
Data. 2022;9(1):580.
62. Müller AC, Guido S. Introduction to machine learning with Python: a
guide for data scientists. 1st ed. Sebastopol: O’Reilly Media; 2016. p.
376.
63. Scheda R, Diciotti S. Explanations of machine learning models in
repeated nested cross-validation: an application in age prediction using
brain complexity features. Appl Sci. 2022;12(13):6681.
64. Varma S, Simon R. Bias in error estimation when using cross-validation
for model selection. BMC Bioinf. 2006;7(1):91.
65. 1000 Functional Connectomes Project (FCP). https://fcon_1000.projects.nitrc.org/fcpClassic/FcpTable.html.
66. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N,
Narayan V, et al. Computational radiomics system to decode the radio-
graphic phenotype. Cancer Res. 2017;77(21):e104–7.
67. Marzi C, Ciulli S, Giannelli M, Ginestroni A, Tessa C, Mascalchi M,
et al. Structural complexity of the cerebellum and cerebral cortex is
reduced in spinocerebellar ataxia type 2. J Neuroimaging Off J Am Soc
Neuroimaging. 2018;28(6):688–93.
68. Pantoni L, Marzi C, Poggesi A, Giorgio A, De Stefano N, Mascalchi M,
et al. Fractal dimension of cerebral white matter: a consistent feature for
prediction of the cognitive performance in patients with small vessel dis-
ease and mild cognitive impairment. NeuroImage Clin. 2019;24:101990.
69. Marzi C, Giannelli M, Tessa C, Mascalchi M, Diciotti S. Toward a more
reliable characterization of fractal properties of the cerebral cortex of
healthy subjects during the lifespan. Sci Rep. 2020;10(1):16957.
70. Marzi C, Giannelli M, Tessa C, Mascalchi M, Diciotti S. Fractal analysis
of MRI data at 7 T: how much complex is the cerebral cortex? IEEE
Access. 2021;9:69226–34.
71. Pani J, Marzi C, Stensvold D, Wisløff U, Håberg AK, Diciotti S.
Longitudinal study of the effect of a 5-year exercise intervention on
structural brain complexity in older adults. A generation 100 substudy.
NeuroImage. 2022;2022:119226.
8 Current Applications of AI in Medical Imaging
Gianfranco Di Salle, Salvatore Claudio Fanni,
Gayane Aghakhanyan, and Emanuele Neri

8.1 Introduction

In recent years, a growing interest in artificial intelligence (AI) has been observed, and its use has been investigated in a variety of clinical contexts with different applications.
Oncologic imaging is undeniably the most investigated appli-
cation field of AI, in the forms of radiomics-based machine
learning (ML) and deep learning (DL), already well-described
in the previous chapters. However, non-oncologic imaging has not been spared by the AI breakthrough either, as demonstrated by the
increasing number of research studies and clinically applicable
tools.
AI applications have been tested at virtually all stages of
the imaging pipeline, from exam modality selection to exam protocol selection, data acquisition, image reconstruction, image
processing, image interpretation, and reporting.
DL models have also been trained to understand and generate
natural language. These models may be useful as clinical decision support tools providing imaging appropriateness guidance and may even be able to generate radiological reports based on keywords provided by the radiologists.
However, AI applications in diagnostic radiology are mostly aimed at reasoning and perception tasks, based on the interpretation of sensory information. AI could be particularly useful for findings detection, as an additional reader providing a second interpretation, or as the exclusive reader in low-resource regions where the availability of radiologists is limited. As proof of its value, in 2021 the WHO recommended the use of AI-powered computer-aided detection (CAD) software for screening and triage purposes in countries suffering from a high burden of
pulmonary tuberculosis.
AI becomes even more valuable in classification tasks, to discriminate between benign and malignant tumors or between different histological subtypes of malignancies. Radiomics-based algorithms are designed for diagnostic, prognostic, and classification tasks, and their implementation in clinical practice may be accelerated by AI-based segmentation. Segmentation is the process of tagging the diagnostic image and dividing it into subregions representing organ structures and components,
tumor masses and their different necrotic, proliferating, hyper-
vascular and quiescent cellular compartments. Lesion contouring
is often useful for assessing disease severity and prognosis but
is a largely time-consuming and operator-dependent task, which
hampers the diffusion of handcrafted segmentation protocols for
diagnosis and follow-up. Indeed, nearly all radiomics research studies ground their pipeline in a solid segmentation, which is essential for the correct calculation of radiomic
features. This is why AI-based automated segmentation may
improve the efficiency and methodological reliability of radiomic
studies themselves.
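
As a concrete illustration of this dependence, the sketch below uses the open-source PyRadiomics package with hypothetical image and mask file paths; since every feature is computed only inside the segmented region, the quality of the mask directly determines the feature values and, downstream, the model built on them:

```python
from radiomics import featureextractor  # PyRadiomics

# Hypothetical paths: an MR/CT volume and its segmentation mask
image_path = "patient001_image.nii.gz"
mask_path = "patient001_tumor_mask.nii.gz"

# With default settings, shape, first-order and texture features are computed
extractor = featureextractor.RadiomicsFeatureExtractor()
features = extractor.execute(image_path, mask_path)

# All values are calculated only inside the mask, so segmentation quality
# directly determines feature reliability
for name, value in features.items():
    if not name.startswith("diagnostics_"):
        print(name, value)
```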
This brief chapter aims at summarizing the most up-to-date
applications of AI in medical imaging, dividing them by the
specific task they are conceived for: lesion detection, disease
classification, and organ/tumor segmentation. Instead of listing
all approaches proposed to date in the literature—which would be
unproductive as their number is constantly growing—we will
discuss the most relevant contributions per specific task while
trying to highlight favorable and unfavorable arguments for their
translation into clinical practice.

8.2 Detection

Lesion detection in medical images can be tricky in day-to-day clinical practice, especially with incidental findings during exams performed for other purposes. Additionally, less conspicuous findings may escape the reporter's attention, for example, in the emergency setting where all stages of the diagnostic exams have
to be performed very quickly. Therefore, implementation of AI
algorithms designed to automatically detect imaging findings can
be useful for improving the sensitivity of human reporting.
Examples of the use of DL in disease detection include
pulmonary embolism (PE) in urgent CT pulmonary angiograms
(CTPAs) [1, 2], as well as large vessel occlusion (LVO) in
noncontrast head CT scans (NCCT) of acute stroke patients
[3]. Detection of intracranial hemorrhage (ICH) was investigated
through DL in works by Ginat [52] and by Seyam et al. [4].
Oncologic imaging is, of course, one of the fields that benefit
most from AI, in that new algorithms aim to automatically detect new lesions and may profoundly impact oncologic
screening and follow-up. Automatic tumor detection is mostly
pursued using DL techniques, especially CNNs. Neuroradiology
already counts a large number of applications, mostly based on
MRI [5, 6] and nuclear medicine [7] due to the functional and
multiparametric information provided by these techniques. In contrast, computed tomography-based data are more frequently used to train algorithms for body applications, such as colorectal [8], liver [9], and ovarian [10] cancer. Of note, interesting emerging applications use DL algorithms to detect pancreatic neoplasms in
endoscopic ultrasound videos [11].
In the surgical field, DL is increasingly applied to detect
critical events in an intraoperative setting, which is called “sur-
gical phase recognition” or “surgical workflow recognition.” This
task is currently recognized as potentially critical for improving
workflow optimization, intraoperative planning, surgical training,
and patient safety.
As in the applications discussed in the following paragraphs, the average performance of the reported algorithms was astonishingly good. Integration into
clinical practice depends on their generalizability, which in turn is
a function of research methodological quality, especially in terms
of external validation.

8.3 Classification

The devil is in the details, as a famous idiom says. Diagnostic strategies in medicine, including imaging, are all designed to
collect enough information to discriminate among physiologi-
cal and pathological, benign and malignant, stable and quickly
evolving clinical conditions. As the number of available diag-
nostic modalities increases, complex data integration is needed
to subdivide patients into homogeneous groups benefitting from
similar therapies, sharing similar prognosis, and characterized by
a similar response to treatments. Disease classification is one
of the most investigated AI-based tasks, as witnessed by the
high number of applications proposed in the literature and tested in hospital workflows. Most available algorithms are based on
supervised learning, as they require input from large datasets of
labeled data in order to infer predictions about new cases.
An example of the use of AI algorithms in image-based
disease classification is represented by Alzheimer’s disease (AD),
the most common cause of dementia among older adults. MR
images were used as training data for distinguishing AD patients
from healthy controls in works by Lu et al. [12] and Qiu et
al. [13]. Nuclear medicine data have been used for the same
research question by Alongi et al. [14], reaching good accuracy.
Another field of AI application in patients' diagnostic classifi-
cation is Parkinson’s disease (PD), a prevalent neurodegenerative
disorder especially affecting motor and cognitive domains. In rou-
tine clinical practice, PD patients need to be distinguished in a timely manner from patients with atypical parkinsonian syndromes (APS) and healthy controls to undergo the most appropriate clinical management.
For this purpose, MRI-based [15] and nuclear medicine-based
[16, 17] approaches have been proposed. Good accuracy has also been obtained in the validation of AI-based algorithms for the classification of glioma molecular subtypes (Jiang et al. 2021, [18]) and grading [19], and of breast tissue density [20].
Further oncological applications include computer-aided
colorectal polyp classification [21], classification of the origin
of brain metastasis [22], and malignancy assessment of breast
masses [23].
In summary, the prospect of classifying patients based on AI-powered elaborations of imaging data is enticing, in oncological as well as non-oncological settings. The most plausible and direct way to integrate this information in clinical practice is to use it as a support for clinical diagnosis and to consider it along with multimodal data while keeping the physician at the center of the diagnostic process. Conversely, the lack of human and financial resources, the tendency toward centralization of care, and the overload of existing infrastructure are urging decision-makers to implement rapid and easy-to-automate solutions, possibly skipping human supervision or intervention. Of note, it is currently unclear what amount and even what type of evidence is needed to accomplish such a transition to safely integrating AI-based automation in diagnostic imaging.

8.4 Segmentation

As mentioned above, segmentation is a very useful activity for calculating quantitative indices to guide disease diagnosis and
prognosis, but also for extracting radiomic features from the
segmented image. Manual segmentation is not a widespread
activity in routine clinics because it is time-consuming and
operator dependent. This is the reason why many AI tools have
been recently implemented for the semi- and fully automated
contouring of lesions, organs, and tissues.
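
A minimal sketch of how such automated contouring is typically set up is shown below, using the open-source MONAI framework with an untrained 3D U-Net; in practice, trained weights would be loaded and the input would be a preprocessed patient scan rather than random data:

```python
import torch
from monai.networks.nets import UNet
from monai.inferers import sliding_window_inference

# A 3D U-Net with 1 input channel (e.g. a CT volume) and 2 output
# channels (background vs. structure of interest); untrained here
net = UNet(spatial_dims=3, in_channels=1, out_channels=2,
           channels=(16, 32, 64, 128), strides=(2, 2, 2))
net.eval()

volume = torch.rand(1, 1, 96, 96, 96)  # stand-in for a preprocessed scan

# Tile the volume into patches, run the network, and stitch the output back
with torch.no_grad():
    logits = sliding_window_inference(volume, roi_size=(64, 64, 64),
                                      sw_batch_size=1, predictor=net)

mask = logits.argmax(dim=1)  # voxel-wise label map, i.e. the contour
print(mask.shape)            # torch.Size([1, 96, 96, 96])
```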
Automatic segmentation for volume quantification has been
recently introduced to improve selection of ischemic stroke
patients for fibrinolytic and endovascular therapy. A large number
of clinical trials, including EXTEND [24], EXTEND-IA [25],
DAWN [26], and DEFUSE 3 [27], were based on the use of a single software package for the analysis of CT perfusion imaging (RAPID, iSchemaView), aimed at contouring and measuring the ischemic core and penumbra. Since then, an increasing number of software applications have been developed to serve as therapeutic decision support in stroke patient triage. The widespread diffusion of these software applications in clinical practice is based on specific AHA guidelines [28] and has profoundly impacted the workflow
of stroke patients, with considerable advantages in terms of
efficiency gain and time savings [29]. Most of this software does
not have scientific validation by competent regulatory authorities,
and concerns can be raised regarding output variability across
different vendors, software, or calculation methods.
Segmentation is also the most time-consuming task in many
diagnostic cardiac-CT and MRI applications, where structural
volume measurements throughout the cardiac cycle can give
valuable information about anatomy and physiology [30]. Ded-
icated commercial algorithms are widespread in most diagnostic
centers and are mostly intended for the measurement of cardiac chamber volumes [31, 32] and myocardial thickness [33], and for great vessel/coronary artery segmentation [34].
In a recent paper, Monti et al. [35] implemented a CNN-based
approach to measure aortic diameters in nine different landmark
sites proposed by the American Heart Association (AHA) for the
follow-up of aneurysm growth and rupture risk prediction.
Automatic segmentation is also a matter of current investi-
gation in oncologic imaging, where tumor volumetric parame-
ters give invaluable information about staging, radiation therapy
dosimetry, prognosis, treatment response. Investigated organs
include brain [36], liver [37], lung [38], breast [39], head and neck
[40], rectum [41], and stomach [42].

8.4.1 Monitoring

Emerging evidence has been collected about the potential of monitoring oncologic patients during follow-up to predict prognostic
information. For example, radiomics- [43] and neural network-
based [53] models have been developed to
predict survival of patients with metastatic urothelial cancer from
follow-up whole-body CT images. Integration of such algorithms
in clinical practice could give valuable additional information
to the established response evaluation criteria and ultimately
influence therapeutic management of oncologic patients.

8.4.2 Prediction

Disease outcome information is never just binary, as disease severity classes profoundly influence quality of life, organizational needs, and healthcare costs. Within comparable durations of disease, a valuable piece of information to obtain is how the patient will survive, allowing the focus of healthcare to shift toward qualitative interventions on disease management. For example,
qualitative interventions on disease management. For example,
several attempts have been recently made to predict long-term
motor outcome in PD patients, using nuclear medicine baseline
data. More specifically, multi-dimensional databases of clinical
and radiological information were used to predict year-4 motor
UPDRS-III score [44, 45]. A recent study by Salmanpour et
al. [46] found three different disease clusters in longitudinal
progression of PD, by training an algorithm on a clinical-imaging
dataset. This kind of study is only a small part of the AI-powered innovation in medical imaging but is likely to impact future healthcare organization, and especially resource allocation, in a
preferential manner.

8.4.3 Additional Applications


8.4.3.1 Image Enhancement and Reconstruction
In recent years, the field of medical image processing has
been revolutionized by the availability of AI-powered image
enhancement and reconstruction tools.
Image enhancement is the process of reducing erroneous
information within medical images, e.g., denoising, artifact correction, and resolution enhancement. Conversely,
image reconstruction is a method to transform a series of images
into another one. Despite getting less attention compared to clini-
cal diagnostic tasks, the use of AI for image quality enhancement
and reconstruction may have an even higher potential in terms of
imaging cost reduction and safety.
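
Many DL denoisers follow a residual-learning scheme in which a small convolutional network predicts the noise component and subtracts it from the input image. The toy sketch below (untrained and fed with random data; real systems are trained on paired low-/high-quality images) illustrates the idea:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy residual CNN: predicts the noise and subtracts it from the input."""
    def __init__(self, width: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual learning: output = input - predicted noise
        return x - self.body(x)

noisy_slice = torch.randn(1, 1, 64, 64)  # stand-in for a noisy low-dose slice
with torch.no_grad():
    denoised = TinyDenoiser()(noisy_slice)
print(denoised.shape)  # torch.Size([1, 1, 64, 64])
```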
Two reports have recently highlighted the good performances
obtained by state-of-the-art techniques [47, 48]. In particular, dose
reduction obtained using AI algorithms in pediatric radiology has
been quantified as 36–70% [47]. In adult applications, such as
abdominal CT scans for urinary tract calculi detection, reduction
rates up to 84% compared to usual iterative-reconstruction (IR)
algorithms have been documented with similar image quality
[48]. Available literature suggests that DL-based reconstruction
may also overcome the "waxy/plastic" appearance [49] of the newer low-dose reconstruction algorithms while simultaneously providing a better SNR and CNR.

8.4.4 Workload Reduction?

Intuitively, AI applications should simplify decision-making and reduce radiologists' workload by automating complex calculations, time-consuming handcrafted segmentations, and careful comparison with literature and textbooks for diagnosis and classification. Curiously, this seems not to be the case. In a literature review by Kwee and Kwee [50], it is reported that novel applications of AI in
radiology increase radiologists’ workload in approximately 48%
of the analyzed studies and decrease it in only 4%. On the one
hand, the common goal of AI applications is enhancing diagnostic
accuracy and patient care; on the other hand, the radiology field
is not exempt from the inverse relationship between decision
accuracy and decision speed [51]. AI may improve medical decision accuracy by powering segmentation, classification, and detection and, at the same time, worsen it by increasing
radiologists’ workload to and beyond their optimal functioning
limits. In the light of these considerations, as direct recipients of
the AI-based revolution in radiology, we should be aware that this
revolution can only take place when appropriate resources and
organizational support can be ensured to radiologists.

8.5 Conclusions

Current literature is rich in potential AI applications in medical imaging, involving multiple image modalities, covering all body
regions, and using innumerable variants of technically refined
ML algorithms. In the available works, there is considerable
variability regarding the explainability of AI methods, the quality of the ML design, and the amount of evidence for single conditions. All
the previous factors influence the potential of single models to
be translated into clinical workflow. Model generalizability and
rigorous validation techniques are essential for integration into
real-world clinical scenarios. As these criteria become more and
more recognized, and biobanks start collecting publicly avail-
able imaging data, sparse stand-alone AI integration experiments
will hopefully converge into large trials with unquestionable
generalizability and outcomes scalable into clinical routine. The
simultaneous, across-the-board discussion of ethical and regu-
latory features of AI does not only influence the availability
of proposed techniques, but shapes current and future research
and defines the role of technological advancements in clinical
decision-making.

References
1. Weikert T, Winkel DJ, Bremerich J, Stieltjes B, Parmar V, Sauter
AW, Sommer G. Automated detection of pulmonary embolism in CT
pulmonary angiograms using an AI-powered algorithm. Eur Radiol.
2020;30(12):6545–53. https://doi.org/10.1007/s00330-020-06998-0.
2. Schmuelling L, Franzeck FC, Nickel CH, Mansella G, Bingisser R,
Schmidt N, et al. Deep learning-based automated detection of pul-
monary embolism on CT pulmonary angiograms: no significant effects
on report communication times and patient turnaround in the emergency
department nine months after technical implementation. Eur J Radiol.
2021;141:109816.
3. Olive-Gadea M, Crespo C, Granes C, Hernandez-Perez M, Pérez De La
Ossa N, Laredo C, Urra X, Carlos Soler J, Soler A, Puyalto P, Cuadras P,
Marti C, Ribo M. Deep learning based software to identify large vessel
occlusion on noncontrast computed tomography. Stroke. 2020;51:3133–
7. https://doi.org/10.1161/STROKEAHA.120.030326.
4. Seyam M, Weikert T, Sauter A, Brehm A, Psychogios MN, Blackham
KA. Utilization of artificial intelligence–based intracranial hemorrhage
detection on emergent noncontrast CT images in clinical workflow.
Radiology. 2022;4(2):1–6. https://doi.org/10.1148/ryai.210168.
5. Yang S, Yoon HI, Kim JS. Deep-learning-based automatic detection
and segmentation of brain metastases with small volume for stereotactic
ablative radiotherapy. Cancers. 2022;14:2555.
6. Turk O, Ozhan D, Acar E, Cetin T, Yilmaz M. Automatic detection of
brain tumors with the aid of ensemble deep learning architectures and
class activation map indicators by employing magnetic resonance images.
Z Med Phys. 2022; https://doi.org/10.1016/j.zemedi.2022.11.010.
7. Rahimpour M, Boellaard R, Jentjens S, Deckers W, Goffin K, Koole M.
A multi-label CNN model for the automatic detection and segmentation
of gliomas using [18 F] FET PET imaging. Eur J Nucl Med Mol Imaging.
2023; https://doi.org/10.1007/s00259-023-06193-5.
8. Akilandeswari A, Sungeetha D, Joseph C, Thaiyalnayaki K, Baskaran K,
Ramalingam RJ, Al-lohedan H, Al-dhayan DM, Karnan M, Hadish KM.
Automatic detection and segmentation of colorectal cancer with deep
residual convolutional neural network. Evid Based Complement Alternat
Med. 2022;2022:3415603.
9. Othman E, Mahmoud M, Dhahri H, Abdulkader H, Mahmood
A, Ibrahim M. Automatic detection of liver cancer using hybrid pre-
trained models. Sensors. 2022;22:5429.
10. Wang X, Li H, Zheng P. Automatic detection and segmentation of ovarian
cancer using a multitask model in pelvic CT images. Oxid Med Cell
Longev. 2022;2022:6009107.
11. Jaramillo M, Ruano J, Gómez M, Romero E. Automatic detection of
pancreatic tumors in endoscopic ultrasound videos using deep learning
techniques. In: Medical imaging 2022: ultrasonic imaging and tomogra-
phy, vol. 12038. Bellingham, WA: SPIE; 2022. p. 106–15.
12. Lu D, Popuri K, Ding GW, Balachandar R, Beg MF, Alzheimer's Disease Neuroimaging Initiative. Multimodal and multiscale deep neural networks for the
early diagnosis of Alzheimer’s disease using structural MR and FDG-
PET images. Sci Rep. 2018;8:5697.
13. Qiu S, Joshi PS, Miller MI, Xue C, Zhou X, Karjadi C, Chang GH, Joshi
AS, Dwyer B, Zhu S, Kaku M, Zhou Y, Alderazi YJ, Swaminathan A,
Kedar S, Saint-Hilaire MH, Auerbach SH, Yuan J, Sartor EA, Au R,
Kolachalama VB, et al. Development and validation of an interpretable
deep learning framework for Alzheimer’s disease classification. Brain.
2020;143(6):1920–33. https://doi.org/10.1093/brain/awaa137.
14. Alongi P, Laudicella R, Panasiti F, Stefano A, Comelli A, Giaccone
P, Arnone A, Minutoli F, Quartuccio N, Cupidi C, Arnone G, Piccoli
T, Grimaldi LME, Baldari S, Russo G. Radiomics analysis of brain
[(18)F]FDG PET/CT to predict Alzheimer’s disease in patients with
amyloid PET positivity: a preliminary report on the application of SPM
cortical segmentation, pyradiomics and machine-learning analysis. Diag-
nostics. 2022;12(4):933. https://doi.org/10.3390/diagnostics12040933.
15. Shinde S, Prasad S, Saboo Y, Kaushick R, Saini J, Pal PK, Ingalhalikar
M. Predictive markers for Parkinson’s disease using deep neural nets on
neuromelanin sensitive MRI. NeuroImage Clin. 2019;22:101748. https://doi.org/10.1016/j.nicl.2019.101748.
16. Zhao Y, Wu P, Wu J, Brendel M, Lu J, Ge J, Tang C, Hong J, Xu Q,
Liu F, Sun Y, Ju Z, Lin H, Guan Y, Bassetti C, Schwaiger M, Huang
SC, Rominger A, Wang J, Zuo C, Shi K, et al. Decoding the dopamine
transporter imaging for the differential diagnosis of parkinsonism using
deep learning. Eur J Nucl Med Mol Imaging. 2022;49(8):2798–811.
https://doi.org/10.1007/s00259-022-05804-x.
17. Salmanpour MR, Shamsaei M, Saberi A, Hajianfar G, Soltanian-Zadeh
H, Rahmim A. Robust identification of Parkinson’s disease subtypes
using radiomics and hybrid machine learning. Comput Biol Med.
2021;129:104142. https://doi.org/10.1016/j.compbiomed.2020.104142.
18. Hsu WW, Guo JM, Pei L, Chiang LA, Li YF. A weakly supervised deep
learning-based method for glioma subtype classification using WSI and
mpMRIs. Sci Rep. 2022;12:6111. https://doi.org/10.1038/s41598-022-09985-1.
19. Yu X, Wu Y, Bai Y, Han H, Chen L, Gao H, Wei H, Wang M.
A lightweight 3D UNet model for glioma grading. Phys Med Biol.
2022;67:155006.
20. Magni V, Interlenghi M, Cozzi A, Alì M, Salvatore C, Azzena AA,
Capra D, Carriero S, Della Pepa G, Fazzini D, Granata G, Monti
CB, Muscogiuri G, Pellegrino G, Schiaffino S, Castiglioni I, Papa S,
Sardanelli F. Development and validation of an AI-driven mammographic
breast density classification tool based on radiologist consensus. Radiol
Artif Intell. 2022;4(2):e210199. https://doi.org/10.1148/ryai.210199.
21. Younas F, Usman M, Yan WQ. A deep ensemble learning method
for colorectal polyp classification with optimized network parameters.
Berlin: Springer; 2023. p. 2410–33.
22. Jiao T, Li F, Cui Y, Wang X, Li B, Shi F, Xia Y, Zhou Q, Zeng Q. Deep
learning with an attention mechanism for differentiating the origin of
brain metastasis using MR images. J Magn Reson Imaging. 2023; https://doi.org/10.1002/jmri.28695.
23. Abdel Rahman AS, Belhaouari SB, Bouzerdoum A, Baali H, Alam T,
Eldaraa AM. Breast mass tumor classification using deep learning. In:
2020 IEEE International conference on informatics, IoT, and enabling
technologies (ICIoT), Doha, Qatar; 2020. p. 271–6. https://doi.org/10.1109/ICIoT48696.2020.9089535.
24. Ma H, Campbell BCV, Parsons MW, et al. Thrombolysis guided
by perfusion imaging up to 9 hours after onset of stroke. N Engl
J Med. 2019;380:1795–803. pmid:31067369. https://doi.org/10.1056/NEJMoa1813046.
25. Campbell BC, Mitchell PJ, Kleinig TJ, Dewey HM, Churilov L, Yassi
N, Yan B, Dowling RJ, Parsons MW, Oxley TJ, Wu TY, Brooks M,
Simpson MA, Miteff F, Levi CR, Krause M, Harrington TJ, Faulder KC,
Steinfort BS, Priglinger M, EXTEND-IA Investigators. Endovascular
therapy for ischemic stroke with perfusion-imaging selection. N Engl J
Med. 2015;372(11):1009–18. https://doi.org/10.1056/NEJMoa1414792.
26. Nogueira RG, Jadhav AP, Haussen DC, Bonafe A, Budzik RF, Bhuva
P, Yavagal DR, Ribo M, Cognard C, Hanel RA, Sila CA, Hassan AE,
Millan M, Levy EI, Mitchell P, Chen M, English JD, Shah QA, Silver FL,
Pereira VM, DAWN Trial Investigators. Thrombectomy 6 to 24 hours
after stroke with a mismatch between deficit and infarct. N Engl J Med.
2018;378(1):11–21. https://doi.org/10.1056/NEJMoa1706442.
27. Albers GW, Marks MP, Kemp S, Christensen S, Tsai JP, Ortega-Gutierrez
S, McTaggart RA, Torbey MT, Kim-Tenser M, Leslie-Mazwi T, Sar-
raj A, Kasner SE, Ansari SA, Yeatts SD, Hamilton S, Mlynash M,
Heit JJ, Zaharchuk G, Kim S, Carrozzella J, DEFUSE 3 Investigators.
Thrombectomy for stroke at 6 to 16 hours with selection by perfusion
imaging. N Engl J Med. 2018;378(8):708–18. https://doi.org/10.1056/NEJMoa1713973.
28. Powers WJ, Rabinstein AA, Ackerson T, Adeoye OM, Bambakidis NC,
Becker K, Biller J, Brown M, Demaerschalk BM, Hoh B, Jauch EC,
Kidwell CS, Leslie-Mazwi TM, Ovbiagele B, Scott PA, Sheth KN,
Southerland AM, Summers DV, Tirschwell DL. Guidelines for the early
Management of Patients with Acute Ischemic Stroke: 2019 update to the
2018 guidelines for the early management of acute ischemic stroke: a
guideline for healthcare professionals from the American Heart Associa-
tion/American Stroke Association. Stroke. 2019;50(12):e344–418. https://doi.org/10.1161/STR.0000000000000211.
29. Vagal A, Saba L. Artificial intelligence in “code stroke”—a paradigm
shift: do radiologists need to change their practice? Radiol Artif Intell.
2022;4(2):6–8. https://doi.org/10.1148/ryai.210204.
30. Yang DH. Application of artificial intelligence to cardiovascular com-
puted tomography. Korean J Radiol. 2021;22(10):1597–608. Epub
2021 Jul 26. PMID: 34402240; PMCID: PMC8484158. https://doi.org/10.3348/kjr.2020.1314.
31. Bruns S, Wolterink JM, Takx RAP, van Hamersvelt RW, Suchá D,
Viergever MA, et al. Deep learning from dual-energy information for
whole-heart segmentation in dual-energy and single-energy non-contrast-
enhanced cardiac CT. Med Phys. 2020;47:5048–60.
32. Baskaran L, Maliakal G, Al’Aref SJ, Singh G, Xu Z, Michalak K, et al.
Identification and quantification of cardiovascular structures from CCTA:
an end-to-end, rapid, pixel-wise, deep-learning method. JACC Cardiovasc
Imaging. 2020;13:1163–71.
33. Koo HJ, Lee JG, Ko JY, Lee G, Kang JW, Kim YH, et al. Automated
segmentation of left ventricular myocardium on cardiac computed tomog-
raphy using deep learning. Korean J Radiol. 2020;21:660–9.
34. Morris ED, Ghanem AI, Dong M, Pantelic MV, Walker EM, Glide-Hurst
CK. Cardiac substructure segmentation with deep learning for improved
cardiac sparing. Med Phys. 2020;47:576–86.
35. Monti CB, van Assen M, Stillman AE, Lee SJ, Hoelzer P, Fung GSK,
Secchi F, Sardanelli F, De Cecco CN. Evaluating the performance
of a convolutional neural network algorithm for measuring thoracic
aortic diameters in a heterogeneous population. Radiol Artif Intell.
2022;4(2):e210196. https://doi.org/10.1148/RYAI.210196.
36. Chen W, Zhou W, Zhu L, Cao Y, Gu H, Yu B. MTDCNet: a 3D
multi-threading dilated convolutional network for brain tumor auto-
matic segmentation. J Biomed Inform. 2022;133(August):104173. https://doi.org/10.1016/j.jbi.2022.104173.
37. Manjunath RV, Kwadiki K. Biomedical engineering advances modified
U-NET on CT images for automatic segmentation of liver and its
tumor. Biomed Eng Adv. 2022;4(June):100043. https://doi.org/10.1016/j.bea.2022.100043.
38. Yang J, Wu B, Li L, Cao P, Zaiane O. MSDS-UNet: a multi-scale deeply
supervised 3D U-net for automatic segmentation of lung tumor in CT.
Comput Med Imaging Graph. 2021;92:101957. https://doi.org/10.1016/j.compmedimag.2021.101957.
39. Yue W, Zhang H, Zhou J, Li G. Deep learning-based automatic seg-
mentation for size and volumetric measurement of breast cancer on
magnetic resonance imaging. Front Oncol. 2022;12:984626. https://doi.org/10.3389/fonc.2022.984626.
40. Abed M, Khanapi M, Ghani A, Ibraheem R, Ahmed D, Khir M.
Artificial neural networks for automatic segmentation and identification
of nasopharyngeal carcinoma. J Comput Sci. 2017;21:263–74. https://doi.org/10.1016/j.jocs.2017.03.026.
41. Zhu H-T, Sun S. Automatic segmentation of rectal tumor on diffusion-
weighted images by deep learning with U-Net. Appl Clin Med Phys.
2021;22:324. https://doi.org/10.1002/acm2.13381.
42. Li H, Liu B, Zhang Y, Fu C, Han X, Du L. 3D IFPN: improved feature
pyramid network for automatic segmentation of gastric tumor. Front
Oncol. 2021;11:618496. https://doi.org/10.3389/fonc.2021.618496.
43. Park KJ, Lee JL, Yoon SK, Heo C, Park BW, Kim JK. Radiomics-
based prediction model for outcomes of PD-1/PD-L1 immunotherapy
in metastatic urothelial carcinoma. Eur Radiol. 2020;30(10):5392–403.
https://doi.org/10.1007/s00330-020-06847-0.
44. Rahmim A, Huang P, Shenkov N, Fotouhi S, Davoodi-Bojd E, Lu
L, Mari Z, Soltanian-Zadeh H, Sossi V. Improved prediction of out-
come in Parkinson’s disease using radiomics analysis of longitudinal
DAT SPECT images. NeuroImage Clin. 2017;16:539–44. https://doi.org/10.1016/j.nicl.2017.08.021.
45. Tang J, Yang B, Adams MP, Shenkov NN, Klyuzhin IS, Fotouhi S,
Davoodi-Bojd E, Lu L, Soltanian-Zadeh H, Sossi V, Rahmim A. Artifi-
cial neural network-based prediction of outcome in Parkinson’s disease
patients using DaTscan SPECT imaging features. Mol Imaging Biol.
2019;21(6):1165–73. https://doi.org/10.1007/s11307-019-01334-5.
46. Salmanpour MR, Shamsaei M, Hajianfar G, Soltanian-Zadeh H, Rahmim
A. Longitudinal clustering analysis and prediction of Parkinson’s disease
progression using radiomics and hybrid machine learning. Quant Imaging
Med Surg. 2022;12(2):906–19. https://doi.org/10.21037/qims-21-425.
47. Ng CKC. Artificial intelligence for radiation dose optimization in pedi-
atric radiology: a systematic review. Children. 2022;9(7):1–12. https://doi.org/10.3390/children9071044.
48. McLeavy CM, Chunara MH, Gravell RJ, Rauf A, Cushnie A, Staley Tal-
bot C, Hawkins RM. The future of CT: deep learning reconstruction. Clin
Radiol. 2021;76(6):407–15. https://doi.org/10.1016/j.crad.2021.01.010.
49. Laurent G, Villani N, Hossu G, Rauch A, Noël A, Blum A, Gondim Teix-
eira PA. Full model-based iterative reconstruction (MBIR) in abdominal
CT increases objective image quality, but decreases subjective accep-
tance. Eur Radiol. 2019;29(8):4016–25. https://doi.org/10.1007/s00330-018-5988-8.
50. Kwee TC, Kwee RM. Workload of diagnostic radiologists in the fore-
seeable future based on recent scientific advances: growth expectations
and role of artificial intelligence. Insights Imaging. 2021;12:88. https://doi.org/10.1186/s13244-021-01031-4.
51. Alexander R, Waite S, Bruno MA, Krupinski EA, Berlin L, Macknik
S, Martinez-Conde S. Mandating limits on workload, duty, and speed
in radiology. Radiology. 2022;304(2):274–82. https://doi.org/10.1148/radiol.212631.
52. Ginat DT. Analysis of head CT scans flagged by deep learning software
for acute intracranial hemorrhage. Neuroradiology. 2020;62(3):335–40. https://doi.org/10.1007/s00234-019-02330-w.
53. Trebeschi S, Bodalal Z, van Dijk N, Boellaard TN, Apfaltrer P,
Tareco Bucho TM, Nguyen-Kim TDL, van der Heijden MS, Aerts
HJWL, Beets-Tan RGH. Development of a prognostic AI-Monitor for
metastatic urothelial cancer patients receiving immunotherapy. Front
Oncol. 2021;11:637804. https://doi.org/10.3389/fonc.2021.637804.
