
Imaging Informatics for Healthcare Professionals

Ángel Alberich-Bayarri
Fuensanta Bellvís-Bataller
Editors

Basics of Image Processing
The Facts and Challenges of Data Harmonization to Improve Radiomics Reproducibility

Series Editors
Peter M. A. van Ooijen, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Erik R. Ranschaert, Department of Radiology, ETZ Hospital, Tilburg, The Netherlands
Annalisa Trianni, Department of Medical Physics, ASUIUD, Udine, Italy
Michail E. Klontzas, Institute of Computer Science, Foundation for Research and Technology (FORTH) & University Hospital of Heraklion, Heraklion, Greece
The series Imaging Informatics for Healthcare Professionals is the ideal starting point for physicians, residents, and students in radiology and nuclear medicine who wish to learn the basics in different areas of medical imaging informatics. Each volume is a short pocket-sized book designed for easy learning and reference.
The scope of the series is based on the Medical Imaging Informatics subsections of the European Society of Radiology (ESR) European Training Curriculum, as proposed by ESR and the European Society of Medical Imaging Informatics (EuSoMII). The series, which is endorsed by EuSoMII, will cover the curricula for Undergraduate Radiological Education and for the level I and II training programmes. The curriculum for the level III training programme will be covered at a later date. The series will offer frequent updates as and when new topics arise.
Ángel Alberich-Bayarri
Fuensanta Bellvís-Bataller
Editors

Basics of Image Processing
The Facts and Challenges of Data Harmonization to Improve Radiomics Reproducibility
Editors
Ángel Alberich-Bayarri
Founder and CEO
Quibim SL
Valencia, Spain

Fuensanta Bellvís-Bataller
VP of Clinical Studies
Quibim SL
Valencia, Spain

ISSN 2662-1541  ISSN 2662-155X (electronic)
Imaging Informatics for Healthcare Professionals
ISBN 978-3-031-48445-2  ISBN 978-3-031-48446-9 (eBook)
https://doi.org/10.1007/978-3-031-48446-9

© EuSoMII 2023
This work is subject to copyright. All rights are solely and exclusively licensed by
the Publisher, whether the whole or part of the material is concerned, specifically
the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service
marks, etc. in this publication does not imply, even in the absence of a specific
statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and
information in this book are believed to be true and accurate at the date of
publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with
regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


Preface

In the rapidly evolving landscape of medical imaging and cancer research, radiomics has emerged as a promising field with the potential to revolutionize diagnosis and treatment and to improve patient outcomes. Radiomics delves into the extraction of quantitative features from medical images with the aim of transforming them into actionable predictions.
With the emergence of radiomics come associated challenges that could potentially hinder the growth of the field in clinical practice. One such challenge is data harmonization, which plays an essential role in ensuring the reproducibility, robustness, and generalizability of radiomic studies.
In an ideal scenario, images acquired under the same conditions should exhibit consistent technical quality with minimal deviations. However, in a real-world, multi-centric setting, various factors come into play, including different manufacturers, scanners, and a high degree of variability originating from the different sites where the images are acquired. These variations inevitably lead to shifts in the signal intensities of the image voxels, even when similar acquisition protocols are employed.
The involvement of different manufacturers, scanners, or acquisition protocols introduces image variability that can potentially impact the development of generalizable AI predictive models. Additionally, it can affect the reproducibility of quantitative imaging biomarker calculations.
This book delves into this crucial aspect of radiomics reproducibility, aiming to provide a comprehensive exploration of the intricacies surrounding data harmonization. We bring together a diverse group of experts from the fields of radiology, engineering, data science, and oncology, who collectively share their invaluable insights and experiences.
Our journey begins with a foundational understanding of radiomics and its transformative potential in the realm of precision medicine. We explore the principles of image formation in different modalities, followed by the different methodologies employed in radiomic feature extraction and the significant strides that have been made in this field. The book then navigates through the factors influencing reproducibility in radiomic studies, ending with the fundamental principles of data harmonization in both the image and feature domains. We also recognize the limitations and potential biases inherent in the described methodologies and emphasize the need for a balanced and nuanced approach to data harmonization that depends on the specific application, available data, and resources. Furthermore, we address the critical role of standardization and the initiatives that have been undertaken to establish guidelines and best practices in radiomics research. We acknowledge that collaboration and open data-sharing strategies are vital components to foster reproducibility and accelerate progress in this field.
We express our sincere gratitude to all the authors who have
generously shared their expertise, experiences, and passion in the
creation of this book. Additionally, we extend our appreciation to
the readers whose curiosity and interest will drive progress in
this field. Together, let us embark on a journey into the
captivating world of data harmonization, where we strive to
enhance radiomics reproducibility and make a meaningful
clinical impact on diagnosis, prognosis, and treatment planning.

Valencia, Spain Ángel Alberich-Bayarri


Valencia, Spain  Fuensanta Bellvís-Bataller
Contents

1 Era of AI Quantitative Imaging  1
L. Marti-Bonmati and L. Cerdá-Alberich

2 Principles of Image Formation in the Different Modalities  27
P. A. García-Higueras and D. Jimena-Hermosilla

3 How to Extract Radiomic Features from Imaging  61
A. Jimenez-Pastor and G. Urbanos-García

4 Facts and Needs to Improve Radiomics Reproducibility  79
P. M. A. van Ooijen, R. Cuocolo, and N. M. Sijtsema

5 Data Harmonization to Address the Non-biological Variances in Radiomic Studies  95
Y. Nan, X. Xing, and G. Yang

6 Harmonization in the Image Domain  117
F. Garcia-Castro and E. Ibor-Crespo

7 Harmonization in the Features Domain  145
J. Lozano-Montoya and A. Jimenez-Pastor
1  Era of AI Quantitative Imaging

L. Marti-Bonmati and L. Cerdá-Alberich

1.1 Precision Medicine Needs Precision Imaging

In the quest for personalized healthcare, precision medicine has emerged as a transformative approach, tailoring treatments to the unique genetic expression and characteristics of each patient. By considering subject variability in genes, environmental exposure, and lifestyle, healthcare professionals aim to enhance the efficacy of therapies and minimize potential side effects. For years, diseases were considered to affect patients in a similar way, and medical treatments were mainly designed for the “average patient.” As a result of this one-size-fits-all approach, treatments are very successful for some patients but not for others, as patient and disease heterogeneity are always present. While the advancements in genomics have been at the forefront of this revolution, another crucial aspect that stands to revolutionize the landscape of precision medicine is precision imaging. Genetic information alone

L. Marti-Bonmati (*) · L. Cerdá-Alberich
Biomedical Imaging Research Group (GIBI230), La Fe Health Research Institute, Valencia, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_1
may not provide a comprehensive understanding of the patient’s condition and disease’s intricacies. Precision imaging may serve as the gateway to unveiling these hidden facets.
The drivers of precision medicine innovation are health data accessibility and biological treatment developments. First, more data with biopathological meaning is becoming available, the use of real-world (RW) imaging data is increasing, and data extraction and analytical tools are improving, giving access to data that would be very difficult or infeasible to obtain under the basic design of traditional randomized controlled trials (RCTs). Secondly, more therapy choices exist, such as novel classes of targeted therapies and pharmacokinetic tailoring, like blood transfusion type matching or targeted cancer therapies guided by molecular phenotyping.
Conventional medical imaging has long been a cornerstone of clinical practice, offering valuable glimpses into the inner workings of the human body. X-rays, computed tomography (CT) scans, and magnetic resonance imaging (MRI) have proven invaluable in diagnosing and monitoring diseases. However, precision medicine demands a higher level of detail, one that transcends what traditional imaging modalities can provide.
In recent years, advancements in imaging technology have enabled us to visualize molecular and cellular processes in unprecedented ways. Positron emission tomography (PET) scans combined with radiotracers can detect metabolic changes at the cellular level. Functional MRI (fMRI) can map neural activity, allowing us to understand brain function better. Techniques like diffusion tensor imaging (DTI) offer insights into the microstructure of tissues and nerve fiber tracts. These cutting-edge imaging tools offer the potential to identify subtle changes that are indicative of early-stage diseases and predict a patient’s response to specific treatments.
This increase in knowledge will allow decisions to be made at the very first steps of disease, or even before the disease is present, as in the recognition of pre-metastatic niches, shifting treatment from surgery to neoadjuvant therapy in some tumors. Early detection of microscopic lesions, before they reach macroscopic size, may also allow better treatment allocation, as in early microscopic regional lymph node invasion in breast cancer [1]. In macroscopic tumors, precise lesion characterization will avoid biopsy sampling inaccuracies, and the identification of heterogeneous habitats with different behaviors in different locations will help target biopsies, as in neuroblastoma cancer patients [2]. Patients also need an accurate disease staging map to identify the full anatomical tumor extension even before the invasion is evident, such as the evaluation of microvascular extension in hepatocellular carcinoma tumors [3]. Knowing a patient’s expected overall prognosis will allow adequate stratification before treatment allocation, avoiding sampling biases.
Optimization of treatment options is quite relevant in those cases where either pathology or liquid biopsy might not be sufficient, such as immunotherapy in solid tumors. Imaging can also guide targeted focal therapies, such as radiofrequency ablation of osteoid osteomas or dose-painting radiotherapy. Images are also used to evaluate and grade the response to treatment, defining the next steps in the management of the disease, using scores such as RECIST 1.1 or RANO. Radiologists are proud of the essential role of medical imaging in daily clinical practice. Radiologist involvement is further fostered by the development of new technical devices with improved sensitivity to small changes and abnormalities, such as spectral CT or contrast-enhanced tomosynthesis. By combining new devices and acquisition protocols with advanced computational and artificial intelligence algorithms to extract quantitative insights from medical images, radiologists are improving both the understanding of the expression that new pathobiological pathways have on images (Fig. 1.1), such as the regional neoangiogenesis profiles in glioblastoma [4], and the response to new clinical needs, such as the inflammatory and fibrotic progression in patients with metabolic liver disease.
Fig. 1.1 The art of science and life representation based on phenotype imaging for personalized classification and prediction of clinical outcomes to achieve a diagnostic gain with respect to standard-of-care clinical practice

1.2 Transforming Clinical Care from Qualitative to Quantitative

Imaging analysis is crucial for tumor detection, staging, and follow-up. Cancer patients are stratified based on tumor properties, such as size and shape, invasion into lymph nodes, and extension toward distant organs. All this information is summarized in the TNM international staging system, and radiologists include this classification within the radiology report. With the inputs of pathology and of molecular and genetic information, new grades and subtypes were defined and most treatments were reallocated. However, today we recognize that further stratification is still needed to properly allocate treatment. The need to recognize tumor phenotypes and heterogeneity organization as a driving tool toward treatment allocation strategies is based on several facts. The same genetic and pathologic expression can have different phenotypic and aggressiveness behaviors; tumors vary locally as different habitats develop due to differences in cellular clones, microenvironment, and stroma evolution; and distant sites may display different genetic and biological expressions than the primary tumor. Different sites in different organs have a distinct and specific extracellular matrix and cellular composition compared with that of the originating site. Metastases are usually biologically different from the primary tumor. Therefore, imaging has the potential to help target treatments if cancer hallmarks and biological behavior can be estimated from images.
Radiomics features and imaging biomarkers are surrogate features and parameters extracted from medical images, providing quantitative information on the regional distribution and magnitude of the evaluated property. They can also be clustered into nosological signatures by combining relevant features into a single value. This information is resolved in space, as parametric maps, and in time, through delta analysis of longitudinal changes.
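As a toy illustration of such quantitative features (a sketch, not code from this book), a few first-order radiomic features and their longitudinal "delta" can be computed with NumPy; the two time points below are simulated, hypothetical data:

```python
import numpy as np

def first_order_features(roi):
    """A few first-order radiomic features of a region of interest."""
    hist, _ = np.histogram(roi, bins=32, density=True)
    p = hist[hist > 0] / hist[hist > 0].sum()   # normalized histogram probabilities
    return {
        "mean": float(roi.mean()),
        "variance": float(roi.var()),
        "entropy": float(-(p * np.log2(p)).sum()),
    }

rng = np.random.default_rng(0)
baseline = rng.normal(100, 10, size=(20, 20))   # simulated ROI at baseline
followup = rng.normal(90, 18, size=(20, 20))    # simulated ROI at follow-up

f0, f1 = first_order_features(baseline), first_order_features(followup)
delta = {k: f1[k] - f0[k] for k in f0}          # "delta radiomics": change over time
print(sorted(delta))  # ['entropy', 'mean', 'variance']
```

In this simulated case the delta values capture the longitudinal change (a lower mean, a higher variance), which is the kind of temporal information the text refers to.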
Artificial intelligence (AI) offers a paradigm shift toward data-driven decision-making tools that is revolutionizing medicine. AI can be used to improve the process of data acquisition (such as faster and higher-quality MRI), extract new information from existing data (such as data labeling, lesion detection and segmentation, and deep radiomics extraction for patient stratification), and generate predictions of future disease-related events (such as predictive models of therapy response and time-to-event models of patient outcomes). Nowadays, AI-powered imaging is widely used in cancer care, providing more reliable diagnosis and early detection, improving screening results, adjusting follow-up schemes, aiding in the discovery of new drugs, grading aggressiveness, defining the best treatments, and improving final prognostic outcomes (Fig. 1.2).

Fig. 1.2 Diagram of the AI and medical imaging innovation research pathway, containing aspects related to the clinical question to answer, the data to be employed, the model to be developed to predict a particular clinical outcome, and the proposed improvements for sustainability and reproducibility of research
1.2.1 Automated Methods Capable of Quantifying Imaging Features Related to Clinical Endpoints

Handcrafted computational methods to extract radiomics features and imaging biomarkers suffer from variability and low reproducibility. This limitation is due to the inherent differences between medical images obtained from one machine and another, arising from the large variability of acquisition and reconstruction protocols. Vendors and technicians modify the way images are obtained to provide radiologists with images of the highest possible quality for their subjective clinical evaluation. Unfortunately, this pathway introduces a huge spectrum of differences due to geometrical and contrast dispersion. To partially avoid this issue, clinical trials force centers to use similar imaging protocols. When dealing with real-world evidence, standardization of image acquisition protocols will surely never happen (new machines, new releases, different approaches).
As a further challenge, similar image acquisition protocols can yield different biomarker results, and repeatability studies (test-retest reliability: variation in measurements taken by a single person or instrument on the same item under the same conditions) and reproducibility studies (replicating the study in different locations and on different equipment by different people) usually show discrepancies. To minimize this reproducibility crisis, calibration methods (comparison against a known magnitude for correctness) can be applied, introducing corrections such as the intraclass correlation coefficient or linear regression. Also, different image preparation steps are applied before biomarker extraction to transform source images into a common framework via resizing, intensity normalization, and noise reduction. Unfortunately, all these measures have not been sufficient for most applications, and only a few have succeeded outside the oncology field, such as the proton density fat fraction and R2* measures for non-invasive liver fat and iron calculation [5, 6].
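The preparation steps mentioned above (physical-size resampling, intensity normalization) can be sketched in a few lines of NumPy; the scanner intensity values below are hypothetical, chosen only to illustrate how two differently scaled acquisitions end up in a common framework:

```python
import numpy as np

def zscore_normalize(img, mask=None):
    """Map voxel intensities to zero mean, unit variance.

    A common preparation step before biomarker extraction; the optional
    mask restricts the statistics to a region of interest.
    """
    vals = img[mask] if mask is not None else img
    mu, sigma = vals.mean(), vals.std()
    return (img - mu) / (sigma + 1e-8)

def resample_by_block_mean(img, factor):
    """Naive physical-size resampling by block averaging (downsampling only)."""
    h, w = img.shape
    h2, w2 = h // factor * factor, w // factor * factor
    img = img[:h2, :w2]
    return img.reshape(h2 // factor, factor, w2 // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
scan_a = rng.normal(loc=300.0, scale=40.0, size=(64, 64))  # scanner A intensity scale
scan_b = rng.normal(loc=120.0, scale=15.0, size=(64, 64))  # same tissue, scanner B scale

norm_a, norm_b = zscore_normalize(scan_a), zscore_normalize(scan_b)
# After normalization both acquisitions share a comparable intensity scale.
print(round(norm_a.mean(), 6), round(norm_a.std(), 3))
```

As the text notes, such handcrafted corrections help but are rarely sufficient on their own, since they address intensity scale without modeling the underlying contrast differences.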
There is a lack of centralized and distributed repositories with large, high-quality, labeled, and highly complex imaging data [7]. The lack of standardization of cancer-related health data is hampering the use of AI in cancer care, mainly due to difficulties in accessing and sharing patient data and in testing, validating, certifying, and auditing AI algorithms. Data standardization, interoperability, biases, completeness, safety, privacy, and ethical and regulatory sharing aspects are crucial for the secondary use of data in predictive analytics. Massive data extraction, multicenter observational studies, and federated machine learning (ML) approaches are fostering the impact of imaging in medicine [8].
Real-world data (RWD) heterogeneity, due to routine clinical practice variability between and within sites, together with dataset size and accessibility limitations that prevent capturing the full complexity of biological diversity, are the main forces responsible for the reproducibility crisis. The scientific method tries to avoid biases when proving facts and causal relations, but heterogeneity and diversity biases can only be minimized. As standardization in data acquisition will never be achieved (there are too many different vendors and changing platforms, releases, and protocols), data harmonization is the only feasible solution. AI will have a role in enabling data harmonization and capturing diverse patterns.

1.2.2 AI-Based Methods as the Gold Standard for Imaging Biomarkers

AI-based methods are increasingly used in medical imaging for the extraction of radiomics features and imaging biomarkers (Fig. 1.3), as large, standardized imaging repositories are currently being built through different regional, national, and European initiatives [9, 10]. Given the ability of AI tools to analyze thousands of images and develop their own expertise, the global AI in medical imaging market is projected to grow significantly in the upcoming years. Some of the most relevant AI-based methods for the extraction of imaging biomarkers are found in the following medical imaging areas:
Fig. 1.3 Schema of the AI-based workflow in medical imaging and oncology, including tumor detection and segmentation, obtaining hallmarks in terms of parametric maps, and extraction of diagnosis models and tools for the prediction of aggressiveness, overall survival, angiogenesis, cellularity, and relationships between phenotyping and genotyping, for the development of a Clinical Decision Support System (CDSS) that may impact treatment decisions [based on the probability of treatment response, confidence level, impact on radiotherapy (RT), etc.]. *DW diffusion-weighted, DCE dynamic contrast-enhanced, MR magnetic resonance

Image Acquisition and Reconstruction


AI can help automate image acquisition and workflows, streamline processes, and improve patient care. While AI-based methods are still being developed and tested, they are showing promise in creating faster and more reliable US/CT/MR/PET scans, improving the efficiency of the image acquisition process and the quality of the reconstructed images [11]. AI can help with tasks such as planning, physiological tracking, parameter optimization, noise and artifact reduction, and quality assessment. Deep learning (DL) algorithms may aid in the transformation of raw k-space data to image data, specifically for accelerated imaging, noise reduction, and artifact suppression. Recent efforts in these areas show that deep learning-based algorithms can eclipse conventional reconstruction methods in terms of lower acquisition times, improved image quality, and higher computational efficiency across all clinical applications, such as brain, musculoskeletal, cardiac, and abdominal imaging [12]. AI-based DL reconstruction and post-processing techniques can consistently improve diagnostic image quality at the lowest attainable source signal across all patients and procedures, far beyond what is possible with current reconstruction techniques. This represents a huge step for image optimization programs [13].

Image Harmonization
One of the main challenges when developing AI models with RWD is the large image heterogeneity caused by the many different vendors, scanners, protocols, acquisition parameters, and clinical practices. One of the most promising areas of research in image harmonization is the use of generative adversarial networks (GANs) [14] to generate synthetic images that belong to a new common framework space of standardized imaging data. GANs use a generator and a discriminator to improve their ability at (1) creating new (fake) images as similar as possible to the reference (real) images used as ground truth and (2) distinguishing the real images from the fake ones. This method allows an effective and efficient learning procedure whose only requirement is a well-defined ground truth; for instance, we may target images in a particular imaging domain (e.g., images belonging to a specific manufacturer, scanner, magnetic field strength, or type of weighting in MR images). Additionally, if paired images are not available, the CycleGAN-based architecture is a great alternative: a popular DL model for image-to-image translation tasks without paired examples. The models are trained in an unsupervised manner using collections of images from the source and target domains that do not need to be related in any way.
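The generator–discriminator game can be made concrete with a deliberately tiny sketch: both "networks" are single affine maps on one-dimensional samples, with the gradients of the standard GAN objective written out by hand. This only illustrates the adversarial objective, not a harmonization-grade model; all values and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

real_mean = 3.0            # "real" images are 1-D samples from N(3, 1)
w, b = 1.0, 0.0            # generator      G(z) = w*z + b
a, c = 0.0, 0.0            # discriminator  D(x) = sigmoid(a*x + c)
lr, batch = 0.05, 64

for _ in range(2000):
    z = rng.normal(size=batch)
    x_real = rng.normal(loc=real_mean, size=batch)
    x_fake = w * z + b

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake))
    d_real = sigmoid(a * x_real + c)
    d_fake = sigmoid(a * x_fake + c)
    a += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on log D(fake) (non-saturating loss)
    d_fake = sigmoid(a * x_fake + c)
    dg = (1 - d_fake) * a              # derivative of log D(g) w.r.t. g
    w += lr * np.mean(dg * z)
    b += lr * np.mean(dg)

# Since z is zero-mean, the generator's output mean equals b; it should
# have drifted toward the real mean of 3 as D and G compete.
print(round(b, 1))
```

Real image-domain GANs replace the two affine maps with deep convolutional networks, but the alternating two-player update shown here is the same mechanism.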
However, when aiming to generate images in a new common standardized imaging data space, we may not be able to define the specific characteristics these images need to achieve better resolution and lower noise. Potential solutions include the use of the frequency space, which allows isolating specific components of the image, keeping its main information and excluding the part related to its contrast, which is at the core of image acquisition heterogeneity. This strategy can be used in combination with autoencoders, which have demonstrated good performance in image reconstruction even when large parts of the images are removed.
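The frequency-space idea can be sketched with a 2-D FFT, which splits an image into a low-frequency component (carrying most of the global contrast and brightness) and a high-frequency component (carrying edges and texture). This is an illustrative toy, assuming nothing about any particular harmonization pipeline:

```python
import numpy as np

def split_spectrum(img, radius_frac=0.05):
    """Split an image into a low-frequency part (global contrast and
    brightness) and a high-frequency part (edges and texture) using a
    circular mask in the 2-D Fourier domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h // 2, w // 2
    radius = radius_frac * min(h, w)
    low_mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    return low, img - low

rng = np.random.default_rng(0)
scan = rng.random((64, 64)) + 5.0          # hypothetical slice with an intensity offset
low, high = split_spectrum(scan)

# A crude harmonization idea: keep this scan's structure (high) but
# borrow the smooth contrast profile (low) of a reference scan.
reference_low, _ = split_spectrum(rng.random((64, 64)))
harmonized = reference_low + high
print(np.allclose(low + high, scan))       # True: the split is exact
```

Swapping low-frequency content between scans is only a caricature of the approach described in the text, but it shows why the frequency domain is a natural place to separate contrast from structure.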
Other relevant harmonization techniques include distribution-based methods, such as location-scale strategies (e.g., ComBat to address the heterogeneity of cortical thickness, surface area, and subcortical volumes caused by various scanners and sequences [15], or to harmonize the radiomic features extracted across multicenter MR datasets [16]), and image processing techniques, such as image filtering, physical-size resampling, standardization, and normalization [17]. These harmonization techniques aim at reducing batch effects in quantitative imaging feature extraction and, therefore, at decreasing the variability observed across different manufacturers and acquisition protocols.
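A minimal location-scale alignment, in the spirit of ComBat but without its empirical-Bayes shrinkage and without protecting biological covariates, can be sketched as follows (all names and numbers are illustrative):

```python
import numpy as np

def locscale_harmonize(features, batch_ids):
    """Align each scanner batch's per-feature mean and variance to the
    pooled reference distribution (a stripped-down location-scale step;
    full ComBat adds empirical-Bayes shrinkage and covariate protection)."""
    features = np.asarray(features, dtype=float)
    batch_ids = np.asarray(batch_ids)
    out = np.empty_like(features)
    grand_mu = features.mean(axis=0)
    grand_sd = features.std(axis=0)
    for batch in np.unique(batch_ids):
        rows = batch_ids == batch
        mu = features[rows].mean(axis=0)
        sd = features[rows].std(axis=0) + 1e-8
        out[rows] = (features[rows] - mu) / sd * grand_sd + grand_mu
    return out

rng = np.random.default_rng(0)
# Two hypothetical scanners measuring the same 3 radiomic features,
# with scanner B shifted and rescaled (a batch effect).
site_a = rng.normal(0.0, 1.0, size=(50, 3))
site_b = rng.normal(2.0, 3.0, size=(50, 3))
feats = np.vstack([site_a, site_b])
batch = np.array([0] * 50 + [1] * 50)

harm = locscale_harmonize(feats, batch)
# After harmonization both batches share the pooled per-feature mean.
print(np.allclose(harm[:50].mean(axis=0), harm[50:].mean(axis=0)))  # True
```

Note the caveat this simplification hides: applied blindly, such a step also removes real biological differences between sites, which is exactly why full ComBat models covariates of interest separately.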

Image Synthesis for Data Augmentation


Image synthesis for data augmentation is a technique widely used in computer vision and ML tasks to improve the performance and robustness of models. Data augmentation involves generating additional training data by applying various transformations to existing images, such as rotations, translations, scaling, and flipping. This augmentation process helps the model generalize better by exposing it to a wider range of variations while reducing overfitting.
Image synthesis techniques play a crucial role in data augmentation by creating new images that closely resemble the original dataset while introducing controlled variations. These synthesized images can include realistic deformations, different lighting conditions, and other visual changes. By integrating these synthetic images with the original dataset, the model becomes more resilient to variations in real-world scenarios.
There are several approaches to image synthesis for data augmentation. One common technique is geometric transformation, where images are modified by applying operations like rotation, translation, scaling, and shearing. These transformations can simulate changes in perspective or object position, enhancing the model’s ability to recognize objects from different viewpoints.
Another method is to alter the color and texture properties of images. This can involve changing the brightness, contrast, saturation, or hue of the original images. Adding noise, blurring, or sharpening effects can also simulate variations in image quality or focus. These modifications allow the model to adapt to different lighting conditions and improve its robustness against image distortions.
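The geometric and intensity perturbations described above can be composed in a few lines; the parameter ranges below are arbitrary illustrations, not recommended settings:

```python
import numpy as np

def augment(img, rng):
    """Randomly compose simple augmentations: flips and rotations
    (geometry) plus brightness, contrast, and noise jitter (intensity)."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                       # random horizontal flip
    img = np.rot90(img, k=rng.integers(0, 4))      # random 90-degree rotation
    gain = rng.uniform(0.9, 1.1)                   # contrast jitter
    bias = rng.uniform(-0.1, 0.1)                  # brightness jitter
    img = img * gain + bias
    img = img + rng.normal(scale=0.02, size=img.shape)  # sensor-like noise
    return img

rng = np.random.default_rng(1)
base = rng.random((32, 32))                        # hypothetical image slice
batch = np.stack([augment(base, rng) for _ in range(8)])
print(batch.shape)  # (8, 32, 32)
```

Each call draws fresh random parameters, so one source image yields many distinct training samples, which is the mechanism behind the regularizing effect described above.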
AI algorithms can also be employed for medical imaging data
augmentation. Different types of deep generative models, such as
variational autoencoders (VAEs) [18] and GANs, can learn the
underlying distribution of the training dataset and generate new
images that resemble the original data.
Regarding clinical applications, synthetic images can improve the accuracy and robustness of medical imaging tasks (classification, regression, segmentation) in several ways. For instance, they can be used to generate additional data to improve model performance and avoid overfitting. Additionally, synthetic images can be used to address limited-data and privacy issues, as they can be generated to provide additional data for training models and to anonymize patient data. Synthetic images can also be generated with their corresponding segmentation masks to aid segmentation network generalization and adaptation, which can improve the robustness of segmentation models.
By leveraging image synthesis techniques, researchers and practitioners can create augmented datasets with a larger and more diverse range of samples. With a more robust and diverse training set, models are better equipped to handle real-world scenarios and exhibit improved performance, accuracy, and reliability.
However, the use of synthetic medical images raises several methodological considerations that must be carefully addressed to ensure the proper use of this technology. The first is the potential for bias in the data used to generate the synthetic images, which can lead to biased models and inaccurate diagnoses. Another concern is the privacy of patient data: synthetic images can be used to anonymize patient data, but there may be a risk of re-identification if the synthetic images are not properly de-identified.
Image Segmentation
Volume-of-interest segmentation plays a crucial role in various medical applications, such as quantifying the size and shape of organs in population studies, detecting and extracting lesions in disease analysis, defining computer-aided treatment volumes, and surgical planning, among others. While manual segmentation by medical experts was considered the ground truth, it is expensive, time-consuming, and prone to disagreement among readers. On the other hand, automatic segmentation methods offer faster, more cost-effective, and more reproducible results after manual checking and editing [19].
Traditionally, segmentation relied on classical techniques like
region growing [20], deformable models [21], graph cuts [22],
clustering methods [23], and Bayesian approaches [24]. However,
in recent years, DL methods have surpassed these classical hand-
crafted techniques, achieving unprecedented performance in vari-
ous medical image segmentation tasks [25, 26]. Recent reviews
and advancements in DL for medical image segmentation are
available, focusing on improving network architecture, loss func-
tions, and training procedures [27]. Remarkably, it has been dem-
onstrated that standard DL models can be trained effectively using
limited labeled training images by making use of several transfer
learning techniques [28].
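Classical region growing [20], listed above, can be sketched in a few lines. A minimal 2D, 4-connected version with a fixed intensity tolerance (real implementations use adaptive homogeneity criteria and 3D connectivity):

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10.0):
    """Grow a region from `seed`, repeatedly adding 4-connected pixels
    whose intensity lies within `tol` of the seed intensity."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(float(image[ny, nx]) - seed_val) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

img = np.zeros((10, 10))
img[3:7, 3:7] = 100.0                         # a bright square "lesion"
lesion = region_grow(img, seed=(5, 5), tol=10.0)
```

The breadth-first expansion stops at the lesion boundary because pixels outside it fail the intensity criterion, which is exactly the behavior that makes such methods fast but sensitive to noise and weak edges.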
Although there is considerable variation in proposed network
architectures, they all share a common foundation: the use of con-
volution as the primary building block. Some alternative network
architectures have explored recurrent neural networks [29] and
attention mechanisms [30] but still rely on convolutional opera-
tions. However, recent studies suggest that a basic fully convolu-
tional network (FCN) with an encoder–decoder structure can
handle diverse segmentation tasks with comparable accuracy to
more complex architectures [31].
Convolutional neural networks (CNNs), including FCNs, owe
their effectiveness in modeling and analyzing images to key prop-
erties such as local connections, parameter sharing, and transla-
tion equivariance [32]. These properties provide CNNs with a
strong and valuable inductive bias, enabling them to excel in
1 Era of AI Quantitative Imaging 13

various vision tasks. However, CNNs also have limitations, as the
fixed weights determined during training treat different images
and parts of an image equally, lacking the ability to adapt based
on image content. Additionally, the local nature of convolution
operations limits the learning of long-range interactions between
distant parts of an image.
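The translation-equivariance property mentioned above can be checked directly: convolving a shifted image equals shifting the convolved image, away from the borders. A small numpy sketch, writing out the valid-mode cross-correlation that CNN layers compute:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers
    actually compute), written explicitly for clarity."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
img = rng.random((8, 8))
k = rng.random((3, 3))
a = conv2d(np.roll(img, 1, axis=0), k)       # shift the input down one row
b = np.roll(conv2d(img, k), 1, axis=0)       # shift the output instead
# Away from the wrap-around border row, the two agree:
# convolution commutes with translation.
assert np.allclose(a[1:], b[1:])
```

Note also the locality: each output value depends only on a 3 × 3 neighborhood, which is precisely the limitation on long-range interactions discussed above.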
Attention-based neural network models offer a potential solu-
tion to address the limitations of convolution-based models. These
models focus on learning relationships between different parts of
a sequence, deviating from the fixed weights approach of CNNs.
Attention-based networks, widely adopted in natural language
processing (NLP) applications, have transformers as the dominant
attention-based models [33]. Transformers outperform recurrent
neural networks in capturing complex and long-range interactions
and overcome limitations like vanishing gradients. They also
enable parallel processing, resulting in shorter training times on
modern hardware.
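The scaled dot-product self-attention at the heart of transformers [33] is compact enough to write out. A minimal numpy sketch (single head, with identity projections instead of learned Q/K/V matrices, both simplifications for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention: every token attends to every
    other token, so long-range interactions are captured in one layer."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # pairwise similarity of tokens
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ x                   # weighted mixture of all tokens

tokens = np.random.default_rng(2).random((5, 4))   # 5 tokens, dimension 4
out = self_attention(tokens)
```

Because each output token is a convex combination of all input tokens, the interaction range is global by construction, in contrast to the local receptive field of a convolution.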
Despite the advantages of transformer networks, their adoption
in computer vision applications and medical image segmentation
is limited. Challenges arise from the significantly larger number
of pixels in images compared to the length of signal sequences in
typical NLP applications, limiting the direct application of stan-
dard attention models to images. Furthermore, training trans-
former networks is more challenging due to their minimal
inductive bias, requiring larger amounts of training data. Recent
studies propose practical solutions to these challenges. Vision
transformers (ViTs) consider image patches as the units of infor-
mation [34], embedding them into a shared space and learning
their relationships through self-attention modules. ViTs have
shown superior image classification accuracy compared to CNNs
when massive, labeled datasets and computational resources are
available. Knowledge distillation from a CNN teacher has been
proposed as a potential solution to training transformer networks,
enabling them to achieve image classification accuracy compara-
ble to CNNs with the same amount of labeled training data [35].
Additionally, self-attention-based deep neural networks, relying
on self-attention between linear embeddings of 3D image patches
without convolution operations, have been proposed. These
models typically require large, labeled training datasets, and are
often combined with unsupervised pre-training methods that
leverage large unlabeled medical image datasets [36]. While
U-Net [37], a U-shaped CNN architecture, has achieved tremen-
dous success on most medical image segmentation tasks, trans-
former-based models are challenging the well-configured U-Net
architectures with promising results [34].
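The ViT idea above of treating image patches as the units of information [34] amounts, at its core, to a reshape plus a linear projection. A hedged numpy sketch (the patch size of 8 and embedding dimension of 48 are arbitrary choices for illustration; a real ViT learns the projection and adds position embeddings):

```python
import numpy as np

def patchify(image, p):
    """Split an (H, W) image into non-overlapping p x p patches and
    flatten each into a vector, giving a (num_patches, p*p) sequence."""
    h, w = image.shape
    assert h % p == 0 and w % p == 0
    patches = image.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)

rng = np.random.default_rng(3)
img = rng.random((32, 32))
tokens = patchify(img, 8)          # 16 patch tokens of 64 pixels each
W = rng.random((64, 48))           # stand-in for the learned embedding matrix
embedded = tokens @ W              # (16, 48) token embeddings
```

The resulting token sequence can then be fed to self-attention layers exactly as word embeddings are in NLP, which is how ViTs sidestep the pixel-count problem described above.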

Extraction of Deep Features
DL is an emerging approach primarily utilized in tasks related to
recognition, prediction, and classification. By propagating data
through multiple hidden layers, a neural network can learn and
construct a representation of the data, which can be further used
for prediction or classification purposes. In the case of image data,
a CNN typically employs multiple convolutional kernels to extract
various textures and edges before passing the extracted informa-
tion through multiple hidden layers.
After learning, the convolutional layers of CNNs contain rep-
resentations of edge gradients and textures. When these represen-
tations are propagated through fully connected layers, the network
is believed to have learned diverse high-level features. From these
fully connected layers, deep features, which refer to the outputs of
the individual units, are extracted and arranged as a row vector
indexed by the position of each neuron in the hidden layer.
Due to the still limited data availability in the medical imaging
field, training a CNN from scratch is often unfeasible. In these
cases, a pretrained CNN can be employed, such as the VGG16
network [38], which has been trained on the ImageNet dataset.
Additionally, transfer learning methods [39], which refer to the
application of previously acquired knowledge from one domain to
a new task domain, are considered as an alternative option.
To extract deep features from medical images within a given
exam, a commonly used approach relies on selecting the
2-dimensional slice containing the largest lesion area. In this case,
only features from the lesion region are extracted by incorporat-
ing the largest rectangular box around the tumor. The resulting
images can then be resized to an isometric voxel size by employing
a bicubic interpolation in order to match the required input size of
the neural network.
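The slice cropping and resizing steps just described can be sketched as follows. This is a hedged illustration: the 224 × 224 input size reflects common VGG16 usage, nearest-neighbour indexing stands in for the bicubic interpolation a real pipeline would use, and all function names are ours:

```python
import numpy as np

def crop_to_lesion(image, mask):
    """Crop the image to the smallest rectangular box around the lesion."""
    ys, xs = np.nonzero(mask)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size); a real pipeline would
    use bicubic interpolation instead."""
    h, w = image.shape
    yi = np.arange(size) * h // size
    xi = np.arange(size) * w // size
    return image[np.ix_(yi, xi)]

def to_network_input(image, mask, size=224):
    crop = resize_nearest(crop_to_lesion(image, mask), size)
    lo, hi = crop.min(), crop.max()
    crop = (crop - lo) / (hi - lo + 1e-8) * 255.0   # rescale to 0-255
    return np.stack([crop, crop, crop], axis=-1)    # mimic 3 colour channels

img = np.random.default_rng(4).random((64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[10:30, 15:40] = True                           # hypothetical lesion mask
x = to_network_input(img, mask)
```

The grayscale-to-three-channel replication at the end mirrors the channel handling discussed in the text for networks pretrained on RGB camera images.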
In addition, these pretrained networks were originally trained on
natural camera images with three color channels (R, G, B),
whereas medical images are usually grayscale, lacking color
components. Voxel intensities of the medical images are
usually converted to a 0–255 range. Thus, the same grayscale
image can be replicated three times to mimic an image with three
color channels, and normalization should be carried out
using the appropriate color-channel statistics. The deep features can
then be generated from the last fully connected layer, followed by
the application of a ReLU activation function. The resulting fea-
ture vector, in the case of the VGG16 network, has a size of 4096.
Consequently, further feature engineering and dimensionality
reduction techniques may be necessary to maximize their effec-
tiveness and address the following specific challenges:

1. The presence of redundant or irrelevant information within the
deep features. High-dimensional feature vectors can be com-
putationally expensive to process and may lead to overfitting
or increased model complexity. Dimensionality reduction
techniques, such as principal component analysis (PCA) or
t-distributed stochastic neighbor embedding (t-SNE), can be
applied to reduce the dimensionality of deep features while
preserving the most informative characteristics. By reducing
the number of dimensions, these techniques can enhance
model efficiency, improve generalization, and facilitate visual-
ization of the data.
2. The need for more interpretable or domain-specific features.
Deep features are often abstract and lack direct human inter-
pretability. In certain applications, it may be beneficial to
incorporate domain knowledge or expert insights into the fea-
ture engineering process. This can involve designing hand-
crafted features based on prior knowledge or using rule-based
algorithms to extract specific patterns or characteristics. By
incorporating domain expertise, the deep features can be trans-
formed into more meaningful representations that align with
the specific problem at hand.
3. Deep features may require further processing to address specific
challenges related to data variability or noise. For instance,
in tasks involving medical imaging, there can be variations in
image acquisition protocols, artifacts, or noise levels.
Preprocessing techniques, such as image normalization,
denoising, or data augmentation, can be applied to enhance the
quality and robustness of deep features. These techniques help
mitigate the impact of data variability and improve the model’s
ability to generalize across different conditions or sources
[40].
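A PCA reduction of a deep feature matrix, as suggested in point 1, can be sketched with a plain SVD (the dimensions are illustrative; in practice a library implementation such as scikit-learn's PCA would be used):

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components.
    features: (n_samples, n_features) matrix of deep features."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(5)
deep_features = rng.random((30, 4096))    # e.g. 30 cases x 4096 VGG16 features
reduced = pca_reduce(deep_features, 10)   # keep the 10 strongest components
```

Going from 4096 to a handful of components makes the downstream model far less prone to overfitting when, as is typical in medical imaging, only a few dozen labeled cases are available.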

It is important to note that the need for further feature engi-
neering and dimensionality reduction depends on the specific task
and the dataset used. In some cases, the raw deep features obtained
from DL models may be sufficiently informative and effective
without additional processing. However, in other situations, incor-
porating additional techniques can enhance the performance,
interpretability, and efficiency of the models. In other words,
while deep features extracted from DL models offer powerful rep-
resentations of data, the choice and extent of these techniques
depend on the specific requirements and characteristics of the task
at hand.

AI Models for Prediction of Clinical Endpoints
AI models have emerged as powerful tools in the prediction of
clinical outcomes, offering valuable insights and assisting health-
care professionals in making informed decisions [41]. In this con-
text, two prominent types of AI models are radiomics models and
end-to-end deep learning models.
Radiomics models leverage the vast amount of quantitative
imaging data extracted from medical images to predict clinical
outcomes. Radiomics involves extracting a large number of quan-
titative features from medical images, including texture, shape,
and intensity, and combining them with clinical and demographic
data. These features are then used as inputs to ML algorithms,
such as support vector machines, random forests, or neural net-
works, to build predictive models [42].
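A minimal version of such a pipeline, hand-crafted first-order features computed over a lesion mask, can be sketched as follows (illustrative only; real radiomics studies rely on standardized, e.g. IBSI-compliant, feature definitions and proper validation):

```python
import numpy as np

def first_order_features(image, mask):
    """Extract a few first-order radiomic features from a lesion:
    intensity statistics plus a crude size descriptor."""
    vals = image[mask].astype(float)
    centered = vals - vals.mean()
    return {
        "mean": vals.mean(),
        "std": vals.std(),
        "skewness": (centered ** 3).mean() / (vals.std() ** 3 + 1e-8),
        "energy": float(np.sum(vals ** 2)),
        "volume": int(mask.sum()),   # voxel count as a size surrogate
    }

img = np.random.default_rng(6).random((32, 32))
mask = np.zeros((32, 32), dtype=bool)
mask[8:16, 8:20] = True              # hypothetical lesion segmentation
feats = first_order_features(img, mask)
```

Feature dictionaries like this, computed per patient and combined with clinical covariates, form the tabular input to the support vector machines, random forests, or neural networks mentioned above.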
Radiomics models offer several advantages. Firstly, they allow
for an in vivo objective assessment of disease characteristics by
analyzing imaging data. This can aid in the early detection of dis-
eases [43], prediction of treatment response [44], and prognosis
estimation [45]. Secondly, radiomics models can capture intricate
patterns and relationships within the data that may not be evident
to the human observer, enabling more accurate predictions. Lastly,
radiomics models have the potential to facilitate personalized
medicine by identifying biomarkers or image-based signatures
that correlate with specific clinical outcomes, thus guiding indi-
vidualized treatment plans [46, 47] (Fig. 1.4).
Some examples of AI-based radiomic models include the iden-
tification of biological target volume from functional images,
using AI-derived imaging biomarkers to shift radiology from a
mostly subjective analysis to a more objective one, applying
sophisticated ML and computational intelligence to revolutionize
cancer image analysis, and using radiomic-based approaches and
AI to analyze medical images and construct prediction algorithms
for precision digital oncology [48, 49].
On the other hand, end-to-end DL models represent a more
recent and rapidly advancing approach in AI [50]. These models

Fig. 1.4 Diagram of the AI image processing pipeline, including image har-
monization (spatial resolution, common framework, normalization), image
annotation and tumor extraction (3D segmentation), properties extraction
(parameters, deep features) and modeling, and personalized cancer phenotyp-
ing, prediction, and prognosis estimations
apply deep neural networks to automatically learn hierarchical
representations from the raw data, without the need for explicit
feature extraction. In the context of clinical outcome prediction,
end-to-end DL models can directly analyze medical images or
combine them with other types of data, such as electronic health
records or genetic information.
End-to-end DL models offer several advantages. Firstly, they
can learn complex patterns and relationships in a data-driven
manner, enabling them to extract features that are highly relevant
for the prediction task. Secondly, these models have the potential
to outperform traditional methods by automatically discovering
intricate image features that may be challenging to capture
through manual feature engineering. Moreover, they can handle
multi-modal data integration seamlessly, incorporating diverse
information sources to improve prediction accuracy. Lastly, their
ability to generalize across different datasets makes them highly
adaptable and transferable to various clinical settings. A recent
example is the use of these models for evaluating COVID-19
patients, which has demonstrated potential for improving clinical
decision making and assessing patient outcomes throughout
images [51].
However, both radiomics models and end-to-end DL models
face challenges and limitations. Radiomics models heavily rely
on the quality and consistency of imaging data and are susceptible
to variations in image acquisition protocols. Standardization of
image acquisition and feature extraction techniques is crucial to
ensure robust and reproducible results. On the other hand, end-to-end
DL models require large amounts of labeled data for training,
which can be a limitation in domains where annotated data is
scarce or time-consuming to obtain. Additionally, the black-box
nature of DL models can hinder their interpretability, making it
difficult to understand the reasoning behind their predictions,
which is a critical aspect in the healthcare domain.
In conclusion, both radiomics models and end-to-end deep
learning models have shown promise in predicting clinical out-
comes and assisting clinical users in detecting and quantifying a
wide array of clinical conditions, with excellent accuracy, sensi-
tivity, and specificity. Radiomics models leverage quantitative
imaging features to provide insights into disease characteristics
and treatment response. Meanwhile, end-to-end DL models offer
the advantage of learning from raw data and can handle multi-­
modal integration. While challenges exist, ongoing research and
advancements in these AI models will further enhance their pre-
dictive capabilities and contribute to improved patient care and
outcomes.

Integration of Imaging, Clinical, Biological and Pathology Data
The integration of imaging, clinical, biological, and pathology
information has revolutionized the field of healthcare by provid-
ing valuable new insights into the diagnosis, treatment, and man-
agement of various diseases. This multidisciplinary approach
brings together different types of data from various sources,
enabling a more comprehensive and holistic understanding of a
patient’s condition.
Imaging data from the different modalities, such as radio-
graphs, CT, MR, and PET, provide visual representations of the
internal structures and organs of the body. These images allow
healthcare professionals to identify abnormalities, tumors, lesions,
or other indicators of disease. By integrating imaging data with
clinical information, such as patient history, symptoms, and labo-
ratory results, a more accurate diagnosis can be made. This inte-
gration enhances the diagnostic accuracy and improves patient
outcomes [52].
Biological data, including genetic and molecular profiles, pro-
vides insights into the underlying mechanisms of diseases at a
cellular and molecular level. With the advancement of technolo-
gies like genomics and proteomics, healthcare professionals can
analyze an individual’s genetic makeup and identify specific
genetic variations that may contribute to the development or pro-
gression of a disease. By integrating this information with imag-
ing and clinical data, personalized treatment plans can be tailored
to each patient’s unique genetic profile, leading to more effective
and targeted therapies.
Pathology data, derived from the microscopic examination of
tissues and cells, plays a crucial role in diagnosing and character-
izing diseases, particularly cancer. Pathologists analyze biopsy
samples and provide information about the presence, type, and
stage of the disease. By integrating pathology data with imaging,
clinical, and biological data, healthcare professionals can gain a
comprehensive understanding of the disease, allowing for more
accurate prognostic predictions and personalized treatment strate-
gies.
The integration of these diverse datasets is made possible by
advancements in technology and the development of specialized
software platforms. These platforms allow for the aggregation,
storage, and analysis of large volumes of data from different
sources. Data integration techniques, such as data mining, ML
and AI, help identify patterns, correlations, and predictive models
that can aid in clinical decision-making.
The benefits of integrating imaging, clinical, biological, and
pathology data are manifold. Firstly, it improves diagnostic accu-
racy, enabling healthcare professionals to detect diseases at an
earlier stage when they are more treatable. Secondly, it facilitates
personalized medicine, where treatment plans can be tailored to
an individual’s unique characteristics, resulting in better outcomes
and reduced side effects. Thirdly, it enhances research and devel-
opment by providing researchers with a wealth of data to study
disease mechanisms, identify new therapeutic targets, and develop
innovative treatments.
However, there are challenges in integrating these different
types of data. One significant challenge is data interoperability, as
each type of data is often stored in different formats and systems.
Efforts are
being made to develop standards and protocols that allow seam-
less data exchange and interoperability across different platforms
and healthcare settings.
In conclusion, the integration of imaging, clinical, biological,
and pathology data is transforming healthcare and research by
providing a comprehensive and multidimensional view of
patients’ conditions. This integrated approach enhances diagnos-
tic accuracy, enables personalized treatment plans, and fosters
advancements in research and development. With continued
advancements in technology and data analysis through AI tech-
niques, the integration of these diverse datasets will continue to
play a vital role in improving patient care and advancing medical
knowledge.

References
1. Demicheli R, Fornili M, Querzoli P et al (2019) Microscopic tumor foci
in axillary lymph nodes may reveal the recurrence dynamics of breast
cancer. Cancer Commun 39:35. https://doi.org/10.1186/s40880-­019-­
0381-­9
2. Cerdá Alberich L, Sangüesa Nebot C, Alberich-Bayarri A et al (2020) A
confidence habitats methodology in MR quantitative diffusion for the
classification of neuroblastic tumors. Cancers (Basel) 12(12):3858.
https://doi.org/10.3390/cancers12123858. PMID: 33371218; PMCID:
PMC7767170
3. Ni M, Zhou X, Lv Q et al (2019) Radiomics models for diagnosing
microvascular invasion in hepatocellular carcinoma: which model is the
best model? Cancer Imaging 19:60. https://doi.org/10.1186/s40644-­019-­
0249-­x
4. Juan-Albarracín J, Fuster-Garcia E, Pérez-Girbés A et al (2018)
Glioblastoma: vascular habitats detected at preoperative dynamic
susceptibility-­weighted contrast-enhanced perfusion MR imaging predict
survival. Radiology 287(3):944–954. https://doi.org/10.1148/
radiol.2017170845. Epub 2018 Jan 19. PMID: 29357274
5. Reeder SB, Yokoo T, França M et al (2023) Quantification of liver iron
overload with MRI: review and guidelines from the ESGAR and
SAR. Radiology 307(1):e221856. https://doi.org/10.1148/radiol.221856.
Epub 2023 Feb 21. PMID: 36809220; PMCID: PMC10068892
6. Martí-Aguado D, Jiménez-Pastor A, Alberich-Bayarri Á et al (2022)
Automated whole-liver MRI segmentation to assess steatosis and iron
quantification in chronic liver disease. Radiology 302(2):345–354.
https://doi.org/10.1148/radiol.2021211027. Epub 2021 Nov 16. PMID:
34783592
7. Kondylakis H, Kalokyri V, Sfakianakis S et al (2023) Data infrastructures
for AI in medical imaging: a report on the experiences of five EU proj-
ects. Eur Radiol Exp 7(1):20. https://doi.org/10.1186/s41747-­023-­
00336-­x. PMID: 37150779; PMCID: PMC10164664
8. Marti-Bonmati L, Koh DM, Riklund K et al (2022) Considerations for
artificial intelligence clinical impact in oncologic imaging: an AI4HI
position paper. Insights Imaging 13(1):89. https://doi.org/10.1186/
s13244-­022-­01220-­9. PMID: 35536446; PMCID: PMC9091068
9. Martí-Bonmatí L, Alberich-Bayarri Á, Ladenstein R et al (2020)
PRIMAGE project: predictive in silico multiscale analytics to support
childhood cancer personalised evaluation empowered by imaging bio-
markers. Eur Radiol Exp 4(1):22. https://doi.org/10.1186/s41747-­020-­
00150-­9. PMID: 32246291; PMCID: PMC7125275
10. Martí-Bonmatí L, Miguel A, Suárez A et al (2022) CHAIMELEON proj-
ect: creation of a Pan-European repository of health imaging data for the
development of AI-powered cancer management tools. Front Oncol
12:742701. https://doi.org/10.3389/fonc.2022.742701. PMID: 35280732;
PMCID: PMC8913333
11. Reader AJ, Schramm G (2021) Artificial intelligence for PET image
reconstruction. J Nucl Med 62(10):1330–1333. https://doi.org/10.2967/
jnumed.121.262303. Epub 2021 Jul 8. PMID: 34244357
12. Lin DJ, Johnson PM, Knoll F, Lui YW (2021) Artificial intelligence for
MR image reconstruction: an overview for clinicians. J Magn Reson
Imaging. 53(4):1015–1028. https://doi.org/10.1002/jmri.27078. Epub
2020 Feb 12. PMID: 32048372; PMCID: PMC7423636
13. Shan H, Padole A, Homayounieh F et al (2019) Competitive performance
of a modularized deep neural network compared to commercial algo-
rithms for low-dose CT image reconstruction. Nat Mach Intell 1:269–
276. https://doi.org/10.1038/s42256-­019-­0057-­9
14. Goodfellow I, Pouget-Abadie J, Mirza M et al (2014) Generative adver-
sarial nets. In Advances in neural information processing systems (NIPS
2014). pp 2672–2680
15. Radua J, Vieta E, Shinohara R et al (2020) ENIGMA Consortium col-
laborators. Increased power by harmonizing structural MRI site differ-
ences with the ComBat batch adjustment method in ENIGMA. Neuroimage
218:116956. https://doi.org/10.1016/j.neuroimage.2020.116956. Epub
2020 May 26. PMID: 32470572; PMCID: PMC7524039
16. Whitney HM, Li H, Ji Y et al (2020) Harmonization of radiomic features
of breast lesions across international DCE-MRI datasets. J Med Imaging
(Bellingham) 7(1):012707. https://doi.org/10.1117/1.JMI.7.1.012707.
Epub 2020 Mar 5. PMID: 32206682; PMCID: PMC7056633
17. Nan Y, Ser JD, Walsh S et al (2022) Data harmonisation for information
fusion in digital healthcare: a state-of-the-art systematic review, meta-­
analysis and future research directions. Inf Fusion 82:99–122. https://doi.
org/10.1016/j.inffus.2022.01.001. PMID: 35664012; PMCID:
PMC8878813
18. Kingma DP, Welling M (2019) An introduction to variational autoencod-
ers. Found Trends Mach Learn 12(4):307–392. https://doi.
org/10.1561/2200000056
19. Veiga-Canuto D, Cerdà-Alberich L, Jiménez-Pastor A et al (2023)
Independent validation of a deep learning nnU-net tool for neuroblastoma
detection and segmentation in MR images. Cancers (Basel) 15(5):1622.
https://doi.org/10.3390/cancers15051622. PMID: 36900410; PMCID:
PMC10000775
20. Wan SY, Higgins WE (2003) Symmetric region growing. IEEE Trans
Image Process 12(9):1007–1015. https://doi.org/10.1109/
TIP.2003.815258. PMID: 18237973
21. Bogovic JA, Prince JL, Bazin PL (2013) A multiple object geometric
deformable model for image segmentation. Comput Vis Image Underst
117(2):145–157. https://doi.org/10.1016/j.cviu.2012.10.006. PMID:
23316110; PMCID: PMC3539759
22. Chen X, Pan L (2018) A survey of graph cuts/graph search based medical
image segmentation. IEEE Rev Biomed Eng 11:112–124. https://doi.
org/10.1109/RBME.2018.2798701. Epub 2018 Jan 26. PMID: 29994356
23. Mittal H, Pandey AC, Saraswat M et al (2022) A comprehensive survey
of image segmentation: clustering methods, performance parameters, and
benchmark datasets. Multimed Tools Appl 81(24):35001–35026. https://
doi.org/10.1007/s11042-­021-­10594-­9. Epub 2021 Feb 9. PMID:
33584121; PMCID: PMC7870780
24. Wong WC, Chung AC (2005) Bayesian image segmentation using local
iso-intensity structural orientation. IEEE Trans Image Process
14(10):1512–1523. https://doi.org/10.1109/tip.2005.852199. PMID:
16238057
25. Veiga-Canuto D, Cerdà-Alberich L, Sangüesa Nebot C et al (2022)
Comparative multicentric evaluation of inter-observer variability in man-
ual and automatic segmentation of neuroblastic tumors in magnetic reso-
nance images. Cancers (Basel) 14(15):3648. https://doi.org/10.3390/
cancers14153648. PMID: 35954314; PMCID: PMC9367307
26. Kamnitsas K, Ledig C, Newcombe VFJ et al (2017) Efficient multi-scale
3D CNN with fully connected CRF for accurate brain lesion segmenta-
tion. Med Image Anal 36:61–78. https://doi.org/10.1016/j.
media.2016.10.004. Epub 2016 Oct 29. PMID: 27865153
27. Taghanaki SA, Abhishek K, Cohen JP, Cohen-Adad J, Hamarneh G
(2020) Deep semantic segmentation of natural and medical images: a
review. Artif Intell Rev 54(1):1–42
28. Ghafoorian M, Mehrtash A, Kapur T et al (2017) Transfer learning for
domain adaptation in MRI: application in brain lesion segmentation. In:
Proceedings of the international conference on medical image computing
and computer assisted intervention. Springer, Cham, pp 516–524
29. Bai W, Suzuki H, Qin C et al (2018) Recurrent neural networks for aortic
image sequence segmentation with sparse annotations. In: Proceedings of
the international conference on medical image computing and computer
assisted intervention. Springer, Cham, pp 586–594
30. Chen J, Lu Y, Yu Q et al (2021) TransUNet: transformers make strong
encoders for medical image segmentation. arXiv:2102.04306
31. Isensee F, Kickingereder P, Wick W et al (2018) No new-net. In:
Proceedings of the international MICCAI brain lesion workshop.
Springer, Cham, pp 234–244
32. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436
33. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need.
In: Proceedings of the 31st international conference on neural informa-
tion processing systems (NIPS’17). Curran Associates Inc., Red Hook,
pp 6000–6010
34. He K, Gan C, Li Z et al (2022) Transformers in medical image analysis:
a review. ArXiv abs/2202.12165
35. Touvron H, Cord M, Douze M et al (2020) Training data-efficient image
transformers & distillation through attention. arXiv:2012.12877
36. Karimi D, Dou H, Gholipour A (2022) Medical image segmentation
using transformer networks. IEEE Access 10:29322–29332. https://doi.
org/10.1109/access.2022.3156894. Epub 2022 Mar 4. PMID: 35656515;
PMCID: PMC9159704
37. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks
for biomedical image segmentation. In: Navab N, Hornegger J, Wells W,
Frangi A (eds) Medical image computing and computer-assisted inter-
vention—MICCAI 2015. MICCAI 2015. Lecture Notes in Computer
Science, vol 9351. Springer, Cham. https://doi.org/10.1007/978-­3-­319-­
24574-­4_28
38. Liu S, Deng W (2015) Very deep convolutional neural network based
image classification using small training sample size. In: 2015 3rd IAPR
Asian conference on pattern recognition (ACPR), Kuala Lumpur,
pp 730–734. https://doi.org/10.1109/ACPR.2015.7486599
39. Tammina S (2019) Transfer learning using VGG-16 with deep convolu-
tional neural network for classifying images. Int J Sci Res Publ 9:9420.
https://doi.org/10.29322/IJSRP.9.10.2019.p9420
40. Fernández Patón M, Cerdá Alberich L, Sangüesa Nebot C et al (2021)
MR denoising increases radiomic biomarker precision and reproducibil-
ity in oncologic imaging. J Digit Imaging 34(5):1134–1145. https://doi.
org/10.1007/s10278-­021-­00512-­8. Epub 2021 Sep 10. PMID: 34505958;
PMCID: PMC8554919
41. Oltra-Sastre M, Fuster-Garcia E, Juan-Albarracin J et al (2019) Multi-­
parametric MR imaging biomarkers associated to clinical outcomes in
gliomas: a systematic review. Curr Med Imaging Rev 15(10):933–947.
https://doi.org/10.2174/1573405615666190109100503. PMID:
32008521
42. Marti-Bonmati L, Cerdá-Alberich L, Pérez-Girbés A et al (2022)
Pancreatic cancer, radiomics and artificial intelligence. Br J Radiol
95(1137):20220072. https://doi.org/10.1259/bjr.20220072. Epub 2022
Jun 28. PMID: 35687700
43. Sanz-Requena R, Martínez-Arnau FM, Pablos-Monzó A et al (2020) The
role of imaging biomarkers in the assessment of sarcopenia. Diagnostics
(Basel) 10(8):534. https://doi.org/10.3390/diagnostics10080534. PMID:
32751452; PMCID: PMC7460125
44. Carles M, Fechter T, Radicioni G et al (2021) FDG-PET radiomics for
response monitoring in non-small-cell lung cancer treated with radiation
therapy. Cancers (Basel) 13(4):814. https://doi.org/10.3390/can-
cers13040814. PMID: 33672052; PMCID: PMC7919471
45. Fuster-Garcia E, Juan-Albarracín J, García-Ferrando GA et al (2018)
2 Principles of Image Formation in the Different Modalities

P. A. García-Higueras and D. Jimena-Hermosilla

2.1 Ionizing Radiation Imaging

X-rays are a form of ionizing electromagnetic radiation discovered in 1895 by the German physicist Wilhelm Röntgen. Very soon after their discovery, two types of medical applications were defined: diagnosis of diseases and therapeutic purposes. Since then, X-ray applications have increased dramatically with the evolution of radiological technology [1].
Ionizing radiation imaging involves three basic processes: the generation of the X-ray beam, its interaction with the patient's tissues, and the image formation.

2.1.1 X-Ray Beam Generation

All equipment that produces X-rays for radiodiagnostic purposes has a common structure for beam generation: the X-ray tube and the generator.

P. A. García-Higueras (*) · D. Jimena-Hermosilla


Hospital Radiophysics Clinical Management Unit, University Hospital
of Jaén, Jaén, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_2
X-Ray Tube
The X-ray tube is the radiation source and is electrically powered
by the generator. It is composed of a protective housing and a
glass envelope in which a vacuum is created. Two main parts can
be found inside the glass envelope:

• Cathode—negative side of the X-ray tube. Its main component is the filament. When a high electric current passes through the filament, it heats up and produces electrons, which are directed towards the anode.
• Anode—positive side of the X-ray tube. When an accelerated electron interacts with the anode target, its trajectory can be altered and it can lose part or all of its kinetic energy. This energy is transferred to the medium in the form of heat (99%) or emitted as photons through radiative losses (1%), which form the X-ray beam.

Generator
The generator is the device that transforms and adapts the power from the electrical grid to the needs of the X-ray tube. The generator usually consists of two separate elements: the console and the electric transformer. The operator can use the console to define the radiological technique, which basically consists of three adjustable parameters:

• Peak kilovoltage (kV). A higher voltage increases the energy of the photons generated and therefore their penetration depth. Changing the peak kilovoltage affects the image contrast.
• Tube current or milliamperes (mA). This is the number of electrons generated per unit of time. A higher tube current means that more photons will be generated per unit time.
• Exposure time (ms). It determines the duration of the X-ray beam production. The product of tube current and exposure time determines the number of photons produced (mAs).
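As a quick arithmetic sketch of how the console parameters combine (all values below are illustrative, not from the text):

```python
# Illustrative console settings (hypothetical values, not from the text):
kv = 120         # peak kilovoltage: photon energy and penetration depth
ma = 200         # tube current: electrons generated per unit time
time_s = 0.1     # exposure time, seconds

mas = ma * time_s        # the mAs product scales the photon count
assert abs(mas - 20.0) < 1e-9
```

Doubling either the tube current or the exposure time doubles the mAs and hence the photon output, while changing kV alters the photon energy instead.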

Nowadays, most equipment operates with systems that automatically adjust the technique according to the characteristics of the studied object.

2.1.2 Radiation-Patient Interaction

Once the X-ray beam is generated, it passes through the patient and interacts with the tissues. Although there are many ways in which photons interact with matter, in radiodiagnosis they are fundamentally reduced to two: the photoelectric effect and the Compton effect [2, 3].

Photoelectric Effect
In the photoelectric effect, the photon disappears, being absorbed by an atomic electron of the medium. In other words, the X-ray beam loses a photon that will not reach the imaging system. The photoelectric effect occurs preferentially at low X-ray beam energies (low kV).

Compton Effect
In the Compton effect, the photon interacts with an electron of the medium and transfers part of its energy; the photon is then scattered at a certain angle. The process results in a decrease of the photon energy and the emission of an atomic electron.
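Together, these two interactions remove photons from the beam as it crosses tissue. As a hedged sketch (the exponential attenuation law is standard beam physics, not stated in the text above; the coefficient value is illustrative):

```python
import math

# Beer-Lambert sketch: mu lumps the photoelectric and Compton interaction
# probabilities per unit path length. The coefficient is illustrative,
# chosen so that 3 cm of material is one half-value layer (HVL).
def transmitted_fraction(mu_per_cm: float, thickness_cm: float) -> float:
    return math.exp(-mu_per_cm * thickness_cm)

mu = math.log(2) / 3.0
assert abs(transmitted_fraction(mu, 3.0) - 0.5) < 1e-12  # 50% after one HVL
assert transmitted_fraction(mu, 6.0) < transmitted_fraction(mu, 3.0)
```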

2.1.3 Image Acquisition and Reconstruction

Although X-ray beam generation has not changed for many decades and its interaction with matter is governed by invariant physical laws, in recent decades image receptors have been evolving towards systems generically called digital.
The way in which the image is obtained can be used to classify X-ray equipment. A distinction can be made between planar X-ray and computed tomography modalities: the former obtains the image through direct exposure of the X-ray beam passing through the patient, while the latter obtains the image from a mathematical reconstruction.

Computed Tomography
Computed tomography (CT) was the earliest application of digital radiology and is considered by many the greatest advance in radiodiagnosis since the discovery of X-rays [4]. In a modern CT scanner, both the X-ray tube and the image detector are rigidly mounted on a platform and rotate together. This structure is called the gantry.
CT imaging systems generally use an array of solid-state detectors fabricated in modules. The field of view (FOV) is delimited by the physical extension of the detector array. At the CT tube exit, there are "shape filters" to adjust the intensity gradient of the X-ray beam and a collimation system to limit the beam width along the longitudinal axis. Another intrinsic element of a CT scanner is the couch, which has precision motors for accurate movements and lasers to centre the patient. The modes of CT acquisition can be classified into axial and helical (Fig. 2.1):

• Axial acquisition: the tube does not irradiate while the patient moves between acquisition cycles.
• Helical acquisition: the tube follows a spiral trajectory around the patient, in the form of helices. The ratio between the couch travel in a complete revolution and the slice (beam) thickness is called pitch.
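The pitch definition above can be written as a one-line helper (a sketch; the function name and the values are illustrative):

```python
# Pitch as defined above: couch travel per full tube revolution divided
# by the slice (beam) width along the patient axis. Values illustrative.
def ct_pitch(travel_per_rotation_mm: float, slice_width_mm: float) -> float:
    return travel_per_rotation_mm / slice_width_mm

assert ct_pitch(40.0, 40.0) == 1.0   # contiguous spirals
assert ct_pitch(55.0, 40.0) > 1.0    # separated spirals
assert ct_pitch(30.0, 40.0) < 1.0    # overlapping spirals
```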

Reconstruction Algorithms
The basic principle of tomographic image reconstruction is the
accurate reproduction of the object from a set of projections taken
from different angles. Although tomographic reconstruction was
originally performed by algebraic methods, the slowness of the
calculation processes led to the use of analytical methods.
The most popular of the analytical methods is the filtered backprojection (FBP) algorithm. This algorithm, based on the Fourier slice theorem, states that an image can be determined from its projections through a two-dimensional inverse Fourier transform [5]. By taking n projections of an object at different angles ϑ1, ϑ2, ..., ϑn and computing the Fourier transform of each of them, the two-dimensional transform of the object is obtained along the lines passing through the origin at the angles ϑ1, ϑ2, ..., ϑn.

Fig. 2.1 Schematic representation of the different acquisition modes and the pitch effect. If pitch is less than 1, the spirals overlap each other; if pitch is equal to 1, the spirals are contiguous; and if pitch is greater than 1, the spirals are separated from each other. Pitch usually takes values between 1 and 2
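The Fourier slice theorem can be checked numerically for the simplest projection angle (a minimal sketch using NumPy; at angle 0 the projection is a plain column sum, so no image rotation is needed):

```python
import numpy as np

# Central (Fourier) slice theorem, checked at projection angle 0:
# the 1D FFT of a projection equals the zero-frequency row of the
# image's 2D FFT, i.e. a line through the origin of Fourier space.
rng = np.random.default_rng(0)
img = rng.random((64, 64))

proj = img.sum(axis=0)                 # projection along the vertical axis
slice_1d = np.fft.fft(proj)            # 1D transform of the projection
central_row = np.fft.fft2(img)[0, :]   # line through the origin at angle 0

assert np.allclose(slice_1d, central_row)
```

The same identity holds at every angle once the image is rotated accordingly, which is what makes reconstruction from projections possible.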
Nowadays, due to the great improvement in computing capac-
ity, algebraic methods are used again, with iterative reconstruc-
tion algorithms as the most employed. These algorithms improve
image quality compared to FBP, especially in conditions in which
the image is noisy or it is not possible to obtain complete acquisi-
tions, at the expense of longer reconstruction time.
There are many different algorithms, but all of them start from an assumed image. Projections of this assumed image are calculated and compared with the measured projection data, and the image is updated based on the differences found. This process is repeated for multiple iteration steps, either a specified number of times or until the differences between the measured and calculated projections become very small.
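The compare-and-update loop described above can be sketched with a SIRT-style iteration on a toy system matrix (everything here is a stand-in: real scanners use physically derived projection matrices, not random ones):

```python
import numpy as np

# Toy SIRT-style loop: A maps an "image" x to "projections" p. Each step
# compares measured and calculated projections and backprojects the
# differences. A real scanner's system matrix replaces the random stand-in.
rng = np.random.default_rng(1)
A = rng.random((40, 16))       # stand-in system (projection) matrix
x_true = rng.random(16)
p = A @ x_true                 # "measured" projection data

x = np.zeros(16)               # assumed starting image
row_sums = A.sum(axis=1)       # normalizations used by SIRT
col_sums = A.sum(axis=0)
for _ in range(200):
    residual = (p - A @ x) / row_sums    # measured vs calculated projections
    x = x + (A.T @ residual) / col_sums  # update image from the differences

# The projection mismatch shrinks as iterations proceed.
assert np.linalg.norm(A @ x - p) < 0.5 * np.linalg.norm(p)
```

Running more iterations drives the mismatch further down, illustrating the trade-off the text mentions: better agreement with the data at the expense of longer reconstruction time.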

2.1.4 Image Quality

Image quality is a crucial concept for the evaluation of any imaging system [6]. Although it is usually evaluated subjectively, certain characteristics allow the system's capability to reproduce or distinguish different tissues of a volume or a studied pathology to be established. The basic characteristics analysed in CT image quality are mainly: spatial resolution, contrast resolution, noise, and artifacts.

Spatial Resolution
Spatial resolution or high-contrast resolution is the ability of an imaging system to discriminate and represent small details in an image of a volume of interest. This capability provides the details of the anatomical structures of the tissues. The elements that affect spatial resolution in a CT scan are summarized as follows:

• Equipment hardware. The size of the detectors (directly related to the pixel size), the number of detectors, and the focal spot size strongly affect spatial resolution.
• Radiation dose, which determines the quality of the radiation used. The peak kilovoltage (kV) and the amount of emitted radiation (mAs) are directly related to the spatial resolution achieved and the dose absorbed by the patient.
• Acquisition parameters. Slice thickness, pitch, and FOV affect the spatial resolution; a smaller slice thickness or a lower pitch improves the spatial resolution of the image.
• Image processing and reconstruction.

Contrast Resolution
Contrast resolution or low-contrast resolution is the ability of an imaging system to discriminate between different structures or tissues within a volume of interest. The elements that affect contrast resolution in a CT scan are:

• Physical properties of the studied volume. Differences in tissue density and in the respective attenuation coefficients strongly determine the image contrast.
• Radiation technique. Employing a low number of mAs can reduce the contrast and increase the image noise. Using a low kV increases the contrast resolution, but also increases the image noise.
• Acquisition parameters, such as slice thickness or pitch, which affect the image contrast.
• Equipment characteristics, such as the response function of the detector or the dynamic range used, which are related to the contrast resolution obtained.
• Image processing and mathematical filters applied. Different image processing or reconstruction techniques can be used to increase contrast resolution and decrease image noise.

Image Noise
Image noise refers to the presence of random signals or fluctua-
tions that are not related to the patient’s anatomy or pathology.
Noise has a negative impact on image quality, as it may hinder the
visualization of tissues or obscure subtle details. The most com-
mon sources of image noise in CT are as follows:

• Noise generated by the equipment. This noise is inherent to the equipment used and can be generated, for example, by the electronic components.
• Noise produced by radiation physics. Noise is strongly associated with the Compton effect; through a correct selection of the radiological technique or of the radiation filters applied, the final image noise can be reduced.
• Patient thickness. As the thickness of patient crossed by the radiation increases, the Compton effect becomes greater and, therefore, the image noise is higher.
• Acquisition parameters. A high pitch value reduces the number of acquisitions and therefore leads to higher image noise.
• Mathematical filters and reconstruction methods are available to reduce image noise.

Artifacts
Artifacts are undesired structures resulting from distortions,
anomalies or interferences which appear in the image and do not
represent the patient’s anatomy or pathology. The most common
are artifacts from patient movements (intentional or uninten-
tional), artifacts from metallic objects in the volume of interest or
artifacts from hardware problems in the equipment.
Although the image quality characteristics describe different
aspects, they cannot be treated as completely independent factors
because the improvement of one of them is often obtained through
deterioration of one (or more) of the others.

2.2 Nuclear Medicine Imaging

The defining characteristic of nuclear medicine imaging is the administration of radioactive tracers, or radiopharmaceuticals, to patients in order to determine their distribution. For this purpose, an original molecule is labelled by replacing one of its atoms with a radioisotope. Metabolic processes distribute the substance within the patient. The amount of radiopharmaceutical administered (its activity) is traditionally measured in millicuries (mCi); the SI unit is the becquerel (1 mCi = 37 MBq).
For diagnostic purposes, tracers are labelled with short half-­
life radioisotopes that emit gamma photons or positrons. This
allows imaging to be performed in a short period of time, with a
lower dose deposited in patients and rapid removal from the
organism.

Gamma scintigraphy consists of obtaining planar images by detecting the gamma radiation emitted by a radionuclide. Single photon emission computed tomography (SPECT) is a scintigraphic volumetric representation presented as tomographic slices or images. Both techniques are performed on equipment called gamma cameras.
Positron emission tomography (PET) is a tomographic technique whose main difference is that it uses radiopharmaceuticals that emit positrons. This technique makes it possible to obtain regional distributions of functional processes that cannot be measured by any other technology.
Furthermore, the development of PET gave this technique several advantages. Firstly, it is dynamic, meaning that studies may be performed in a short time, keeping pace with the kinetics of physiological processes. Secondly, micromolar amounts of the tracer can be detected, making this technique highly sensitive. Moreover, quantitative information on physiological processes can be obtained, and it is a non-invasive diagnostic method.
In addition, modern PET tomographs incorporate a CT scanner attached to the PET gantry. This allows a CT scan to be obtained before the PET study, making it possible to register both images and thus combine the advantages of both diagnostic techniques: the CT image provides anatomical and morphological information, whereas the PET study provides functional data, such as the metabolic behaviour of tissues.

2.2.1 Radiopharmaceuticals

A radiopharmaceutical is made up of two different components:

• Tracer. It conditions the metabolic pathway of the radiophar-


maceutical and is directed towards the target organ to be stud-
ied.
• Radionuclide. It is the isotope which emits the radiation that
allows information to be obtained on the process under study.

In PET, 18FDG (18F-2-fluoro-2-deoxy-D-glucose) is the most widely used radiopharmaceutical, allowing the measurement of glucose consumption in real time [7]. In cardiology, myocardial blood flow can be measured by means of a compound called 13N-ammonia, as well as 18FDG to study viability and glucose consumption of the heart. In neurology, 18F-DOPA along with 11C-raclopride are used to study the dopamine transporter and D2 receptors in the brain, with remarkable applications in Parkinson's disease. In addition, H2 15O (15O-labelled water) is employed in brain blood flow studies to explore the functional behaviour of the brain in aspects such as linguistics and the effects of pharmacological drugs. Lastly, PET is widely used in oncology to diagnose pulmonary nodules, breast cancer, lymphoma, colorectal cancer, and head and neck cancer, among others.

2.2.2 Physics Concepts in PET: Decay, Annihilation, and Coincidences

Radioactive Decay
Each element of the periodic table has multiple isotopes. An isotope of an element is a nucleus with the same atomic number (number of protons) but a different number of neutrons. Some isotopes are unstable and have some likelihood of undergoing a decay process; if so, they are called radionuclides. These radionuclides may decay in different ways.
PET is based on the decay path called beta plus decay (β+). Radioisotopes with an excess of protons are likely to decay via β+ [8]: one of the protons in the nucleus is converted into a neutron, emitting a positron and an electron neutrino in the process. The energy difference between the parent radionuclide and the daughter nucleus is shared between the positron and the neutrino. A general expression for this process is given below, where X is the parent nucleus, Y the daughter nucleus (with one proton fewer), e⁺ is the positron, and ν_e is the electron neutrino; Z and A are the atomic and mass numbers, respectively:

$^{A}_{Z}X \rightarrow \; ^{A}_{Z-1}Y + e^{+} + \nu_{e}$   (2.1)

Electron–Positron Annihilation
Immediately after the decay, the positron loses its kinetic energy over a very short range (on the order of 10⁻¹ cm) and, when it is almost at rest, interacts with an electron from the tissue, resulting in the disappearance of both and the emission of two photons of 511 kiloelectronvolts (keV) moving in opposite directions. This interaction is called electron–positron annihilation. Considering that neither the positron nor the electron has kinetic energy, the energy of the photons follows from Einstein's mass–energy equation, where m_e and m_p are the masses of the electron and the positron, respectively, and c is the speed of light:

$E = mc^{2} = m_{e}c^{2} + m_{p}c^{2}$   (2.2)

Therefore, the two opposed photons resulting from this process are the basis of PET. As these photons are very energetic, they have a high probability of escaping from the body and being detected externally. Hence, by placing two detectors on the line along which the photons travel and detecting them at the same time, the line on which the annihilation (and hence the decay) happened is known (Fig. 2.2).
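Plugging the electron rest mass into Eq. (2.2) recovers the 511 keV photon energy (the constants are standard CODATA values; the script is only a numerical check):

```python
# Numerical check of Eq. (2.2): the rest energy of one electron (or
# positron) is ~511 keV, carried by each annihilation photon.
M_E = 9.1093837015e-31      # electron mass, kg (CODATA)
C = 299_792_458.0           # speed of light, m/s
EV_J = 1.602176634e-19      # joules per electronvolt

E_keV = M_E * C ** 2 / EV_J / 1000.0
assert abs(E_keV - 511.0) < 0.5
```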
A true coincidence occurs when the two photons coming from the same annihilation are detected within a short period of time, called the coincidence time window (τ). However, not only true coincidences occur: other detections also occur, deteriorating the image quality and degrading the quantitative information. They are called scattered, random, and multiple coincidences (Fig. 2.3).

Scattered Coincidences
Scattered coincidences are produced when a photon undergoes Compton scattering within the studied object. These scattered photons are registered by a detector outside the line of coincidence of the annihilation photons. Nonetheless, this effect may be reduced by using tungsten septa rings, which absorb photons arriving at large angles, and by identifying and removing scattered photons, the main drawback of this filtering method being the low energy resolution of the detector.

Fig. 2.2 A proton-rich radionuclide decays, emitting a positron that finally annihilates after interacting with an electron, producing the opposed photons that will be detected. Figure adapted from [9]

Random Coincidences
It can happen that two photons produced by different events are detected by two opposed detectors within the timing coincidence window and are mistaken for a true coincidence. The random coincidence rate (C_random) increases directly with the timing coincidence window τ and the single-event rates of the two detectors (S₁ and S₂):

$C_{random} = 2\tau S_{1}S_{2}$   (2.3)

As the activity increases, so does the ratio of random to true events. For this reason, the use of septa rings notably reduces this ratio, as does the development of faster detectors with a narrower timing coincidence window.
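Equation (2.3) makes the benefit of faster detectors easy to quantify (the single-event rates below are hypothetical):

```python
# Random coincidence rate from Eq. (2.3); single-event rates S1, S2 are
# hypothetical (1e5 counts/s per detector).
def random_rate(tau_s: float, s1_cps: float, s2_cps: float) -> float:
    return 2.0 * tau_s * s1_cps * s2_cps

wide = random_rate(10e-9, 1e5, 1e5)    # 10 ns coincidence window
narrow = random_rate(4e-9, 1e5, 1e5)   # faster detector, 4 ns window
assert narrow < wide                   # narrower window -> fewer randoms
```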

Fig. 2.3 Different events that result in a detection. True coincidences (a) are important to create the image. Scattered, random, and multiple coincidences (b–d) worsen the image quality, and it is necessary to identify and diminish them. Figure adapted from [9]

Multiple Coincidences
There is a chance that three or more single events are detected within the timing coincidence window. In such cases, the events are normally discarded because the position of the event becomes ambiguous. Nevertheless, these multiple detections still contain information on the quantity and spatial location of the decays; therefore, under specific circumstances, one of the lines is selected randomly from the multiple detections.

2.2.3 PET Detector Materials

A PET scanner consists of hundreds of detectors arranged in a ring surrounding the source; the two detectors placed on the travel line of the two opposed emitted photons detect and identify those photons coming from the electron–positron annihilation [10].
Generally, inorganic scintillators are employed as detectors. The energy absorbed from the photon raises an electron of the material to a higher energy state. Afterwards, the empty state is occupied by another electron from a higher state, emitting a photon in that transition. Eventually, these scintillation photons are detected by the cathode of a photomultiplier tube, which transforms them into electrons that are accelerated by electric fields in several stages. These electrons are collected by the anode, generating the electric signal needed to form the final image.
Originally, PET detectors were made of the NaI(Tl) crystals used in gamma cameras. It is not excessively difficult to manufacture large surfaces of these crystals; nonetheless, they have a low sensitivity to 511 keV photons. Thus, in the 1970s a new material, bismuth germanate (Bi4Ge3O12), known as BGO, was introduced. Despite having worse timing and light production properties, it turned out to be more sensitive to annihilation photons, becoming massively used in PET scanners. In the late 1990s, new scintillators were developed with characteristics more appropriate for PET, and they are the ones used in PET tomographs nowadays: lutetium oxyorthosilicate (LSO) and gadolinium oxyorthosilicate (GSO). Both materials improve photon detection efficiency and have a narrower time coincidence window and a better energy resolution, allowing, for instance, random detections to be dismissed.
Furthermore, new materials were introduced, such as lutetium-yttrium oxyorthosilicate (LYSO). Its short coincidence window permits the use of the "time of flight" (TOF) technique [9]. This method exploits the fact that annihilation photons produced at a point away from the centre of the scanner travel different distances to reach their respective opposed detectors. Hence, using fast detectors (LYSO) with a narrow coincidence window, the time difference between the two detections can be measured and, therefore, the point on the coincidence line where the annihilation occurred can be located with a certain uncertainty. Consequently, this technique improves image quality, especially in studies of heavier patients.
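The TOF localization described above reduces to Δx = c·Δt/2 along the coincidence line (a sketch; the 400 ps timing resolution is an assumed figure, not from the text):

```python
# TOF sketch: an arrival-time difference dt between the two photons places
# the annihilation at c*dt/2 from the centre of the coincidence line.
# The 400 ps timing resolution below is an assumed, illustrative figure.
C = 299_792_458.0  # speed of light, m/s

def tof_offset_mm(dt_s: float) -> float:
    return C * dt_s / 2.0 * 1000.0

# ~400 ps of timing resolution localizes the event to within ~6 cm:
assert abs(tof_offset_mm(400e-12) - 60.0) < 1.0
```

This is the uncertainty the text refers to: the faster the detectors, the smaller the segment of the coincidence line to which the event can be assigned.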

2.2.4 Image Acquisition and Reconstruction

In a 2D acquisition, a uniform angular sampling is made. The events collected by every pair of detectors are arranged in a 2D matrix called a sinogram. The elements of this matrix correspond to the number of events detected by a pair of detectors, ordered such that rows are a function of the azimuthal angle of the coincidence lines and columns are ordered by the distance from the coincidence line to the centre. When 3D acquisitions are performed, it is necessary to include a polar angle besides the azimuthal one.
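Arranging events into such a matrix can be sketched as a 2D histogram over (azimuthal angle, signed distance to centre); the event sample below is synthetic, not real list-mode data:

```python
import numpy as np

# Toy sinogram: each detected coincidence line is binned by azimuthal
# angle (rows) and signed distance to the centre (columns).
rng = np.random.default_rng(2)
angles = rng.uniform(0.0, np.pi, 10_000)   # hypothetical line angles
offsets = rng.normal(0.0, 20.0, 10_000)    # hypothetical offsets, mm

sinogram, _, _ = np.histogram2d(
    angles, offsets, bins=(180, 128), range=[[0.0, np.pi], [-64.0, 64.0]]
)
assert sinogram.shape == (180, 128)        # 180 angle rows, 128 radial bins
```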
Additionally, it is important to perform some corrections to the
collected data to optimize resulting images and minimize undesir-
able effects:

• Dead time correction: the dead time is the time a detector requires to process and register an event, during which it cannot detect anything else. This effect is more significant at high activity rates. It is corrected using empirical relations between the true and measured detection rates.
• Normalization: PET tomographs have thousands of detectors, which can present slight differences in thickness, light emission, and electronics. It is necessary to calibrate every detector periodically by different methods, such as the use of a linear 68Ge source.
• Random detections correction: as explained previously, random events reduce the image quality and distort the activity values. One method to prevent this is the delayed window method. It consists of a second measurement delayed 50 ns from the time coincidence window, which gives an estimate of the random detections that can be subtracted from the coincidence measurement in real time.
• Scattered photons correction: this effect is corrected by different and complex mathematical treatments of the data after the random detection correction. Such methods depend on whether the reconstruction is 2D or 3D.
• Attenuation correction: attenuation deteriorates PET images and is the most important effect to be corrected. For this purpose, four methods may be considered: measured corrections, CT-based correction, mathematical methods, and segmentation correction.
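As an example of an empirical dead-time relation, a commonly used non-paralyzable model (an assumption for illustration; the chapter does not name which relation is used) can be inverted exactly:

```python
# One common empirical relation: the non-paralyzable dead-time model
# (an assumed example; the chapter does not specify the model used).
def measured_rate(true_cps: float, dead_time_s: float) -> float:
    """Observed count rate when each event blocks the detector briefly."""
    return true_cps / (1.0 + true_cps * dead_time_s)

def corrected_rate(meas_cps: float, dead_time_s: float) -> float:
    """Invert the model: recover the true rate from the measured one."""
    return meas_cps / (1.0 - meas_cps * dead_time_s)

true_rate = 5e5                              # hypothetical true rate, cps
meas = measured_rate(true_rate, 1e-6)        # 1 microsecond dead time
assert abs(corrected_rate(meas, 1e-6) - true_rate) < 1.0
```

Note how the loss grows with activity, matching the statement that the effect is more significant at high activity rates.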

After acquisition and corrections, the image must be reconstructed. Reconstruction aims to obtain cross-sectional images of the spatial distribution of the radionuclide throughout the object under study. Basically, there are two reconstruction approaches. The first approach is essentially analytic; that is, it uses the mathematical basis of computed tomography to relate the line-integral tomographic data to the activity distribution in the object. In this regard, there are multiple reconstruction algorithms, such as filtered backprojection and Fourier reconstruction [5].
To reconstruct the image, the sinogram information is needed. The projection data for every angle undergo a Fourier transform, and the resulting values are arranged on a grid as a function of the azimuthal angle Φ. An inverse Fourier transform is applied to the grid to obtain the reconstructed image. Unlike simple backprojection, filtered backprojection uses a filter during reconstruction. This is called the ramp filter; it weights each frequency by its magnitude, amplifying high frequencies relative to low ones and thereby reducing the blurring present in simple backprojection images. The second approach uses iterative methods to model the data collection process in order to obtain, by means of successive iterative steps, the image that best matches the measured data.
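The ramp filtering step can be sketched in a few lines of NumPy: each projection's spectrum is weighted by |frequency| before backprojection (a sketch of the filtering step only, not a full FBP implementation):

```python
import numpy as np

# Ramp filter sketch: weight each projection's spectrum by |frequency|,
# boosting high frequencies to undo the blurring of plain backprojection.
def ramp_filter(projection: np.ndarray) -> np.ndarray:
    freqs = np.fft.fftfreq(projection.size)
    return np.fft.ifft(np.fft.fft(projection) * np.abs(freqs)).real

flat = ramp_filter(np.ones(64))
assert abs(flat.sum()) < 1e-9   # a constant projection is filtered to zero
```

Because the zero-frequency weight is zero, any uniform background is removed, which is exactly what suppresses the smooth blur of simple backprojection.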

2.2.5 Image Quality

The factors affecting image quality in nuclear medicine include [11]:

• Radiopharmaceutical properties: the choice of radiophar-


maceutical, its half-life, energy, and emission characteristics
play a significant role in the image quality. Radiopharmaceuticals
with appropriate characteristics for the specific study are
essential.
• Radiopharmaceutical dosage: administering the appropriate
radiopharmaceutical dosage based on the patient’s weight and
condition is crucial for achieving optimal image quality.
• Radiotracer uptake: the uptake and distribution of the radio-
tracer within the patient’s body can affect image quality.
Variability in uptake patterns can lead to image distortion or
poor contrast.
• Patient preparation: patient preparation, including fasting,
hydration, and any necessary medication adjustments, can
influence image quality. Adequate patient cooperation is also
essential to minimize motion artifacts.
• Detector sensitivity: the sensitivity of the gamma camera or
PET scanner detectors impacts image quality. More sensitive
detectors can provide higher-quality images with better signal-­
to-­noise ratios.
• Collimator design: collimator is a critical component that
shapes the gamma photon paths to form the image. Different
collimator designs have varying trade-offs between spatial
resolution and sensitivity.
• Acquisition time: the duration of image acquisition can affect
image quality. Longer acquisition times can lead to better
image quality but may not always be practical due to patient
comfort and radiotracer decay.

• Motion artifacts: patient motion during image acquisition can


result in blurring or misalignment of structures. Strategies to
minimize motion, such as immobilization devices or gating
techniques, are important.
• Noise: various sources of noise, such as statistical noise, elec-
tronic noise, and patient motion, can degrade image quality.
Noise reduction techniques and longer acquisition times can
mitigate this issue.
• Attenuation correction: correcting for attenuation (absorp-
tion and scattering of radiation within the body) is crucial for
accurate image quantification and improved image quality,
especially in PET imaging.
• Scatter correction: scattered gamma photons can degrade
image contrast and quality. Advanced algorithms can be used
to correct for scatter and improve image quality.
• Image reconstruction algorithms [11]: the choice of image
reconstruction algorithm can significantly impact image qual-
ity. Iterative reconstruction methods often provide superior
results compared to traditional filtered back projection.
• Count statistics: sufficient counts are needed to generate high-­
quality images. Low-count studies may result in noisy images
with poor contrast.
• Technician skills: the skill and experience of the nuclear med-
icine technician in positioning the patient, setting acquisition
parameters, and monitoring the procedure can influence image
quality.
• Quality control: routine quality control measures, including
calibration and maintenance of imaging equipment, are essen-
tial to ensure consistent and high-quality nuclear medicine
images.
• Post-processing: image post-processing techniques, such as
image filtering and contrast enhancement, can be used to
improve image quality and diagnostic accuracy.

Optimizing these factors and adhering to best practices in


nuclear medicine imaging can help ensure that high-quality
images are obtained, leading to more accurate interpretations.

2.3 Magnetic Resonance Imaging

Magnetic resonance (MR) imaging (MRI) is based on the response of tissues to magnetic fields and radiofrequency waves, which is used to generate detailed images of the different body structures [12]. The magnetic field is a vector quantity (with magnitude and direction) and is measured in units of tesla (T) in the International System.
The advantages of MR include the fact that it does not use ionizing radiation, allows multiplanar acquisitions, and provides a large amount of information for each anatomical slice, allowing dynamic and functional studies. The disadvantages include longer acquisition times, sequences that are complex to optimize, high heterogeneity dependent on acquisition parameters, and a higher cost than other imaging techniques.

2.3.1 Hardware Components of MR

 agnetic Field Magnet


M
To obtain an MRI, it is necessary to create a very intense, uniform,
and stable magnetic field within a defined volume. To generate
this magnetic field, magnets are used, the most common field strengths in current clinical equipment being 1.5 T and 3 T. The higher the magnetic field intensity, the stronger the signal obtained.
Most MR equipment currently used in the clinical environment
employs superconducting magnets to generate the main magnetic
field. These fields are generated by wire coils through which high
intensity current flows. The conductive wires are usually made of
metallic alloys (commonly niobium and titanium) which lose
their resistance to the current flow when cooled to temperatures
close to absolute zero, becoming “superconductive”. For this purpose, the conductive wires are immersed in a liquid helium bath. The main maintenance cost of this equipment is refilling the helium (about once a year), as it gradually evaporates.
46 P. A. García-Higueras and D. Jimena-Hermosilla

Magnetic Field Gradient Magnets
MRI equipment uses magnetic field gradients to create a spatial
differentiation of the studied region. This creates spatial coding
along X, Y, and Z axes to produce sagittal, coronal, and axial
slices, respectively. Oblique slices can be obtained by activating
several coils simultaneously. These magnetic fields are much
weaker than the main field.

Radiofrequency Coils
They generate the radiofrequency radiation (RF) and are also
responsible for detecting the signal returned by the studied tis-
sues. The most important components are as follows:

• Frequency synthesizer: produces a central frequency which is


matched to the excitation frequency of the nuclei.
• RF envelope: produces a range of frequencies (bandwidth)
around the central frequency.
• Power amplifiers: They magnify the RF pulses in order to
increase the energy responsible for exciting the nuclei.
• Transmitting and receiving antennas: there are many types and
they are responsible for emitting RF signals to excite the nuclei
and collecting the signal emitted by the tissues.

2.3.2 Physical Basis

Most existing MR equipment is based on the excitation of hydrogen (H) atoms [13]. The H nucleus consists of a single proton, and hydrogen is the most abundant element in living organisms as it is part of water molecules.

Nuclear Spin and Magnetic Moment
Spin is a quantized particle property, i.e., it only takes certain discrete values. Particles with non-zero spin, such as protons, being electric charges in motion, generate around them a magnetic field with an associated “magnetic moment” vector (μ), which is oriented in the spin direction.

In a nucleus, composed of protons and neutrons, the spins tend to be paired, because this is an energetically favourable situation. Magnetically active nuclei are those with non-zero spin, i.e., those that have an odd number of protons and/or neutrons. The H nucleus, being composed of a single proton, has a net spin.
Without an external magnetic field, the magnetic moments of the H nuclei are randomly oriented, cancelling each other out and resulting, macroscopically, in a net magnetization of the body equal to zero. However, when an external magnetic field is applied, the magnetic moment (μ) tends to align in the direction of the magnetic field.

Precession and Larmor Frequency
The magnetic moment μ of the H nucleus is not completely parallel to the direction of the magnetic field and performs a conical rotational movement, defined as precession motion (Fig. 2.4).
The precession angle is determined by quantum laws, but the pre-
cession frequency is characteristic of each nucleus and depends
on the applied magnetic field (B). This characteristic frequency is
called Larmor frequency (v) and is calculated as follows, being γ
the gyromagnetic ratio of the particle, which depends on its charge
and mass:
ν = γ · B    (2.4)
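As a worked example, Eq. (2.4) can be evaluated in a few lines of Python. Note that the chapter writes the Larmor relation as ν = γ·B; when the frequency is expressed in Hz, the reduced gyromagnetic ratio γ/2π ≈ 42.577 MHz/T of ¹H is used (a standard physical constant, not a value taken from this chapter):

```python
# Reduced gyromagnetic ratio gamma/(2*pi) of 1H, approx. 42.577 MHz/T.
GAMMA_BAR_H = 42.577e6  # Hz/T

def larmor_frequency_hz(b_field_tesla: float) -> float:
    """Precession frequency (gamma / 2*pi) * B, in Hz."""
    return GAMMA_BAR_H * b_field_tesla

# Common clinical field strengths mentioned in the text:
for b in (1.5, 3.0):
    print(f"{b} T -> {larmor_frequency_hz(b) / 1e6:.2f} MHz")
```

At 1.5 T and 3 T this gives precession frequencies of roughly 64 MHz and 128 MHz, the frequency range at which the RF coils must transmit and receive.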

Parallel or Antiparallel Alignment
Once the H nucleus is introduced into a magnetic field B, there are
two possibilities: to be positioned in a lower energy state parallel
to the magnetic field or to be positioned in a higher energy state
antiparallel to the magnetic field. Considering the human body
and the magnetic fields used in MR at ambient temperature, there
is a slight excess of H nuclei positioned parallel compared to those positioned antiparallel.
The vector sum of the magnetic moments at each nuclear spin
is defined as the net magnetization vector, represented by M.
Without an external magnetic field, the magnetization vector is
generally zero, because the magnetic moments of the H nuclei are

Fig. 2.4 The particle rotation (spin) generates a magnetic moment μ. When
an external magnetic field B is applied, the magnetic moment tends to be
oriented in the direction of the magnetic field (B), forming an angle θ with it
and producing a precession motion at the Larmor frequency. Figure adapted from [13]

oriented in random directions. When a magnetic field is applied,


there is an excess of parallel oriented nuclei, originating a net
magnetization oriented in the direction and sense of the magnetic
field without any transverse magnetization component.

Resonance and Nutation Motion
With only the magnetic field B present, the magnetization vector
M is in equilibrium (Fig. 2.5a). To obtain information for image
generation, it is necessary to excite the nuclei that compose the
tissues. This is done by applying radiofrequency pulses at Larmor
frequencies of the nuclei to be excited.
During the radiofrequency pulse, the H nuclei with lower
energy (parallel state) absorb energy and switch to a higher energy
Fig. 2.5 (a) In presence of a magnetic field B, the magnetization vector M is


in equilibrium with the same direction and sense of B due to a higher concen-
tration of atoms in parallel state. (b) Once the RF pulse is applied, the atoms
precess coherently causing M to be projected on the transverse plane (X,Y).
When the RF pulse ends, the particles enter into a relaxation phase returning
to their original state (a). Figure adapted from [13]

state (antiparallel state). Macroscopically, the magnetization vec-


tor M moves away from its equilibrium position during the pulse.
In addition to the change of the magnetization vector in the Z
axis, all the protons subjected to the RF pulse enter into resonance
simultaneously, i.e., coherently (Fig. 2.5b). This coherence pro-
duces, besides modifying the magnetization vector in the Z axis, a
growth of the vector projected on the transverse plane (X,Y). This
process is called “radiofrequency pulse excitation”.

Longitudinal Relaxation: T1
At the moment the RF pulse ends, the H nuclei release their
energy to the surrounding medium, so some of them oriented in
an antiparallel state return to their parallel state. A more homoge-
neous surrounding medium means a more coherent and uniform
energetic release.

Water, due to its chemical properties, accepts the energy exchange with difficulty, so the relaxation is coherent and very slow. This causes water to have a long longitudinal relaxation time (long T1). In contrast, fat, due to its molecular mobility, produces rapid energy exchanges and has a short T1 relaxation time.
If, instead of a single RF pulse, several pulses are emitted separated by a time called the repetition time (TR), the tissues with a long T1 (water) do not have time to reach total relaxation, so they will have fewer relaxed nuclei available to excite when a new RF pulse arrives. As a result, tissues with a short T1 (fat) emit a stronger signal than those with a long T1 (water).

Transverse Relaxation: T2 and T2*
Once the RF pulse ends, there is a loss of precession coherence or dephasing in the X,Y plane. Therefore, some protons will precess
more slowly than others depending on the influence of the medium
or the local variations of the applied magnetic field. One of these
two factors, the inhomogeneity of the external magnetic field, can
be compensated.
The transverse relaxation time is denominated T2 when the
inhomogeneities of the external magnetic field are compensated
and only the medium is considered, but if the inhomogeneity of
the magnetic field is included, it is denominated T2*. Thus, T2*
will always be shorter than T2, since the two influences cause a
faster loss of coherence.
Free induction decay (FID) is the electrical current induced by
the relaxation motion of the H nuclei after the RF signal ends.
This signal is registered in a receiving antenna and is processed to
obtain the image.
The time between sending the RF pulse and collecting the FID
is called the echo time (TE). Consequently, with a fixed TE, the
tissue that loses the coherence more slowly (long T2) has a stron-
ger signal (water).

Proton Density Image (PD)
For a given TE, if the TR is lengthened, the longitudinal relaxation of the tissues will be complete and the effect of the T2 relaxation time will be minimal, so the resulting image will depend on the density of H nuclei in the voxel.

T1, T2 or PD Weighted Images
All MR images have both T1 and T2 components. A correct selection of
TR and TE parameters allows a weighting of T1, T2 or a suitable
combination of both (PD weighted image). To summarize:

• Short TR/Short TE: T1 weighted image


• Long TR/Short TE: PD weighted image
• Long TR/Long TE: T2 weighted image
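The weighting rules above can be illustrated numerically with the simplified spin-echo signal model S = PD · (1 − e^(−TR/T1)) · e^(−TE/T2), a standard approximation that is not derived in this chapter; the relaxation times below are rough illustrative values for fat and water at 1.5 T:

```python
import math

# Simplified spin-echo signal model: S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2).
def signal(pd, t1, t2, tr, te):
    return pd * (1 - math.exp(-tr / t1)) * math.exp(-te / t2)

# Approximate textbook relaxation times in ms: (PD, T1, T2).
tissues = {"fat": (1.0, 260.0, 80.0), "water": (1.0, 3000.0, 2000.0)}

weightings = {"T1-weighted": (500, 15),    # short TR / short TE
              "PD-weighted": (3000, 15),   # long TR / short TE
              "T2-weighted": (3000, 120)}  # long TR / long TE

for name, (tr, te) in weightings.items():
    values = {t: round(signal(pd, t1, t2, tr, te), 2)
              for t, (pd, t1, t2) in tissues.items()}
    print(name, values)
```

With a short TR and TE, fat gives the strongest signal (T1 weighting), while with a long TR and TE, water dominates (T2 weighting), matching the rules listed above.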

2.3.3 Image Acquisition and Reconstruction

The magnetic gradient fields Gx, Gy, and Gz are activated to create
a spatial encoding along the three space directions. The Gz gradi-
ent is used to select the slice along the longitudinal axis, for the
transversal plane the Gy (phase encoding) and Gx (frequency
encoding) are used [14].
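A hedged numerical sketch of frequency encoding: with a hypothetical readout gradient Gx, the precession frequency varies linearly with position, so each column of the slice resonates at a slightly different frequency:

```python
GAMMA_BAR = 42.577e6   # Hz/T, reduced gyromagnetic ratio of 1H (standard value)
B0 = 1.5               # hypothetical main field (T)
GX = 10e-3             # hypothetical readout gradient (T/m), i.e. 10 mT/m

def freq_at(x_m: float) -> float:
    # Larmor frequency at position x under the gradient: (gamma/2pi)*(B0 + Gx*x)
    return GAMMA_BAR * (B0 + GX * x_m)

# Two columns 1 cm apart precess at slightly different frequencies:
delta = freq_at(0.01) - freq_at(0.0)
print(round(delta), "Hz frequency offset per cm")
```

It is this position-dependent frequency shift, a few kHz across the field of view, that allows the reconstruction to assign each frequency component of the echo to a column.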
Phase encoding begins when Gy is activated. The rows that
receive a higher magnetic field precess at a higher frequency than
other rows that receive a lower magnetic field. When the Gy gradi-
ent closes there is a phase shift between the different rows whereby
the row in the plane can be uniquely identified.
The frequency encoding is activated by Gx, which is perpen-
dicular to Gy, so that each column will receive a different mag-
netic field. By Larmor’s Law the H nuclei of different columns
will precess at a different frequency. To prevent Gx and Gy from
overlapping, which would make the encoding of each row impossible, a bipolar
Gx gradient is applied, with two lobes of the same amplitude and
duration but in the opposite directions.
During the first lobe (−Gx) no signal is captured and it is used
to produce a phase shift that will be compensated for the one pro-
duced during the reading. When the second gradient (+Gx) is
applied, the echo signal is collected. The second lobe (+Gx) is
applied just after the first one and inverts the gradient over the

nuclei of H. Therefore, the nuclei that had been advanced in phase


are now delayed and after a time (tx) all the spins are in phase
again. During the reading phase each column relaxes at a fre-
quency which depends on its position.
At the beginning of the echo signal collection the spins are
very phase shifted, so the collected signal is very small. As time
elapses the phase shifts are gradually recovered, and the echo sig-
nal increases up to a maximum after tx time. Generally, the second
lobe is allowed to act again for another tx time to collect the com-
plete echo signal because the signal decreases progressively as the
phase shift increases due to the effect of this new gradient
(Fig. 2.6).
The encoding process of a complete plane will be repeated as
many times as the number of rows in a plane, since for each row
the signal corresponding to the action of the bipolar gradient Gx
with a phase encoding determined by Gy will be collected. The
echo signals are digitized and stored orderly in a matrix that con-
stitutes the k-space.

Fig. 2.6 Schematic diagram of the application of magnetic gradients for


echo signal generation and spatial encoding of the tomographic plane. Figure
adapted from [14]

K-Space
The echo signal collected by the receiving antenna is subjected to
a series of stages before digitization [15]. Bandwidth (BW) is the
frequency range collected and accepted for digitization measured
in Hertz (Hz). The digitizing process of the echo signal is carried
out by measuring the voltage at regular time intervals called sam-
pling intervals (∆tm). The number of samples to be taken corre-
sponds to the number of pixels to be displayed in a row. Actually,
two components are generated in the process (real component and
imaginary component), although for simplicity of explanation, it
will be considered as a single component for the moment.
By Nyquist’s theorem, a signal can be mathematically recon-
structed if it is band-limited and the sampling rate is more than
two times the maximum frequency of the sampled signal.
The echo signal’s maximum frequency is found at the extreme
of the acquired band, that is, at BW/2. Applying Nyquist’s theorem,
the minimum reading frequency of the signal will be BW, and the
sampling intervals will be determined by

∆tm = 1 / BW = 1 / (2 · fmax)    (2.5)
Hence, for each encoding a line of digitized values of the echo
signal is obtained spaced by a sampling interval ∆tm. From the Gy
values employed, it is possible to arrange the lines obtained from
the different echoes as rows of a matrix in which the columns
would be separated by a time ∆tm and the rows by the time it takes
to transition from one echo signal to another, i.e., the TR. This
matrix, which constitutes the digitized data space in time-domain,
must be transformed to frequency-domain data to reconstruct the
image.
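As an illustration of the sampling relation in Eq. (2.5), with hypothetical acquisition values (a 32 kHz receiver bandwidth and 256 samples per row):

```python
# Hypothetical acquisition values: receiver bandwidth BW and samples per row.
bandwidth_hz = 32_000
n_samples = 256

dt_m = 1 / bandwidth_hz        # sampling interval, Eq. (2.5): 1/BW
readout_s = n_samples * dt_m   # total time to sample one k-space row

print(f"{dt_m * 1e6:.2f} us per sample, {readout_s * 1e3:.1f} ms readout")
```

For these values each sample is taken every 31.25 µs, so filling one 256-sample row of k-space takes 8 ms; the row is then repeated once per TR for each phase encoding.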
The signals that constitute the echo, obtained by the action of
the Gx gradient, belong to a frequency range that depends on the
position in the voxel plane, therefore, we obtain as many digitized
values expressed in spatial frequency scale (kx) as the number of
pixels to represent in a row.

The maximum spatial frequency in the direction of Gx at the


extreme of the field of view in that direction (FOVx) is determined
by

fmax = (γ / 2π) · Gx · (FOVx / 2)    (2.6)
Applying Nyquist’s theorem again, the spatial frequency is
given by

∆kx = 1 / FOVx = (γ / 2π) · Gx · ∆tm    (2.7)
Since time t is taken based on the maximum value of the TE, the values of kx are ordered on a line of spatial frequencies, spaced one interval ∆kx apart and arranged symmetrically with respect to the centre. This line constitutes a row of the k-space matrix.
In order to fill the matrix, it is necessary to collect as many
echoes as voxels in the column, which are encoded by the gradient
value Gy. Considering that the time of application of Gy is always
the same (ty), the difference between echoes will be the variation
of Gy gradient value, therefore:

∆ky = 1 / FOVy = (γ / 2π) · ∆Gy · ty    (2.8)
This is how the data matrix that constitutes the K-space is
obtained, where each row is separated by ∆ky and each column by
∆kx, considering also that each matrix position (kx, ky) corresponds
to a value (signal strength).
Thus, the most external or peripheral line of the K-space matrix
will be filled with the highest value of Gy. Since high spatial fre-
quencies carry information about fast signal variations in space,
the most external rows of K-space carry information about the
spatial resolution of the image. Analogously, the central part of
the matrix is where the highest signal intensities are stored
because Gy has the lowest values of spatial frequencies, which
carry much information about contrast, i.e., the central part of the
K-space matrix carries information about contrast resolution.

As explained at the beginning, with the digitization of the echo


signal two components are generated, so that K-space is com-
posed of a matrix of two components (real and imaginary). With
both spaces two different images are formed: the magnitude
image, which is the image usually presented, and the phase image.
When all the K-space data have been filled, a spatial domain
image can be obtained by Fourier transformation [5] (FT), assign-
ing a chromatic value (representation scale) to each spatial posi-
tion (image pixel). Therefore, every image has an associated
equivalent K-space, and it is possible to pass from one to the other
by Fourier transforms.
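This image/k-space duality can be sketched with NumPy's 2-D Fourier transform on a synthetic array standing in for an MR image (a toy illustration, not a full reconstruction pipeline):

```python
import numpy as np

# A synthetic square "phantom" stands in for a magnitude MR image.
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0

# Image -> k-space (fftshift places the low spatial frequencies at the centre,
# matching the usual k-space display convention).
kspace = np.fft.fftshift(np.fft.fft2(image))

# k-space -> image: the inverse transform recovers the original array.
recon = np.fft.ifft2(np.fft.ifftshift(kspace))

print(np.allclose(np.abs(recon), image))  # True: the round trip recovers the phantom
```

Each k-space sample is complex, which is why both a magnitude image and a phase image can be formed, as noted above.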
Although rows and columns of the matrix are coincident in the
image and K-space, in each K-space value there is information
about the entire image plane, so a position in the image plane (x,y)
is not necessarily highly correlated with the analogous K-space
value (kx,ky).
K-space in MRI is a mathematical space that represents the
information collected during the acquisition process and is one of
the most versatile tools in image generation. Image artifacts, such
as those caused by patient movements or hardware limitations,
can appear in K-space and affect image reconstruction. Depending
on the way the matrix is filled, the amount of information stored
or its rearrangement, different types of an image can be obtained
with different acquisition times. Advances in acquisition and
reconstruction techniques have been fundamental in improving
image quality and speed of MR imaging.

2.3.4 Image Quality

There are several parameters that can be adjusted before starting an MRI acquisition that affect image quality [16]. Therefore, to achieve a good quality image in a reasonable period of time, it is necessary to select them appropriately. The principal factors respon-
sible for image quality in MRI are signal-to-noise ratio (SNR),
contrast-to-noise ratio (CNR), spatial resolution, and image
acquisition time.

Signal-to-Noise Ratio (SNR)
SNR represents the ratio of the received signal to the noise. A simple way to improve image quality is to increase the signal and decrease the noise, although this is not always possible, as some parameters improve SNR at the cost of deteriorating other factors. The parameters that most affect SNR are the following:

• Proton density. This affects the received signal, since a tissue


with a higher proton density sends a stronger signal than a tis-
sue with a lower proton density.
• Voxel volume. The SNR is proportional to voxel volume. Thus,
we can increase SNR by increasing the slice thickness or
increasing the FOV while leaving the matrix size invariant.
Equivalently, the matrix size can be decreased leaving the FOV
invariant. This is done by decreasing the number of rows/col-
umns obtained in k-space.
• TR, TE, and flip angle of the magnetization vector.
A long TR allows complete recovery of the longitudinal magnetization, increasing the signal. Similarly, when a short TE is used, less transverse magnetization is lost before the signal is collected. The flip angle of the magnetization vector determines the amount of magnetization generated in the transverse plane. Thus, a sequence using a flip angle of 90° provides more signal, since all the longitudinal magnetization is displaced to the transverse plane to be collected, increasing the SNR.
• Number of acquisitions or excitations. Represents the number of times the data collection is repeated. The SNR is proportional to the square root of the number of acquisitions: SNR ∝ √(number of acquisitions).
• Type of coil used in the acquisition.
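The square-root dependence of SNR on the number of acquisitions can be checked numerically; the sketch below averages repeated noisy measurements of a constant signal (purely synthetic values):

```python
import random
import statistics

random.seed(0)
true_signal, sigma, trials = 100.0, 10.0, 2000

def snr_after_averaging(n_acq: int) -> float:
    # Each trial averages n_acq noisy acquisitions; the SNR is estimated as
    # the true signal divided by the spread of those averaged values.
    means = [statistics.fmean(random.gauss(true_signal, sigma)
                              for _ in range(n_acq))
             for _ in range(trials)]
    return true_signal / statistics.stdev(means)

snr1 = snr_after_averaging(1)
snr4 = snr_after_averaging(4)
print(round(snr4 / snr1, 2))  # close to 2.0, i.e. sqrt(4)
```

Quadrupling the number of acquisitions roughly doubles the SNR, which is why averaging is an expensive way to buy signal: the scan time grows linearly while the SNR grows only with the square root.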

Spatial Resolution
Parameters involved in spatial resolution are slice thickness, the
FOV or the image matrix size. To increase the spatial resolution,
the voxel volume has to be reduced by decreasing the slice thick-
ness, the FOV or by increasing the matrix size. These improve-
ments, as discussed above, result in a reduction of the SNR.

Contrast-to-Noise Ratio (CNR)
CNR is the difference in grey-scale signal between tissues relative to the noise. Some of the parameters that affect the CNR are as follows:

• Relaxation times T1, T2, and proton density. These properties,


related to the emitted signal, are part of the physical properties
of the tissues.
• TR, TE, and flip angle of the magnetization vector. With a long
TR, the magnetization vector is fully recovered before the next
pulse is received and therefore, it is available to be shifted to
the transverse plane, which will enhance the contrast.
Using a long TE (T2-weighted sequences), the tissues in which the transverse magnetization has not disappeared when the echo is read are those with long T2 relaxation times. Thus, the rest of the tissues will present no signal, and although the SNR is worse, there will be a high CNR.
The flip angle of the magnetization vector also influences
the CNR, since it determines the amount of magnetization gen-
erated in the transverse plane: the greater the angle, the greater
the contrast.

Image Acquisition Time


A shorter image acquisition time means that the image is less
likely to be impaired by patient movements. The parameters
affecting acquisition time are as follows:

• TR. Decreasing TR results in an incomplete recovery of the magnetization vector, with a consequent decrease in the SNR.
• Number of phase encodings. By reducing the number of k-space lines, a rectangular field of view appears. This reduces the acquisition time and the spatial resolution, and increases the SNR.
• Acquisitions number. If the number of acquisitions is reduced,
the acquisition time decreases without changing the spatial
resolution, but reducing the SNR.
• Echo reading time. By reducing the TE, the signal drop is reduced and the SNR is increased. What is used to further reduce the acquisition time is to obtain a fractional echo, in which only part of the frequency encodings is acquired and the rest is calculated by the equipment. In this way the resolution does not decrease, but the SNR deteriorates.

It is important to know these parameters and their interrelationships, since correctly managing them will allow obtaining an image with high signal, good resolution, and good contrast in the shortest feasible time.

References
1. Brenner DJ, Hall EJ (2007) Computed tomography, an increasing source of radiation exposure. New Engl J Med 357(22):2277–2284. https://doi.org/10.1056/NEJMra072149
2. Johns HE, Cunningham JR (1983) The physics of radiology, 4th edn. ISBN: 0-398-04669-7
3. Curry TS, Dowdey JE, Murry RC (1990) Christensen's physics of diagnostic radiology, 4th edn. ISBN: 0-8121-1310-1
4. Herman GT (2009) Fundamentals of computerized tomography. Image reconstruction from projections, 2nd edn. ISBN: 978-1-84628-737-7
5. Bracewell RN (2000) The Fourier transform and its applications, 3rd edn. ISBN: 0-07-303938-1
6. Aichinger H, Dierker J, Joite-Barfuß S, Säbel M (2012) Radiation exposure and image quality in X-ray diagnostic radiology, 2nd edn. ISBN: 978-3-642-11240-9
7. Anand SS, Singh H, Dash AK (2009) Clinical applications of PET and PET-CT. Med J Armed Forces India 65(4):353–358. https://doi.org/10.1016/S0377-1237(09)80099-3
8. Shukla AK, Kumar U (2006) Positron emission tomography: an overview. J Med Phys 31(1):13–21. https://doi.org/10.4103/0971-6203.25665
9. Surti S (2015) Update on time-of-flight PET imaging. J Nucl Med 56(1):98–105. https://doi.org/10.2967/jnumed.114.145029
10. Cherry SR, Dahlbom M (2006) PET: physics, instrumentation and scanners. ISBN: 978-0-387-32302-2
11. Cherry SR, Sorenson JA, Phelps ME (2012) Image quality in nuclear medicine. Phys Nucl Med 233–251. https://doi.org/10.1016/b978-1-4160-5198-5.00015-0
12. Weishaupt D, Köchli VD, Marincek B (2008) How does MRI work? An introduction to the physics and function of magnetic resonance imaging, 2nd edn. ISBN: 978-3-540-30067-0
13. Harms SE, Morgan TJ, Yamanashi WS, Harle TS, Dodd GD (1984) Principles of nuclear magnetic resonance imaging. Radiographics 4:26–43. https://doi.org/10.1148/radiographics.4.1.26
14. Conolly S, Macovski A, Pauly J, Schenck J, Kwong KK, Chesler DA (2000) The biomedical engineering handbook, 2nd edn. Magnetic Resonance Imaging. ISBN: 0-8493-0461-X
15. Horowitz AL (1995) MRI physics for radiologists: a visual approach, 3rd edn. ISBN: 978-0-387-94372-5
16. Westbrook C, Kaut C (2006) MRI in clinical practice, 2nd edn. ISBN: 978-1-84628-161-7
3 How to Extract Radiomic Features from Imaging

A. Jimenez-Pastor and G. Urbanos-García

3.1 Introduction to Radiomic Analysis

Radiomic analysis has been widely applied in cancer research and


has demonstrated its potential to improve patient care. For
instance, radiomic analysis can provide information on tumor het-
erogeneity, which is known to be an important factor in cancer
progression and treatment resistance. In addition, radiomic analy-
sis has shown promising results in patient staging and risk strati-
fication at diagnosis [1, 2] and in predicting patient outcomes,
such as overall survival, disease-free survival, and progression-free survival [3, 4]. Furthermore, radiomic analysis can be used to
predict treatment response and identify patients who are likely to
benefit from specific therapies [5, 6]. This can help avoid unnec-
essary treatments and reduce the risk of side effects, ultimately
improving patient quality of life.
These predictions are based on a combination of radiomic fea-
tures, clinical data, and other biomarkers. By integrating multiple
sources of data, radiomic analysis can provide a more comprehen-

A. Jimenez-Pastor (*) · G. Urbanos-García
Department of AI Research, Quibim, Valencia, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_3
62 A. Jimenez-Pastor and G. Urbanos-García

sive understanding of the patient’s disease status and help clini-


cians make informed treatment decisions [7].
Radiomic analysis is not limited to cancer research and can
also be applied to other medical conditions, such as neurological
disorders, cardiovascular diseases, and respiratory diseases. In
these fields, radiomics can help identify early disease markers and
monitor disease progression. For instance, radiomic analysis has
been used to diagnose Alzheimer’s disease and predict cognitive
decline in older adults [8]. In cardiovascular diseases, radiomic
analysis has been used to evaluate the morphology and function of
the heart, which can help identify patients at risk of developing
heart failure or other cardiovascular complications [9]. Finally, in
respiratory disorders, radiomics has been used to stage interstitial lung disease (ILD) [10] or to stratify the risk of patients with chronic
obstructive pulmonary disease (COPD) [11].
Radiomic analysis can be applied to any image modality,
including computed tomography (CT), magnetic resonance imag-
ing (MRI), positron emission tomography (PET), and others. The
process of radiomic analysis involves several steps, starting with
image acquisition and preprocessing, followed by the segmenta-
tion of the region of interest (ROI). Once the ROI is defined,
radiomic features are extracted, which include shape, intensity,
texture, and statistical measures, among others. These features are
then subjected to statistical and machine learning algorithms to
identify patterns and relationships between features and clinical
endpoints. In addition to radiomic features, other features such as
deep features or other imaging biomarkers can be extracted from
the image. These features can be combined with radiomic features
to provide the model with a more comprehensive view of the
patient’s condition. To extract deep features from medical images,
pretrained convolutional neural networks (CNNs) are used. These
architectures utilize multiple convolutional filters to extract features at different levels. This approach results in a high-dimensional feature vector for each input image. On the other
hand, imaging biomarkers are parameters that are computation-
ally derived and have been shown to correlate with a physiologi-
cal or pathological process. Some examples of imaging biomarkers

include the apparent diffusion coefficient (ADC), which is a mea-


sure of tissue cellularity, obtained from MRI diffusion weighted
images (DWI); Ktrans, which is a measure of tissue permeability,
obtained from MRI dynamic contrast enhanced (DCE); and pro-
ton density fat fraction (PDFF), which is a measure of fat concen-
tration, obtained from multi-echo T1 weighted (w) MRI. These
are just a few examples of the many imaging biomarkers that can
be extracted from medical images using various computational
methods. The use of imaging biomarkers can help to improve
diagnostic accuracy, predict treatment response, and monitor dis-
ease progression. Additionally, imaging biomarkers can provide
valuable insights into the underlying biology of diseases [12].
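As a toy illustration of radiomic feature extraction, the sketch below computes a few first-order features over a synthetic ROI. These are simplified definitions, not the IBSI-standardized ones, and real pipelines typically rely on dedicated libraries:

```python
import numpy as np

def first_order_features(image: np.ndarray, mask: np.ndarray) -> dict:
    """Toy first-order features over the voxels selected by a binary mask."""
    voxels = image[mask].astype(float)
    hist, _ = np.histogram(voxels, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "min": float(voxels.min()),
        "max": float(voxels.max()),
        "entropy": float(-(p * np.log2(p)).sum()),  # histogram entropy (bits)
    }

rng = np.random.default_rng(42)
img = rng.normal(100, 20, size=(16, 16, 16))  # synthetic "scan"
roi = np.zeros(img.shape, dtype=bool)
roi[4:12, 4:12, 4:12] = True                  # synthetic segmentation (ROI)
print(first_order_features(img, roi))
```

In a real workflow, `img` would be the acquired volume, `roi` the segmented lesion, and the feature vector would feed the statistical or machine learning models described above.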
Despite the potential benefits of radiomic analysis, there are
still several challenges that need to be addressed. One major chal-
lenge is the lack of standardization in image acquisition and anal-
ysis protocols, which can affect the reproducibility and reliability
of radiomic features [13]. Another issue is the lack of robustness of some radiomic features due to intra- and inter-observer variability [14]. A further limitation is the need for larger datasets to develop robust and reproducible predictive models, and the lack of external validation limits the application of these solutions in clinical practice. Finally, the integration of radiomic analysis
into clinical workflows remains a challenge, as it requires a mul-
tidisciplinary approach and collaboration between radiologists,
oncologists, and other healthcare professionals. However, there
are different initiatives to try to overcome all these limitations.
In conclusion, radiomic analysis is a rapidly growing field that
has the potential to transform medical imaging and improve
patient outcomes. By providing quantitative information on tissue
structure and function, radiomic analysis can help clinicians make
more informed treatment decisions and develop personalized
treatment strategies. However, several challenges need to be over-
come to ensure the widespread adoption of radiomic analysis in
clinical practice. With continued research and development,
radiomic analysis holds great promise in advancing personalized
medicine and improving patient care.

3.2 Deep Learning vs. Traditional Machine Learning

When building a predictive model based on medical imaging,


there are two main approaches: feature-based models and
imaging-based models (Fig. 3.1).
Feature-based models involve extracting features from the
images, which can be based on radiomic features, deep features,
or other imaging biomarkers used for tissue characterization.
Before feature extraction, usually, a ROI is defined, therefore,
only a limited area of the image is analyzed, losing contextual
image information. Machine learning pipelines are then used to
build prognostic models, based on these input features, for the
prediction of clinical outcomes. However, a main limitation of
this approach is that these features are sensitive to differences
between scanners and acquisition protocols, which can lead to
reduced generalizability of the model. Nevertheless, initiatives such
as the Quantitative Imaging Biomarkers Alliance (QIBA), the
European Imaging Biomarkers Alliance (EIBALL) or the Imaging
Biomarkers Standardization Initiative (IBSI) have been estab-
lished to promote standardization in image acquisition and pro-
cessing and facilitate data sharing among researchers. The IBSI is
an international collaborative effort between researchers, clini-
cians, and industry partners aimed at promoting radiomics stan-
dardization through the development of guidelines and standards
for radiomics feature extraction and analysis. In addition, feature
harmonization techniques such as ComBat can be applied to min-
imize these differences at a later stage [15, 16].

Fig. 3.1 Feature-based models vs. imaging-based models in the develop-
ment of predictive models in medical imaging

3 How to Extract Radiomic Features from Imaging 65
In contrast, imaging-based models have emerged as a promis-
ing alternative to feature-based models. One of the reasons for
this is that imaging-based models can learn features automati-
cally, which can help overcome the limitations of feature-based
models that are sensitive to inter-scanner and inter-protocol vari-
ability. Furthermore, imaging-based models can detect complex
patterns that may not be visible to the human eye or through man-
ual feature extraction. Also, the whole image can be used as input
to these models (i.e., an ROI is not required to be defined), using
all the contextual information. This method is based on end-to-end
solutions using deep learning (DL), specifically CNNs. A
CNN is built by multiple convolutional layers, used to extract fea-
tures from the input image. During the CNN training process, the
weights of these layers are learned, allowing the features extracted
to be adapted to the specific problem. This is a significant advan-
tage of imaging-based models compared to feature-based models.
Additionally, with a large and heterogeneous dataset, a CNN can
learn the differences found across scanners and acquisition proto-
cols and provide a more generalizable solution.
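The feature-extraction role of a convolutional layer can be sketched in a few lines of plain Python. The `conv2d` function and the fixed Sobel-style kernel below are illustrative only; in a real CNN the kernel weights are learned during training rather than set by hand.

```python
def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and
    sum element-wise products, producing one feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map

# A vertical-edge kernel (fixed here; a CNN would learn these weights).
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x4 image with a bright right half -> strong response at the edge.
img = [[0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 10, 10],
       [0, 0, 10, 10]]

fmap = conv2d(img, sobel_x)
print(fmap)  # 2x2 feature map highlighting the vertical edge
```

Stacking many such learned kernels, interleaved with nonlinearities and pooling, is what lets a CNN adapt its extracted features to the problem at hand.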
However, CNNs are complex models with many parameters to
adjust during the training process. As such, large datasets are
required to avoid overfitting to the training data, which can make
their use challenging due to the difficulties of collecting such
large datasets. Another challenge in building predictive models
based on medical imaging is the interpretability of the models.
DL models, such as CNNs, are often criticized for being black
box models, meaning that it can be challenging to understand how
the model arrives at its predictions. While recent advances in
interpretability techniques have addressed this issue to some
extent [17], it remains a significant challenge in the field. One of
the most significant challenges in building predictive models
based on medical imaging is the availability of high-quality data.
Collecting and curating large datasets is a time-consuming and
expensive process, particularly when dealing with medical images
that require annotation by trained experts.

In conclusion, the use of medical imaging for predictive mod-
eling is a rapidly evolving field with significant potential for
improving patient outcomes. Both feature-based models and
imaging-based models have their advantages and limitations, and
the choice of approach depends on the specific problem and avail-
able data. In some cases, feature-based models may be sufficient,
while in others, imaging-based models may be necessary to
achieve higher accuracy and generalizability. As more high-quality
data becomes available, and initiatives continue to pro-
mote standardized data sharing and collaboration, it is expected
that the field will continue to advance, leading to more accurate
and interpretable predictive models.

3.3 Radiomic Features Extraction Process

The goal of radiomic features extraction is to obtain as much
quantitative information as possible from the medical images,
which can be used for subsequent analysis.
Radiomic features can be categorized into four main groups:
shape-based, intensity-based, texture-based, and higher order fea-
tures:

• Shape-based features describe the shape and size of the lesion
and include features such as volume, surface area, compactness,
and sphericity.
• Intensity-based features describe the intensity distribution
within the ROI and include features such as mean, median,
variance, skewness, and kurtosis.
• Texture-based features describe the spatial distribution of
intensity values within the ROI and include features such as
entropy, energy, homogeneity, and correlation.
• Finally, higher order features are extracted from a derived
image after applying a filter to the original image. The most
common filters are the wavelet transformation, which decomposes
the image into different frequency bands so that features can be
extracted from each band; the Laplacian of Gaussian (LoG) filter,
which enhances edges; and other mathematical operations such as
square, square root, logarithm, or exponential. Both
intensity-based and texture-based features are extracted from
these derived images.

Therefore, following this process, thousands of features are
extracted from the ROI.
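As a rough illustration of what an intensity-based extractor computes, the pure-Python sketch below derives a handful of first-order features from a flattened list of ROI voxel values. The `intensity_features` helper is hypothetical; production work would use an IBSI-compliant package such as PyRadiomics.

```python
import math
from collections import Counter

def intensity_features(roi_values):
    """A few first-order (intensity) features from the voxel values
    inside an ROI, given as a flat list."""
    n = len(roi_values)
    mean = sum(roi_values) / n
    var = sum((v - mean) ** 2 for v in roi_values) / n
    std = math.sqrt(var)
    # Skewness and kurtosis (population definitions).
    skew = sum((v - mean) ** 3 for v in roi_values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in roi_values) / (n * std ** 4)
    # Shannon entropy of the discretized intensity histogram.
    counts = Counter(roi_values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}

feats = intensity_features([1, 1, 2, 2, 3, 3, 4, 4])
print(feats["mean"], feats["entropy"])  # 2.5 2.0
```

Texture features (e.g., from gray-level co-occurrence matrices) and shape features follow the same principle but operate on the spatial arrangement of voxels rather than on the intensity histogram alone.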
The process of radiomics feature extraction typically involves
several steps.

3.3.1 Image Preprocessing

First, the medical images are preprocessed to enhance the quality
of the images and remove noise. Also, image harmonization tech-
niques can be applied at this stage. These preprocessing methods
aim to reduce variabilities across images acquired in different
scanners and acquisition protocols. DL has shown great results in
both image quality enhancement [18] and harmonization [19].
One of the most common and basic normalization techniques is
the z-score normalization where the image intensities are normal-
ized to a distribution of zero mean and unit variance. Also, in
some cases, it can be useful to remove outliers from the image
to avoid biasing the radiomic values with spurious voxel intensities.
The most commonly used method is to calculate the mean (μ) and
standard deviation (σ) of the intensities within the ROI and to
exclude those outside the range μ ± 3σ.
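The z-score normalization with μ ± 3σ outlier exclusion described above can be sketched as follows. This is a minimal pure-Python version working on a flat list of intensities; `zscore_normalize` is an illustrative helper, and real pipelines operate on full image arrays.

```python
import math

def zscore_normalize(values, sigma_cutoff=3.0):
    """Z-score normalization with outlier exclusion: voxels outside
    mu +/- sigma_cutoff * sigma are dropped, then the remaining
    intensities are rescaled to zero mean and unit variance."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    kept = [v for v in values if abs(v - mu) <= sigma_cutoff * sigma]
    # Recompute the statistics on the inlier intensities only.
    mu_k = sum(kept) / len(kept)
    sigma_k = math.sqrt(sum((v - mu_k) ** 2 for v in kept) / len(kept))
    return [(v - mu_k) / sigma_k for v in kept]

# 20 plausible voxel intensities plus one spurious value of 500.
vals = [10] * 10 + [12] * 5 + [9] * 5 + [500]
normalized = zscore_normalize(vals)
print(len(normalized))  # 20: the spurious voxel was excluded
```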
Another source of variance is the size of the image and the
reconstructed voxel size [20]. When images are acquired using
different scanners, the voxel dimensions can vary between images,
resulting in discrepancies in the extracted radiomic features. To
mitigate these differences, images are usually resized to reduce
the variability across scans. Currently, there is no clear recom-
mendation on whether to upsample or downsample the image or on
the exact final voxel size. In general, if the spacing between slices is
small compared to the voxel size in the acquisition plane, the
image can be resized to an isotropic voxel, commonly of size 1 or
2 mm3. However, for images with low resolution between the
slices, a 2D approximation is often used, where the image is
resized to have isotropic pixels, meaning the same voxel size only
in the acquisition plane. Resizing the image in this way can help
to reduce the variability of the radiomic features and improve the
accuracy of the analysis. To address the image harmonization
problem, AI-based methods have been developed to minimize the
influence of the type of scanner used, the center to which the
patient belongs, and the parameters used for image acquisition.
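A minimal sketch of resampling to isotropic pixels follows, using nearest-neighbor interpolation for brevity (clinical pipelines usually prefer linear or B-spline interpolation). `resample_nearest` is a hypothetical helper working on a 2D list-of-lists image with per-axis spacing in millimeters.

```python
def resample_nearest(image, spacing, new_spacing):
    """Nearest-neighbor resampling of a 2D image from its original
    pixel spacing (mm per axis) to a new, e.g. isotropic, spacing."""
    rows, cols = len(image), len(image[0])
    new_rows = int(round(rows * spacing[0] / new_spacing[0]))
    new_cols = int(round(cols * spacing[1] / new_spacing[1]))
    out = []
    for i in range(new_rows):
        # Map each output row/column back to its nearest source index.
        src_i = min(int(i * new_spacing[0] / spacing[0]), rows - 1)
        row = []
        for j in range(new_cols):
            src_j = min(int(j * new_spacing[1] / spacing[1]), cols - 1)
            row.append(image[src_i][src_j])
        out.append(row)
    return out

# 2x2 image with 2 mm x 1 mm pixels -> resample to isotropic 1 mm pixels.
img = [[1, 2],
       [3, 4]]
iso = resample_nearest(img, spacing=(2.0, 1.0), new_spacing=(1.0, 1.0))
print(len(iso), len(iso[0]))  # 4 2
```

The same index-mapping idea extends directly to three dimensions for volumetric scans.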

3.3.2 Image Segmentation

Once the image is ready, the ROI is defined by a radiologist or an
automated segmentation algorithm. The process of ROI definition
is a critical step in radiomic features extraction as it can signifi-
cantly impact the reproducibility of the results. Manual segmenta-
tion by a radiologist is the gold standard, but it is time-consuming
and can introduce intra- and inter-reader variability [21], particu-
larly in cases where the ROI borders are unclear. To mitigate this
limitation, multiple readers may be used, and radiomic features
with a low intraclass correlation coefficient (ICC) can be dis-
carded. The ICC is a measure of reproducibility, where 0 indicates
no reproducibility, and 1 indicates perfect reproducibility. To
overcome the limitations of manual segmentation, semi-automatic
and automatic methods have been developed. Semi-automatic
methods, such as thresholding or region growing, can reduce
intra-reader variability while maintaining inter-reader variability,
but fully automatic methods based on DL can significantly reduce
both sources of variability. However, it is important to note that
automatic methods generate systematic errors, which can be min-
imized by training them on large and heterogeneous datasets. In
recent years, deep learning-based solutions for automatic segmen-
tation of organs and lesions have shown promising results com-
pared to traditional computer vision methods. However, one of
the main disadvantages of these automatic methods is the lack of
generalization to new data. When using an automatic method, it is
important to verify whether any external validation was done after
model building and to assess its performance on the dataset of
interest.
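The ICC mentioned above can be estimated in several ways; the sketch below implements the simple one-way random-effects form, ICC(1,1). Radiomics studies often use two-way variants instead (e.g., ICC(2,1)), so treat this as an illustration of the idea rather than the standard choice.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1). `measurements` holds one row
    per lesion and one column per reader."""
    n = len(measurements)   # number of lesions
    k = len(measurements[0])  # number of readers
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    # Mean squares between lesions and within lesions.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(measurements, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Two readers measuring the same feature on four lesions.
perfect = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
print(icc_oneway(perfect))  # 1.0 (perfect reproducibility)

noisy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
print(round(icc_oneway(noisy), 2))  # lower ICC under reader disagreement
```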

3.3.3 Feature Extraction and Selection

Once the ROI is defined, radiomic features can be extracted using
specialized software or programming libraries. A variety of open-
source and commercial packages are available, including
PyRadiomics, LIFEx, and PyCWT. The choice of package
depends on several factors, such as the type of images being ana-
lyzed, the specific features of interest, and the user’s level of
expertise. With the availability of these tools, radiomic analysis
has become more accessible and standardized, enabling its
broader adoption in clinical practice and research.
The choice of radiomic features to extract is a critical step in
the analysis of medical imaging data and can vary depending on
the specific research question or clinical application. Typically, a
large number of features are extracted for each use case; therefore,
to avoid overfitting, the next step is to reduce the number of fea-
tures through a process of feature selection. Several methods can
be used for feature selection, including traditional statistical and
machine learning techniques, as well as more recent methods
such as deep feature selection. It is worth mentioning that, when
dealing with radiomic features, it is crucial to perform a correla-
tion analysis before feature selection. Many radiomic features are
highly correlated, and it is important to identify these correlations
and remove redundant features to avoid overfitting the model.
This can be achieved by keeping only the most informative fea-
ture when two variables are highly correlated. Overall, careful
selection of radiomic features is essential for building accurate
and robust predictive models from medical imaging data.
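The correlation-based pruning of redundant features described above can be sketched as follows. This greedy version simply keeps the first feature of each correlated group; in practice one would keep the more informative member of the pair. `drop_correlated` and the 0.95 threshold are illustrative choices.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def drop_correlated(features, threshold=0.95):
    """Greedy pruning: keep a feature only if its absolute correlation
    with every already kept feature stays below the threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

feature_table = {
    "volume":   [10, 20, 30, 40],
    "volume_2": [20, 40, 60, 80],  # perfectly correlated with "volume"
    "entropy":  [5, 3, 6, 2],
}
print(drop_correlated(feature_table))  # ['volume', 'entropy']
```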
It is important to note that the quality and reproducibility of
radiomic features can be affected by several factors, including
image acquisition protocols, segmentation methods, and feature
extraction algorithms. Therefore, it is essential to ensure standard-
ization and validation of the radiomics workflow to ensure robust
and reliable results. Several initiatives, such as IBSI, have been
launched to address these challenges and promote the use of
radiomics in clinical practice.

3.3.4 Standardization

Radiomics standardization is a crucial aspect of medical imaging
research, especially when using radiomic features for predictive
modeling. Standardization helps to ensure that results are repro-
ducible, and that radiomic features can be compared across differ-
ent studies and datasets.
The IBSI guidelines [22] provide recommendations for all
aspects of the radiomics workflow, from image acquisition and
preprocessing to feature extraction and analysis. They include
recommendations for the definition of the ROI, image preprocess-
ing, and feature extraction parameters. The IBSI guidelines also
provide a standardized nomenclature for radiomic features, which
helps to ensure consistency across studies and datasets.
The standardization of radiomic features extraction is divided
into two chapters:

• Chapter 1 focused on the standardization of 169 commonly
used radiomic features. This effort was initiated in 2016 and
completed in 2020, providing a standard radiomics image
processing scheme together with reference values for the
different radiomic features.
• Chapter 2 is dedicated to the standardization of commonly used
imaging filters in radiomic studies (e.g., wavelet, LoG, etc.),
since features derived from filter response maps have been
found to be poorly reproducible. Therefore, the main goal is to
standardize the way image filters for radiomics studies are
implemented to improve reproducibility.

Each chapter is divided into three phases: (1) standardization of
radiomic feature computations using a digital phantom without
image processing; (2) standardization of radiomic feature compu-
tations under a general image processing scheme using CT data of
a lung cancer patient; and (3) validation using a multi-modality
imaging dataset of multiple patients.
To demonstrate that an algorithm follows the IBSI guidelines for
radiomic feature extraction, the guidelines provide, for a given
image processing configuration, the exact value (together with
a tolerance) that the algorithm should return for each radiomic
feature.
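Conceptually, such a compliance check reduces to comparing each computed feature value against a published reference value within its tolerance. The sketch below illustrates this; the feature names, reference values, and tolerances are made up, not actual IBSI numbers.

```python
def check_ibsi_compliance(computed, reference, tolerance):
    """Return names of features whose computed value falls outside
    the reference value +/- tolerance."""
    return [name for name, value in computed.items()
            if abs(value - reference[name]) > tolerance[name]]

# Hypothetical numbers, NOT actual IBSI reference values.
computed  = {"mean": 13.01, "entropy": 2.41}
reference = {"mean": 13.00, "entropy": 2.50}
tolerance = {"mean": 0.05,  "entropy": 0.05}
print(check_ibsi_compliance(computed, reference, tolerance))  # ['entropy']
```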
The IBSI guidelines are constantly evolving and being updated
based on new research findings and feedback from the commu-
nity. These guidelines are freely available online, and software
packages for implementing them are also available. Following the
IBSI guidelines is not only important for ensuring consistency
across studies but can also help to improve the accuracy and reli-
ability of radiomics-based predictive models.
In conclusion, the IBSI guidelines are an essential resource for
researchers and clinicians working with radiomics data.
Standardization ensures that radiomic features can be compared
across studies and datasets, leading to more robust and reliable
results. Following the IBSI guidelines not only ensures consis-
tency but also facilitates the development of more accurate and
reliable predictive models, ultimately improving patient outcomes.

3.4 Deep Learning Radiomic Features

In recent years, DL has shown good performance in medical
image classification, detection, and segmentation [23–25]. The fun-
damental notion of DL is the neural network (NN), which emulates
the behavior of the human brain to solve complex data-based prob-
lems. An NN transforms the input image through several layers
that extract features from the image. These features contain texture
and edge information that propagates through the layers of the
network maintaining the spatial relationships in the image.
Commonly, features extracted from the last layer of the NN are
called deep features. The approach of extracting deep features
from medical imaging is referred to as deep learning-based
radiomics (DLR).

DLRs have been used for tasks such as cancer type
prediction [26] or survival prediction [27]. These features can be
extracted through different DL architectures; in imaging, the most
common architectures are based on CNNs.
Deep feature extraction can be approached in different ways
and at different levels. On the one hand, the input images can be
at slice level, volume level, or patient level. On
the other hand, DLRs can be extracted from either pretrained or
custom models.
Designing a model from scratch has the advantage of having a
network adjusted to the problem to solve. However, there may be
problems such as overfitting and class imbalance due to the lack
of available training datasets. To solve these problems, transfer
learning (TL) has been used as an alternative to construct new
models. TL is based on taking a DL model pretrained on a natu-
ral image dataset and retraining the network with the desired medical
dataset to fine-tune its parameters. This approach has
been used in different studies. For example, in automatic polyp
detection in CT colonography [28], detection and classification of
breast cancer in microscope images [29] or pulmonary nodules in
thoracic CT images [30]. TL is usually applied using a pretrained
CNN model such as GoogLeNet [31], Visual Geometry Group
Network (VGGNet) [32] or Residual Networks (ResNet) [33]
trained with the data from the ImageNet dataset. Deep features
from pretrained CNNs have achieved higher prediction accuracy
than hand-crafted radiomics signatures and clinical factors [34,
35]. However, TL has been arbitrarily configured in most studies,
and it is not evident whether good performance is obtained until
an evaluation of the model is performed.
DLRs can be extracted using both discriminative and genera-
tive deep learning networks [36]. Discriminative models are
trained with supervised learning and use labels to distinguish
classes, such as distinguishing lesion from healthy tissue.
Generative models are trained with unsupervised learning and
extract general image features to generate new data with the same
structure. In consequence, the features extracted from generative
models can be used as input to
a classifier.
Furthermore, choosing the optimal architecture to extract the
DLRs is a challenge that remains to be studied. One of the most
popular deep learning-based architectures is the CNN. A CNN
transforms the input image through convolutional layers, ReLU
(rectified linear unit) activation layers, and pooling layers, which
are responsible for extracting features from images. In the case of
discriminative models, the output of a CNN can be a classification
or regression result, or it can be used as input to the rest of the
radiomics pipeline [27]. In generative models, autoencoders are
often used; these consist of an encoder, which encodes the input
image into a lower-dimensional feature space (latent space), and a
decoder, which decodes this latent space back to the original
space. Once the autoencoder has been trained to encode and
decode images by minimizing the reconstruction error, the encoder
is used to generate, for a given input image, a lower-dimensional
feature vector that represents the DLRs mentioned above.
Convolutional auto-encoders (CAEs) are used in generative
radiomic problems to maintain the spatial correlations of the
image [37].

3.4.1 Deep Learning Radiomics and Hand-Crafted Radiomics

The main benefit of DLRs vs. hand-crafted radiomics (HCRs) is
that no manual segmentation step is required. Manual segmenta-
tions are highly dependent on the reader, which makes them unre-
liable. In addition, eliminating manual segmentation helps save
time from a tedious task for experts and radiologists. However,
DLR extraction requires large datasets and can have a high com-
putational cost. Moreover, deep features are difficult to interpret
and explain from a clinical perspective.
The combination of HCR and DLR features has been shown to
increase model performance [38]. This combination can
be made in two ways: at decision level or at feature level. The
decision-level approach is based on training the two models
(HCRs and DLRs) separately and combining the outputs by
voting to obtain the best results [39, 40]. This voting can be soft,
hard, or adaptive. Feature-level fusion consists of concatenating
the HCR and DLR vectors, applying feature reduction to avoid
overfitting and using them as input to a model, obtaining better
results in lung cancer survival models [27] or tumor detection
[38]. Thus, radiomics and deep features have proven to be two
novel technologies that have a high potential for early detection,
prediction of treatment response and prognosis of the disease.
Figure 3.2 shows the different approaches introduced throughout the
chapter, going from HCR to the different approaches to extract
DLR and how they can be combined.
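The two fusion strategies can be sketched in a few lines: decision-level fusion as weighted soft voting over model probabilities, and feature-level fusion as vector concatenation. The function names and the default 0.5 weight are illustrative choices, not taken from the cited studies.

```python
def soft_vote(p_hcr, p_dlr, weight=0.5):
    """Decision-level fusion: weighted soft voting over the class
    probabilities of an HCR model and a DLR model."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_hcr, p_dlr)]

def feature_fusion(hcr_vector, dlr_vector):
    """Feature-level fusion: concatenate the HCR and DLR feature
    vectors; feature reduction would follow to avoid overfitting."""
    return list(hcr_vector) + list(dlr_vector)

# Class probabilities [benign, malignant] from each model.
fused = soft_vote([0.8, 0.2], [0.6, 0.4])
print(fused)  # approximately [0.7, 0.3]

combined = feature_fusion([1.2, 3.4], [0.5, 0.9, 0.1])
print(len(combined))  # 5
```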

Fig. 3.2 Pipeline of the different radiomics models. In yellow, prediction
models with HCRs; in blue, prediction models with DLRs; in red, prediction
models combining HCRs and DLRs

References
1. Chetan MR, Gleeson FV (2021) Radiomics in predicting treatment
response in non-small-cell lung cancer: current status, challenges and
future perspectives. Eur Radiol 31(2):1049–1058. https://doi.org/10.1007/
s00330-020-07141-9. Epub 2020 Aug 18. PMID: 32809167; PMCID:
PMC7813733
2. Du P, Liu X, Shen L, Wu X, Chen J, Chen L, Cao A, Geng D (2023)
Prediction of treatment response in patients with brain metastasis receiv-
ing stereotactic radiosurgery based on pre-treatment multimodal MRI
radiomics and clinical risk factors: A machine learning model. Front
Oncol 13:1114194. https://doi.org/10.3389/fonc.2023.1114194. PMID:
36994193; PMCID: PMC10040663
3. Huynh LM, Hwang Y, Taylor O, Baine MJ (2023) The use of MRI-
derived radiomic models in prostate cancer risk stratification: a critical
review of contemporary literature. Diagnostics (Basel) 13(6):1128.
https://doi.org/10.3390/diagnostics13061128. PMID:
36980436; PMCID: PMC10047271
4. Zhang Y, Yang Y, Ning G, Wu X, Yang G, Li Y (2023) Contrast computed
tomography-based radiomics is correlation with COG risk stratification
of neuroblastoma. Abdom Radiol (NY) https://doi.org/10.1007/s00261-
023-03875-4. Epub ahead of print. PMID: 36951989
5. Cui Y, Li Z, Xiang M, Han D, Yin Y, Ma C (2022) Machine learning mod-
els predict overall survival and progression free survival of non-surgical
esophageal cancer patients with chemoradiotherapy based on CT image
radiomics signatures. Radiat Oncol. 17(1):212. https://doi.org/10.1186/
s13014-022-02186-0. PMID: 36575480; PMCID: PMC9795769
6. Chu F, Liu Y, Liu Q, Li W, Jia Z, Wang C, Wang Z, Lu S, Li P, Zhang Y,
Liao Y, Xu M, Yao X, Wang S, Liu C, Zhang H, Wang S, Yan X, Kamel
IR, Sun H, Yang G, Zhang Y, Qu J (2022) Development and validation of
MRI-based radiomics signatures models for prediction of disease-free
survival and overall survival in patients with esophageal squamous cell
carcinoma. Eur Radiol. 32(9):5930–5942. https://doi.org/10.1007/
s00330-022-08776-6. Epub 2022 Apr 6. PMID: 35384460
7. Chen W, Qiao X, Yin S, Zhang X, Xu X (2022) Integrating Radiomics
with Genomics for Non-Small Cell Lung Cancer Survival Analysis. J
Oncol. 2022:5131170. https://doi.org/10.1155/2022/5131170. PMID:
36065309; PMCID: PMC9440821
8. Feng Q, Ding Z (2020) MRI Radiomics Classification and Prediction in
Alzheimer’s Disease and Mild Cognitive Impairment: A Review. Curr
Alzheimer Res. 17(3):297–309. https://doi.org/10.2174/1567205017666
200303105016. PMID: 32124697
9. Pujadas ER, Raisi-Estabragh Z, Szabo L, McCracken C, Morcillo CI,
Campello VM, Martín-Isla C, Atehortua AM, Vago H, Merkely B, Mau-
rovich-Horvat P, Harvey NC, Neubauer S, Petersen SE, Lekadir K (2022)
Prediction of incident cardiovascular events using machine learning and
CMR radiomics. Eur Radiol https://doi.org/10.1007/s00330-022-
09323-z. Epub ahead of print. PMID: 36512045
10. Gabryś HS, Gote-Schniering J, Brunner M, Bogowicz M, Blüthgen C,
Frauenfelder T, Guckenberger M, Maurer B, Tanadini-Lang S (2022)
Transferability of radiomic signatures from experimental to human intersti-
tial lung disease. Front Med (Lausanne). 9:988927. https://doi.org/10.3389/
fmed.2022.988927. PMID: 36465941; PMCID: PMC9712180
11. Cho YH, Seo JB, Lee SM, Kim N, Yun J, Hwang JE, Lee JS, Oh YM, Do
Lee S, Loh LC, Ong CK (2021) Radiomics approach for survival predic-
tion in chronic obstructive pulmonary disease. Eur Radiol 31(10):7316–
7324. https://doi.org/10.1007/s00330-021-07747-7. Epub 2021 Apr 13.
PMID: 33847809.
12. Martí-Bonmatí L, Alberich-Bayarri A (2018) Imaging biomarkers:
development and clinical integration. Springer International Publishing,
Cham
13. Jha AK, Mithun S, Jaiswar V, Sherkhane UB, Purandare NC, Prabhash K,
Rangarajan V, Dekker A, Wee L, Traverso A (2021) Repeatability and
reproducibility study of radiomic features on a phantom and human
cohort. Sci Rep 11(1):2055. https://doi.org/10.1038/s41598-021-81526-8.
PMID: 33479392; PMCID: PMC7820018.
14. Liu R, Elhalawani H, Radwan Mohamed AS, Elgohari B, Court L, Zhu H,
Fuller CD (2019) Stability analysis of CT radiomic features with respect
to segmentation variation in oropharyngeal cancer. Clin transl radiat
oncol. 21:11–18. https://doi.org/10.1016/j.ctro.2019.11.005. PMID:
31886423; PMCID: PMC6920497.
15. Leithner D, Schöder H, Haug A, Vargas HA, Gibbs P, Häggström I,
Rausch I, Weber M, Becker AS, Schwartz J, Mayerhoefer ME (2022)
Impact of ComBat harmonization on PET radiomics-based tissue classifica-
tion: a dual-center PET/MRI and PET/CT study. J Nucl Med.
63(10):1611–1616. https://doi.org/10.2967/jnumed.121.263102. Epub
2022 Feb 24. PMID: 35210300; PMCID: PMC9536705.
16. Cabini RF, Brero F, Lancia A, Stelitano C, Oneta O, Ballante E, Puppo E,
Mariani M, Alì E, Bartolomeo V, Montesano M, Merizzoli E, Aluia D,
Agustoni F, Stella GM, Sun R, Bianchini L, Deutsch E, Figini S,
Bortolotto C, Preda L, Lascialfari A, Filippi AR (2022). Preliminary
report on harmonization of features extraction process using the ComBat
tool in the multi-center “Blue Sky Radiomics” study on stage III unre-
sectable NSCLC. Insights Imaging 13(1):38. https://doi.org/10.1186/
s13244-022-01171-1. PMID: 35254525; PMCID: PMC8901939
17. Zeineldin RA, Karar ME, Elshaer Z, Coburger J, Wirtz CR, Burgert O,
Mathis-Ullrich F (2022) Explainability of deep neural networks for MRI
analysis of brain tumors. Int J Comput Assist Radiol Surg 17(9):1673–
1683. https://doi.org/10.1007/s11548-022-02619-x. Epub 2022 Apr 23.
PMID: 35460019; PMCID: PMC9463287.
18. Zerunian M, Pucciarelli F, Caruso D, Polici M, Masci B, Guido G, De
Santis D, Polverari D, Principessa D, Benvenga A, Iannicelli E, Laghi A
(2022) Artificial intelligence based image quality enhancement in liver
MRI: a quantitative and qualitative evaluation. Radiol Med 127(10):1098–
1105. https://doi.org/10.1007/s11547-022-01539-9. Epub 2022 Sep 7.
PMID: 36070066; PMCID: PMC9512724
19. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021)
Evaluation of conventional and deep learning based image harmonization
methods in radiomics studies. Phys Med Biol 66(24). https://doi.
org/10.1088/1361-6560/ac39e5. PMID: 34781280
20. Shafiq-Ul-Hassan M, Zhang GG, Latifi K, Ullah G, Hunt DC,
Balagurunathan Y, Abdalah MA, Schabath MB, Goldgof DG, Mackin D,
Court LE, Gillies RJ, Moros EG (2017) Intrinsic dependencies of CT
radiomic features on voxel size and number of gray levels. Med Phys
44(3):1050–1062. https://doi.org/10.1002/mp.12123. PMID: 28112418;
PMCID: PMC5462462
21. Covert EC, Fitzpatrick K, Mikell J, Kaza RK, Millet JD, Barkmeier D,
Gemmete J, Christensen J, Schipper MJ, Dewaraja YK (2022) Intra- and
inter-operator variability in MRI-based manual segmentation of HCC lesions
and its impact on dosimetry. EJNMMI Phys 9(1):90. https://doi.org/10.1186/
s40658-022-00515-6. PMID: 36542239; PMCID: PMC9772368
22. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V,
Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M,
Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit
MC, Dinapoli N, Dinh CV, Echegaray S, El Naqa I, Fedorov AY, Gatta R,
Gillies RJ, Goh V, Götz M, Guckenberger M, Ha SM, Hatt M, Isensee F,
Lambin P, Leger S, Leijenaar RTH, Lenkowicz J, Lippert F, Losnegård A,
Maier-Hein KH, Morin O, Müller H, Napel S, Nioche C, Orlhac F, Pati S,
Pfaehler EAG, Rahmim A, Rao AUK, Scherer J, Siddique MM, Sijtsema
NM, Socarras Fernandez J, Spezi E, Steenbakkers RJHM, Tanadini-Lang
S, Thorwarth D, Troost EGC, Upadhaya T, Valentini V, van Dijk LV, van
Griethuysen J, van Velden FHP, Whybra P, Richter C, Löck S (2020) The
image biomarker standardization initiative: standardized quantitative
radiomics for high-throughput image-based phenotyping. Radiology
295(2):328–338. https://doi.org/10.1148/radiol.2020191145. Epub 2020
Mar 10. PMID: 32154773; PMCID: PMC7193906
23. Li W (2015) “Automatic segmentation of liver tumor in CT images with
deep convolutional neural networks”. J Comput Commun 3(11):146
24. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J,
Burren Y et al (2014) “The multimodal brain tumor image segmentation
benchmark (BRATS)”. IEEE Trans Med Imaging 34(10):1993–2024
25. Wang S, Zhou M, Liu Z, Liu Z, Gu D, Zang Y, Dong D, Gevaert O, Tian
J (2017) “Central focused convolutional neural networks: developing a
data-driven model for lung nodule segmentation”. Med Image Anal
40:172–183
26. Huynh BQ, Li H, Giger ML (2016) Digital mammographic tumor clas-
sification using transfer learning from deep convolutional neural net-
works. J Med Imaging 3(3):034501
27. Paul R et al (2016) Deep feature transfer learning in combination with
traditional features predicts survival among patients with lung adenocar-
cinoma. Tomography 2(4):388–395

28. Summers RM, Johnson CD, Pusanik LM, Malley JD, Youssef AM, Reed
JE (2001) Automated polyp detection at CT colonography: feasibility
assessment in a human population. Radiology 219(1):51–59
29. Wang Y, Sun L, Ma K, Fang J (2018) Breast cancer microscope image
classification based on CNN with image deformation. In Image Analysis
and Recognition: 15th International Conference, ICIAR 2018, Póvoa de
Varzim, Portugal 27–29;2018. Proceedings 15 (pp. 845–852). Springer
International Publishing
30. Dehmeshki J, Amin H, Valdivieso M, Ye X (2008) Segmentation of pul-
monary nodules in thoracic CT scans: a region growing approach. IEEE
transactions on medical imaging 27(4):467–480.
31. Szegedy C et al (2015) Going deeper with convolutions. En Proceedings
of the IEEE conference on computer vision and pattern recognition.
pp 1–9
32. Simonyan K, Zisserman A (2014) Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556
33. He K et al (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern rec-
ognition. pp 770–778
34. Zhu Y et al (2019) A deep learning radiomics model for preoperative
grading in meningioma. Eur J Radiol 116:128–134
35. Zheng X et al (2020) Deep learning radiomics can predict axillary lymph
node status in early-stage breast cancer. Nat Commun 11:1–9
36. Afshar P et al (2019) From handcrafted to deep-learning-based cancer
radiomics: challenges and opportunities. IEEE Signal Process Mag
36(4):132–160
37. Echaniz O, Graña M (2017) Ongoing work on deep learning for lung
cancer prediction. In: Biomedical applications based on natural and arti-
ficial computing: international work-conference on the interplay between
natural and artificial computation, IWINAC 2017, Corunna, Spain, June
19–23, 2017, Proceedings, Part II. Springer International Publishing,
pp 42–48
38. Fu L et al (2017) Automatic detection of lung nodules: false positive
reduction using convolution neural networks and handcrafted features.
In: Medical imaging 2017: computer-aided diagnosis. SPIE, pp 60–67
39. Hassan AH, Wahed ME, Metwally MS, Atiea MA (2022) A hybrid
approach for classification breast cancer histopathology images. Frontiers
in scientific research and technology 3(1):1–10
40. Liu S et al (2017) Pulmonary nodule classification in lung cancer screen-
ing with three-dimensional convolutional neural networks. J Med
Imaging 4(4):041308
4 Facts and Needs to Improve Radiomics Reproducibility

P. M. A. van Ooijen, R. Cuocolo, and N. M. Sijtsema

4.1 Introduction

Quantitative imaging aims to extract quantifiable features (radiomics, deep features, and/or imaging biomarkers) that characterize normal anatomy, disease, tumors, and the severity or status of chronic conditions. These quantitative measures can be used to obtain an objective measurement of a biological process or endpoint, to perform early diagnosis, to predict patient outcomes, to measure response to therapy, or to assist surgery planning. However, after the initial hype of quantitative imaging it became clear that not all quantitative information obtained from the imaging data was reliable because of the many dependencies [1].
Currently, in clinical practice, commonly used and accepted
quantifiable features are limited to rather simple measurements
such as size, volume, or histogram analysis (Fig. 4.1).

P. M. A. van Ooijen (*) · N. M. Sijtsema
Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
e-mail: [email protected]
R. Cuocolo
Department of Medicine, Surgery and Dentistry, University of Salerno,
Baronissi, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_4

Fig. 4.1 Example of quantitative imaging. Lung volumes and areas of low
attenuation are provided in cm3 and in percentage of low attenuation

To become clinically accepted and used, the reproducibility of radiomic features is a crucial factor to consider, since it determines the generalizability of the resulting radiomics-based models. This generalizability is required for a model to be usable in clinical practice in any hospital around the world, and not just at the center where it was developed. However, so far, the reproducibility and generalizability of radiomic features and models have been reported to be limited [2].
The main reason for this low reproducibility is that each step of the radiomic analysis introduces its own factors that influence the final output, often with limited to no ability to predict the effect of these factors. This was already apparent in the move from qualitative to quantitative radiology, where the quantification results, among other factors, rely heavily on the properties of the obtained images, which in turn depend greatly on the image acquisition equipment manufacturer, version, and setup. This reliance on the inherent properties of the acquired images has only increased with the use of radiomics because of the full dependency on the, sometimes hundreds of, radiomic features derived directly from these images. Furthermore, the reproducibility of the radiomics approach, and thus of the prediction gained from it, is in turn heavily dependent on the reproducibility and repeatability of the selected individual radiomic features.
High reproducibility indicates that the radiomic features are stable when obtained from imaging data of different origins (site, equipment, imaging acquisition protocol, etc.). High repeatability indicates that the radiomic features are stable when obtained multiple times from the same subject using the same imaging equipment.
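Repeatability and reproducibility are usually quantified with an agreement statistic computed on paired feature measurements. As a minimal sketch (the feature values below are hypothetical test-retest measurements, not data from any cited study), Lin's concordance correlation coefficient can be computed with the Python standard library alone:

```python
from statistics import mean, pvariance

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two series of
    measurements, e.g., a feature computed on test and retest scans.
    Values close to 1 indicate high repeatability."""
    mx, my = mean(x), mean(y)
    vx, vy = pvariance(x), pvariance(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Hypothetical first-order entropy values from repeated scans
test = [4.1, 3.8, 5.0, 4.6, 3.2]
retest = [4.0, 3.9, 4.8, 4.7, 3.3]
print(round(concordance_ccc(test, retest), 3))  # → 0.977
```

A feature whose coefficient falls below a chosen threshold (e.g., 0.9) would typically be excluded from further model building.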
This chapter will focus on defining the confounding factors that could lead to inaccurate and unreliable radiomic feature estimation, explaining how the inconsistencies that exist across CT scanners and MRI scanners/sequences may decrease the reliability of image-derived radiomic features. The needs that underlie the process of evaluating these factors and reducing their influence as far as possible will also be covered.

4.2 Factors Influencing Reproducibility

Previous review papers have shown that radiomic features are sensitive to image acquisition, reconstruction, tumor segmentation, and interpolation [3–5] (Table 4.1). They have also shown that the level of sensitivity depends on the radiomic feature itself. As an example, textural features are reported to be less robust than statistical features [4], and reproducibility was claimed to be higher in first-order features when compared to shape metrics and textural features [5]. However, because of the variation in the design and implementation of radiomics studies and the variation in scanner acquisition protocols used, the reported robustness of specific radiomic features or feature groups is sometimes contradictory across reports.

Table 4.1 Overview of the steps in the radiomics process, the related factors influencing the accuracy and reproducibility, and the possible solutions reported in literature

Step: Acquisition
  Factors influencing accuracy: digital image pre-processing; voxel size/slice thickness; reconstruction filters; contrast enhancement protocol; image noise; patient size; artifacts
  Possible solutions: protocol standardisation/harmonization; histogram normalization; interpolation

Step: Segmentation
  Factors influencing accuracy: lack of accuracy; inter-reader variability; intra-reader variability
  Possible solutions: fixed protocols; consensus segmentation; validation

Step: Feature extraction
  Factors influencing accuracy: feature definition; feature parameter setting; feature implementation; software used for feature extraction
  Possible solutions: IBSI guidelines + digital phantom; well accepted, open source, feature implementations; delta-radiomics

Step: Model construction
  Factors influencing accuracy: feature selection; machine learning model selection; cut-off selection; model validation
  Possible solutions: IBSI guidelines; inter-software comparison

4.2.1 Acquisition

One of the major issues with quantitative medical imaging in general, and radiomics specifically, is the vast variation in the acquisition process of the imaging data. This variation is already introduced when using data from scanners of different vendors, each of which has its own specific acquisition and pre-processing techniques to obtain the best medical image. This means that even when the configurable acquisition parameters are kept the same, the images can still yield different radiomic feature values.
Another problem is the variation in reconstruction parameters as defined by local protocols. These reconstruction parameters influence the appearance of the imaging data to such an extent that they affect quantitative measurements and radiomic features. Examples of such reconstruction parameters are the in-plane resolution, the slice thickness, and the applied reconstruction kernels. Although previous imaging studies have shown the effects of slice thickness and reconstruction kernels on computed features, between ~5% and ~25% of radiomics studies prior to 2020 did not even report their imaging protocols, and most of those that did only included the slice thickness information [2].
In addition to the scan protocol, the contrast enhancement protocol also plays a major role in the presentation of the image. This includes the injection protocol itself (bolus timing and size) but also the type of contrast media used (e.g., its iodine concentration in CT scanning). One must keep in mind that the effect of the contrast media can extend beyond the targeted area. For example, with intravenous injection of contrast media for the enhancement of the arteries, the enhancement can also be apparent beyond the arterial wall because of partial volume effects: Kristanto et al. showed a strong positive correlation between lumen contrast enhancement and mean plaque HU-value [6].
Scan artifacts can also hamper the determination of quantitative features. These include not only artifacts caused by foreign objects such as metal implants (e.g., pacemakers, dental fillings, hip/knee prostheses), but also those caused by inaccurate acquisition (e.g., incorrect triggering/gating or contrast timing) or by voluntary and involuntary movement of the patient.
Finally, the patients themselves also play a role in the determination of radiomic features. Patients of different sizes, or female patients with different breast sizes, can have different quantitative measures for the same structure because of the influence of fatty tissue.

4.2.2 Segmentation

Before computing the radiomic features, segmentation of the region of interest (e.g., tumor or other pathology) is required. This segmentation can be performed manually, semi-automatically, or fully automatically. These three methods each have their own characteristics concerning accuracy, intra- and inter-reader variability, and validation status.
It was shown that manual segmentation affects the reproducibility of radiomic features to some extent because of intra- and inter-reader variability, and that those differences were amplified in textural features [5]. Moving to semi-automatic segmentation has been shown to improve feature reproducibility when compared to fully manual segmentation [7].
Automatic segmentation promises more deterministic models able to provide more consistent outputs. However, these automatic segmentation methods are nowadays mostly deep learning based and thus heavily dependent on the training data used. This dependency makes them sensitive to variations in the input data, such as differences in the acquisition protocol used. Therefore, they should still be verified for their accuracy, as varying degrees of precision in different cases would still negatively influence radiomics robustness. Previous work showed that even with high agreement among segmentation methods, subtle differences can significantly affect radiomic features and their predictive power [8].
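How such subtle segmentation differences propagate into feature values can be illustrated with a toy example (the image and the two reader masks below are hypothetical, not data from the cited study): two contours with high Dice overlap can still produce different first-order feature values.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flattened)."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    return 2 * intersection / (sum(mask_a) + sum(mask_b))

def mean_intensity(image, mask):
    """A simple first-order feature: mean intensity inside the mask."""
    values = [v for v, m in zip(image, mask) if m]
    return sum(values) / len(values)

image = [10, 12, 11, 90, 95, 93, 92, 13]  # bright lesion with a dark rim
reader1 = [0, 0, 0, 1, 1, 1, 1, 0]        # lesion voxels only
reader2 = [0, 0, 1, 1, 1, 1, 1, 0]        # includes one extra rim voxel

print(round(dice(reader1, reader2), 3))   # → 0.889: high spatial agreement
print(mean_intensity(image, reader1))     # → 92.5
print(mean_intensity(image, reader2))     # → 76.2: one voxel shifts the feature
```

In real 3D data the effect is usually smaller per voxel but multiplied over hundreds of features, some of which (notably texture features) are far more sensitive than the mean.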

4.2.3 Radiomic Features Extraction

In an extensive review of 41 studies, Traverso et al. showed that reproducibility was highest in first-order intensity-based features when compared to shape-based and texture-based metrics [5]. The most stable texture-based feature reported was entropy, while the least reproducible features were reported to be coarseness and contrast. Pfaehler et al. showed that, in general, texture-based features were less robust than statistical features [4]. However, as stated before, there is no consensus about this in the literature, and contradictory results are reported on the reproducibility of features or feature groups. One frequently reported reason for this is the incomplete reporting of radiomics studies in the literature, which makes it extremely difficult to replicate previous work and thus leads to new, (slightly) different, implementations for each new study conducted. Another possible reason for differing results for the same features is the lack of standardized metrics to report feature repeatability and/or reproducibility.
Besides the differences in the procedures used to extract the features from the imaging data, there is also variation in the way feature parameters are configured. These parameters are mostly fine-tuned on the local dataset to obtain the best results. However, because of the lack of standardization, there is no guarantee that the same feature parameters will obtain the same results in a different dataset with (slightly) different properties. Furthermore, the exact configuration of the feature parameters is, again, not always reported in full in radiomics publications, making it hard to accurately replicate earlier studies.
Finally, the feature implementations themselves can contain slight variations, both in the interpretation and exact definition of specific features and in the exact names given to them.
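The impact of a single feature parameter can be made concrete with a small sketch (hypothetical HU values inside a region of interest; the bin widths are chosen arbitrarily): the same first-order entropy definition yields clearly different values depending on the discretization bin width, which is exactly why unreported parameter settings hamper replication.

```python
from math import log2

def first_order_entropy(values, bin_width):
    """Shannon entropy of the intensity histogram after
    fixed-bin-width discretization."""
    counts = {}
    for v in values:
        bin_index = int(v // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

roi = [53, 57, 61, 64, 72, 75, 81, 88]  # hypothetical HU values
for bw in (5, 10, 25):
    print(bw, round(first_order_entropy(roi, bw), 3))
# bin width 5  → 2.75
# bin width 10 → 2.0
# bin width 25 → 0.954
```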

4.2.4 Model Construction

Feature selection is one of the most important steps in model construction. Features should be selected on the basis of their reproducibility and their ability to differentiate between the outcome classes. However, there is often a strong correlation between different features from the same feature group, which increases the risk of falsely significant associations when multiple features from the same group are adopted to construct the predictive machine learning model.
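A common mitigation is to keep only one feature from each highly correlated pair before model construction. A minimal sketch (hypothetical feature table; the 0.95 threshold is an arbitrary but typical choice):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def drop_correlated(features, threshold=0.95):
    """Greedily keep a feature only if it is not highly correlated
    with any feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature matrix: glrlm_a is nearly a rescaled copy of glcm_a
features = {
    "glcm_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glrlm_a": [2.1, 4.0, 6.2, 8.1, 9.9],
    "shape_b": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_correlated(features))  # → ['glcm_a', 'shape_b']
```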

4.3 How to Improve Reproducibility

One obvious solution to increase reproducibility would be the standardization of acquisition protocols and segmentation strategies. The standardization of acquisition protocols can be very difficult to achieve because of the variety in hardware and implementation across the different vendors; achieving it therefore requires an increased effort from both the users and the vendors of image acquisition equipment. Efforts to standardize acquisition protocols in medical imaging have taken place in recent years, especially through the proposal of Reporting and Data Systems (RADS). These are typically tailored to specific organs or pathologies, such as prostate (PI-RADS) or bladder (VI-RADS) cancer imaging, and often include technical requirements for image acquisition [9, 10]. However, it should be noted that adherence to these acquisition guidelines is still far from ideal in clinical practice [11, 12].
Standardization of segmentation strategies could be achieved in manual segmentation by providing strict guidelines on how to perform the segmentation and how to deal with specific situations that may occur [13]. Furthermore, contouring consensus could be implemented to reduce intra- and inter-reader variability. When moving to semi- or fully automatic segmentation, standardization becomes challenging, because in that case not only the human reader but also the specific implementation of the software plays a major role in the decisions made. However, automatic or semi-automatic segmentation is also reported to reduce inter-reader variability and increase reproducibility. Both can be combined by using automatic segmentation with human oversight, where the reader supervises the computerized results and can adjust them when they do not comply with pre-defined rules.
Once the image acquisition is done, pre-processing steps can be taken to prepare the images for a higher level of reproducibility. Image normalization and interpolation are two basic steps to obtain comparable results from radiomic features derived from data of different origins. Normalization ensures that the histogram distribution is similar in data with unit-less voxels. This can be obtained through different means, including gray-level z-score normalization or even discretization with a fixed bin number. While generally advisable to implement, it is also true that the effect of normalization and discretization on texture-based feature reproducibility may vary with the feature type and use case [14–16]. Interpolation makes sure that the voxel size is the same for different datasets when they are acquired, for example, with different slice thicknesses. With interpolation, or even when resampling data from a single scanner dataset, the goal should be to obtain high-resolution isotropic voxels with the same dimensions in all three directions. This is important to obtain rotationally invariant texture matrices.
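The normalization step can be sketched as follows (hypothetical intensity lists standing in for two acquisitions of the same tissue with different scanner offset and gain; this is a generic z-score, not tied to any particular package):

```python
from statistics import mean, pstdev

def zscore_normalize(values):
    """Gray-level z-score normalization: rescale to zero mean and unit
    standard deviation so that histograms become comparable."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

scan_a = [100.0, 110.0, 120.0, 130.0]
scan_b = [210.0, 230.0, 250.0, 270.0]  # same tissue, offset and gain differ

print(zscore_normalize(scan_a))
print(zscore_normalize(scan_b))  # identical to scan_a after normalization
```

Because the second scan is only a linear rescaling of the first, both map to exactly the same normalized values, which is the behavior normalization is meant to enforce across scanners.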
The reproducibility of radiomics can also be improved by reducing the noise in the imaging data before feeding it into the radiomics model. Noise can be reduced by applying filters (e.g., the Laplacian of Gaussian filter in PyRadiomics) or wavelet decomposition to the images. While such conventional noise reduction methods are commonly used, a more novel methodology that uses deep learning to decrease the noise in, for example, (low-dose) computed tomography is gaining interest. Chen et al. demonstrated a cycle-GAN-based approach to reduce image noise and showed that the survival prediction AUC increased from 0.52 to 0.59 on a simulated-noise CT dataset and to 0.58 on the RIDER dataset [17]. They concluded that cycle GANs trained to reduce noise in CT can improve radiomics reproducibility and performance in low-dose CT.
The implementation of delta radiomic features, which do not describe a single time point but rather the change of a radiomic feature over time in repeated scanning, has also been reported to increase the multi-site reproducibility of features in a phantom study [18].
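Delta features themselves are straightforward to derive once the same feature has been extracted at two time points; a minimal sketch (hypothetical baseline and follow-up values):

```python
def delta_features(t1, t2):
    """Relative change of each radiomic feature between two time
    points; the delta values, not the raw ones, enter the model."""
    return {name: (t2[name] - t1[name]) / t1[name] for name in t1}

# Hypothetical feature values before and during therapy
baseline = {"volume": 12.0, "entropy": 4.0}
follow_up = {"volume": 9.0, "entropy": 4.4}
deltas = delta_features(baseline, follow_up)
print(deltas["volume"])  # → -0.25, i.e., a 25% volume decrease
```

Because scanner-specific offsets tend to affect both time points similarly, the ratio partially cancels them out, which is one intuition behind the improved multi-site stability.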
Another possible solution to tackle the diversity in the acquisition of imaging data is deep-learning-based harmonization. By exploiting the generative capabilities of deep learning networks, image harmonization can improve the accuracy of deep learning predictors [19], has been shown to increase the reproducibility of radiomic features [17], and can outperform more conventional, histogram-based techniques [20].

4.3.1 Guidelines and Checklists

A more general development that could benefit radiomics is the Quantitative Imaging Biomarkers Alliance (QIBA) initiative, which aims to standardize quantitative imaging. Adhering to these standardization guidelines would result in a more consistent representation of the imaging data for a specific application. Although the aim of QIBA is to improve quantitative imaging in general, this will also have a direct positive effect on the reproducibility of radiomic features. The Image Biomarker Standardization Initiative (IBSI) is also worth mentioning here [21]. IBSI compliance is certainly positive, but it still requires the usage of comparable extraction parameters to ensure robustness [22].
A reporting checklist for scientific radiomics papers was proposed by Pfaehler et al. [4]. The main goal of their reporting checklist was to evaluate the feasibility of reproducing a reported study. Similar work was advocated earlier by Traverso et al., who proposed that radiomics software should be benchmarked on publicly available datasets [5]. However, public datasets are not synonymous with perfect ones. Such benchmarking datasets should therefore contain data from different institutions to guarantee maximum heterogeneity, and they should be audited externally to ensure reliability. Furthermore, the use of public benchmarks needs to be carefully implemented because of the risk of overfitting through iterative testing. A solution for this could be to only provide benchmark datasets with "hidden" labels and include automated feedback on the results. Traverso et al. also propose a standard reporting of the benchmark study [5].
Recently, a quality scoring tool has been developed to assess and improve the research quality of radiomics studies: the METhodological RadiomICs Score (METRICS). It is based on a large international panel and a modified Delphi protocol, with a conditional format to cover methodological variations. It provides a well-constructed framework of the key methodological concepts to assess the quality of radiomics research papers [23].

4.3.2 Code and Development Platforms

To increase reproducibility in the implementation of radiomics, standardized public-domain code or development platforms could also provide a means to avoid variation caused by factors such as feature implementation and model construction. Examples of such code bases and development platforms are the radiomics extensions of the Computational Environment for Radiological Research (CERR) [24], the International Radiomics Platform (IRP) [25], PyRadiomics, and LIFEx. The challenge here lies in the choice between them, since the different solutions provide different capabilities and require varying levels of programming knowledge, and the lack of standardization between the different code bases and platforms in itself implies less reproducibility.
PyRadiomics is an open-source Python package for the extraction of radiomics data from medical images. It provides a variety of feature groups that can be extracted, namely first-order, shape, GLCM, GLRLM, and GLSZM features. PyRadiomics is easily imported into Python code, providing all necessary procedures; an example Jupyter Notebook can be found here: https://www.radiomics.io/pyradiomicsnotebook.html. A downside of PyRadiomics is that it lacks DICOM-RT input of anatomical structures [24]. A more practical downside is that it obviously requires Python programming skills. To overcome this, SlicerRadiomics was developed: an extension to 3D Slicer that encapsulates the PyRadiomics library, which in turn implements the calculation of a variety of radiomic features. SlicerRadiomics can be obtained from GitHub and used in 3D Slicer by building it from the provided source, or installed directly from 3D Slicer by searching for "radiomics" in the Extensions Manager. The advantage is that the radiomics extension allows you to calculate radiomic features on a segmentation in 3D Slicer without requiring any programming knowledge.
The radiomics extension of CERR is based on MATLAB [24]. It provides batch calculation and visualization of radiomic features using a tailored data structure for radiomics metadata. A test suite is also provided to allow comparison with radiomic features computed with other platforms such as PyRadiomics.
LIFEx is end-user freeware that allows users to obtain a broad range of conventional, textural, and shape features from medical imaging data (www.lifexsoft.org).
Other implementations have also been released with the aim of harmonizing radiomic features. One example is the statistical method ComBat, originally developed for genomics but adapted by Orlhac et al. to correct variations in radiomics measurements [26]. ComBat does not require modification of the images but instead harmonizes the radiomic features themselves, based on their distribution and knowledge of the covariates. It is a data-driven approach that enables the pooling of radiomic features from different CT protocols. Although the ComBat method shows promise and yields meaningful improvements in the reproducibility of radiomic features, it performed worse in patients than in phantom images, and more work is needed to improve the method and extend it to patient cohorts beyond lung cancer.
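The core of such location-scale harmonization can be sketched by aligning each batch of feature values to a pooled reference distribution (a deliberately simplified stand-in for ComBat: the real method adds empirical Bayes shrinkage of the batch estimates and preserves biological covariates, which this toy version does not):

```python
from statistics import mean, pstdev

def harmonize(batches):
    """Map every batch (e.g., one CT protocol) of feature values onto
    the pooled mean and standard deviation of all batches."""
    pooled = [v for batch in batches for v in batch]
    gm, gs = mean(pooled), pstdev(pooled)
    harmonized = []
    for batch in batches:
        bm, bs = mean(batch), pstdev(batch)
        harmonized.append([gm + gs * (v - bm) / bs for v in batch])
    return harmonized

# Hypothetical feature values from two CT protocols with a site offset
site_a = [1.0, 2.0, 3.0]
site_b = [11.0, 12.0, 13.0]
for values in harmonize([site_a, site_b]):
    print([round(v, 2) for v in values])  # both sites map to the same values
```

Note that this naive version would also erase genuine biological differences between sites, which is precisely why full ComBat models covariates explicitly.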

4.4 Recommendations for Achieving Clinical Adoption of Radiomics

Previous reviews have shown that the assessment of repeatability and reproducibility of radiomic features is mainly performed in a limited number of pathologies, the most frequent being non-small cell lung cancer and oropharyngeal cancer [5]. Furthermore, they have shown that detailed information about the radiomics methodology is often lacking or incomplete.
In reported papers, the radiomics methodology is also often applied to single-site databases, resulting in possible overfitting of the prediction model to the local data and thus no guarantee of high reproducibility when applied to data from a different origin. One of the main tasks for current radiomics development is therefore to enable validation by replication of results in an external dataset. This requires extensive and complete reporting of the radiomics development and implementation, including a detailed description of the patient cohort and of the image acquisition and reconstruction protocols used. Furthermore, the developed software and the datasets used in the model development should be made publicly available.
For radiomics to become clinically useful in the future, extensive quality control must be implemented in the radiomics process to avoid problems caused by data and model drift. To detect data drift, quality control should be performed on the acquired data, to ensure that it complies with the expectations of the radiomics evaluation, and on the segmentation performed, especially in the case of automatic segmentation. For the detection of model drift, quality control should be implemented on the radiomics predictions themselves. In case of changes in imaging equipment, in the image acquisition and reconstruction protocols, or in the segmentation protocols or automatic segmentation tools, a more extensive model validation and a model update could be necessary.

References
1. Steiger P, Sood R (2019) How can radiomics be consistently applied
across imagers and institutes. Radiology 291:60–61
2. Zhao B (2021) Understanding sources of variation to improve the repro-
ducibility of radiomics. Front Oncol 11:633176
3. Lennartz S, O’Shea A, Parakh A, Persigehl T, Baessler B, Kambadakone
A (2022) Robustness of dual-energy CT-derived radiomic features across
three different scanner types. Eur Radiol 32:1959–1970
4. Pfaehler E, Zhovannik I, Wei L, Boellaard R, Dekker A, Monshouwer R,
El Naqa I, Bussink J, Gillies R, Wee L, Traverso A (2021) A systematic
review and quality of reporting checklist for repeatability and reproduc-
ibility of radiomic features. Phys Imaging Radiat Oncol 20:69–75
5. Traverso A, Wee L, Dekker A, Gillies R (2018) Repeatability and repro-
ducibility of radiomic features: a systematic review. Int J Radiat Oncol
Biol Phys 102(4):1143–1158
6. Kristanto W, van Ooijen PMA, Greuter MJW, Groen JM, Vliegenthart R,
Oudkerk M (2013) Non-calcified coronary atherosclerotic plaque visual-
ization on CT: effects of contrast-enhancement and lipid-content frac-
tions. Int J Cardiovasc Imaging 29:1137–1148

7. Parmar C, Rios Velazquez E, Leijenaar R et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9:e102107
8. Poirot MG, Caan MWA, Ruhe HG, Bjørnerud A, Groote I, Reneman L,
Marquering HA (2022) Robustness of radiomics to variations in segmen-
tation methods in multimodal brain MRI. Sci Rep 12:16712
9. Panebianco V, Narumi Y, Altun E et al (2018) Multiparametric magnetic
resonance imaging for bladder cancer: development of VI-RADS (vesical
imaging-reporting and data system). Eur Urol 74(3):294–306
10. Turkbey B, Rosenkrantz AB, Haider MA et al (2019) Prostate imaging
reporting and data system version 2.1: 2019 update of prostate imaging
reporting and data system version 2. Eur Urol 76(3):340–351
11. Cuocolo R, Stanzione A, Ponsiglione A et al (2019) Prostate MRI technical parameters standardization: a systematic review on adherence to PI-RADSv2 acquisition protocol. Eur J Radiol 120:108662
12. Esses SJ, Taneja SS, Rosenkrantz AB (2018) Imaging facilities’ adher-
ence to PI-RADS v2 minimum technical standards for the performance of
prostate MRI. Acad Radiol 25(2):188–195
13. deSouza NM, van der Lugt A, Deroose CM, Alberich-Bayarri A, Bidaut
L, Fournier L, Costaridou L, Oprea-Lager DE, Kotter E, Smits M,
Mayerhoefer ME, Boellaard R, Caroli A, de Geus-Oei LF, Kunz WG, Oei
EH, Lecouvet F, Franca M, Loewe C, Lopci E, Caramella C, Persson A,
Golay X, Dewey M, O’Connor JPB, deGraaf P, Gatidis S, Zahlmann G,
European Society of Radiology, European Organisation for Research and
Treatment of Cancer (2022) Standardised lesion segmentation for imag-
ing biomarker quantitation: a consensus recommendation from ESR and
EORTC. Insights Imaging 13(1):159. https://doi.org/10.1186/s13244-022-01287-4. PMID: 36194301; PMCID: PMC9532485
14. Duron L, Balvay D, Vande Perre S et al (2019) Gray-level discretization
impacts reproducible MRI radiomics texture features. PLoS One
14(3):e0213459
15. Kociolek M, Strzelecki M, Obuchowicz R (2020) Does image normaliza-
tion and intensity resolution impact texture classification? Comput Med
Imaging Graph 81:101716
16. Schwier M, van Griethuysen J, Vangel MG et al (2019) Repeatability of
multiparametric prostate MRI radiomics features. Sci Rep 9(1):9441
17. Chen J, Wee L, Dekker A, Bermejo I (2022) Improving reproducibility
and performance of radiomics in low-dose CT using cycle GAN. J Appl
Clin Med Phys 23:e13739
18. Nardone V, Reginelli A, Guida C, Belfiore MP, Biondi M, Mormile M
et al (2020) Delta-radiomics increases multicentre reproducibility: a
phantom study. Med Oncol 37(5):38
19. Bashyam VM, Doshi J, Erus G, Srinivasan D et al (2022) Deep Generative
Medical Image Harmonization for improving cross-site generalization in
deep learning predictors. J Magn Reson Imaging 55(3):908–916

20. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021) Evaluation of conventional and deep learning based image harmonization methods in radiomics studies. Phys Med Biol 66(24):ac39e5
21. Zwanenburg A, Vallieres M, Abdalah MA, Aerts HJWL, Andrearczyk V,
Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M,
Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit
M-C, Dinapoli N, Viet Dinh C, Echguray S et al (2020) The image bio-
marker standardization initiative: standardized quantitative radiomics for
high-throughput image-based phenotyping. Radiology 295:328–338
22. Fornacon-Wood I, Mistry H, Ackermann CJ, Blackhall F, McPartlin A,
Faivre-Finn C, Price GJ, O’Connor JPB (2020) Reliability and prognostic
value of radiomic features are highly dependent on choice of feature
extraction platform. Eur Radiol 30:6241–6250
23. Kocak B, Akinci d'Antonoli T, Mercaldo N, Alberich-Bayarri A, Baessler B et al (2024) METhodological RadiomICs Score (METRICS): a quality scoring tool for radiomics research endorsed by EuSoMII. Insights Imaging 15:8. https://doi.org/10.1186/s13244-023-01572-w
24. Apte AP, Iyer A, Crispin-Ortuzar M, Pandya R, van Dijk LV, Spezi E,
Thor M, Um H, Veeraraghavan H, Oh JH, Shukla-Dave A, Deasy JO
(2018) Technical Note: extension of CERR for computational radiomics:
a comprehensive MATLAB platform for reproducible radiomics research.
Med Phys 45(8):3712–3720
25. Overhoff D, Kohlmann P, Frydrychowicz A, Gatidis S, Loewe C, Moltz J,
Kuhnigk J-M, Gutberlet M, Winter H, Volker M, Hahn H, Schoenberg SO
(2021) The international radiomics platform—an initiative of the German
and Austrian radiological societies—first application examples. Rofo
193(03):276–288
26. Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I (2019) Validation of a
method to compensate multicenter effects affecting CT radiomics.
Radiology 291:53–59
5 Data Harmonization to Address the Non-biological Variances in Radiomic Studies

Y. Nan, X. Xing, and G. Yang

5.1 Non-biological Variances in Radiomic Analysis

To ensure the reliability and reproducibility of radiomics models, it is essential to establish strict standards for data collection and pre-processing. This means that the imaging data needs to be collected and processed in the same way for all patients, to ensure that the radiomic features are accurate and comparable across different samples.
However, the medical imaging data obtained from different
scanners or hospitals can be significantly various under different
image acquisition protocols (such as slice thickness, spatial resolu-
tion, and reconstruction kernels), which results in immense vari-
ability in the extracted radiomic features. For example, even when
imaging the same lung tumour region, CT scans acquired from

Y. Nan · X. Xing · G. Yang (*)


Bioengineering Department and Imperial-X, National Heart and Lung
Institute, Imperial College London, London, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image
Processing, Imaging Informatics for Healthcare Professionals,
https://doi.org/10.1007/978-3-031-48446-9_5
Fig. 5.1 Illustration of how non-biological variances exist and how they affect radiomic studies (RoI region of interest)

different imaging devices (SIEMENS and PHILIPS in Fig. 5.1,


respectively) present different visual patterns, which can lead to
different texture-based features and prognostication results.
Additionally, there can be considerable variations between two
repeated scans for the same patient due to manual operations and
the patient’s status [1]. To minimize motion artifacts in the images,
patients are often asked to hold their breath when undergoing CT
scans. However, movement during the scanning process can still
occur, resulting in blurred or distorted images that are difficult to
interpret. To address these issues, there is an urgent need to implement a harmonization algorithm that can harmonize data of differing quality and acquisition protocols (Fig. 5.2).
We refer to the variability caused by non-biological factors, such as acquisition devices, acquisition protocols, and laboratory preparations, as non-biological variance. These variances may reduce the reproducibility and generalizability of radiomic models and lead to incorrect or unreliable conclusions. Here, we summarize these non-biological factors into three main types:

1. Acquisition devices (Hardware): The variation of acquisition devices contributes significantly to the variations in multicentre data, especially in CT and MRI scans. Variations in the detector systems of different vendors, coil sensitivity, positional and
physiological differences during acquisition, as well as magnetic field variations in MRI, are some of the factors that cause differences in these data.

Fig. 5.2 Three main types of non-biological variances

Such variations can have serious
implications for the reproducibility of radiomic features, even
when a fixed acquisition protocol is used for different brands
of scanners.
For example, researchers have investigated the reproduc-
ibility of radiomic features on different scanners despite using
the same acquisition protocol and found substantial differ-
ences [2]. The reproducibility of radiomic features ranged
from 16% to 85%, indicating that even with a fixed protocol,
there are still significant differences in the images produced by
different scanners. Similarly, Sunderland et al. [3] found a
large variation in the standardized uptake value (SUV) across different brands of scanners. They discovered that newer scanners had a
much higher maximum SUV compared to older ones, indicat-
ing that the heterogeneity of acquisition devices could signifi-
cantly impact the interpretation of imaging data.
These findings suggest that the variability in acquisition
devices can significantly affect the reproducibility and reliabil-
ity of imaging data, which can have serious implications for
clinical decision-making. To address this issue, there is a need
for data harmonization strategies to ensure that imaging data is
consistent across different devices and vendors. This would
help to increase the reproducibility of radiomic features and
enhance the reliability of imaging data in clinical practice.
2. Patient status: Patient status could also affect the image qual-
ity of CT and MRI scans. Patient positional changes during the
image acquisition could cause variation or artifacts that result
in reduced image quality and inconsistencies between scans.
For instance, if a patient moves during a CT or MRI scan,
image distortions may occur, resulting in not only poor image
quality and inconsistent results but also artifacts that could
potentially be misinterpreted as pathological changes. This
can be especially problematic in functional imaging studies
such as PET and functional MRI (fMRI), where changes in
patient motion can affect brain activation patterns or tracer
uptake, significantly impacting the reproducibility and consis-
tency of the results.
Therefore, patient status, including patient positional
changes and motion artifacts, should be carefully monitored
and minimized during imaging procedures to improve the
reproducibility and accuracy of the results. Techniques such as immobilization devices, patient coaching, and motion correction software can help minimize these effects and improve the quality and consistency of medical imaging to some extent, at the cost of additional resources and the patients' cooperation.
3. Acquisition protocols: Variations in acquisition protocols are a major cause of cross-cohort variability and can significantly impact the reproducibility of radiomic features. These acquisition protocols include scanning parameters
such as voltage, tube current, field of view, slice thickness,
microns per pixel, as well as reconstruction approaches such
as different reconstruction kernels. Details of factors that lead
to non-biological variances in CT and MRI scans are given in
Table 5.1.

To investigate the reproducibility of radiomic features, several reproducibility studies have been conducted using test-retest experiments. These experiments determine the correlation coefficient or error between two features, where a high correlation coefficient or low error indicates good reproducibility/repeatability. For instance, a radiomic feature is considered reproducible/repeatable when the correlation coefficient score between features extracted from two comparison scans is greater than 0.90 [4].

Table 5.1 Factors that lead to non-biological variances in CT and MRI imaging

| Acquisition parameter | Impact on CT imaging | Impact on MRI imaging |
|---|---|---|
| Voltage | Image contrast and noise levels | Signal-to-noise ratio |
| Tube current | Image noise and radiation dose | N/A |
| Field of view | Image resolution and anatomy coverage | Image resolution and anatomy coverage |
| Slice thickness | Image resolution and anatomical detail | Image resolution and anatomical detail |
| Pixel/voxel size | Image resolution and spatial detail | Image resolution and spatial detail |
| Kernels | Image sharpness and texture | N/A |
| Magnetic field strength | N/A | Signal-to-noise ratio |
| Pulse sequence | N/A | Contrast and spatial resolution |
| Echo time | N/A | Contrast and signal-to-noise ratio |
| Repetition time | N/A | Contrast and temporal resolution |
| Radiation dose | Image quality | N/A |

N/A means "not applicable", as the parameter does not apply to the given imaging modality.
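Criteria such as CCC > 0.90 in the studies summarized below are simple to compute from paired test-retest measurements. A minimal numpy sketch of Lin's concordance correlation coefficient, with toy values rather than data from any cited study:

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between paired
    measurements of the same radiomic feature."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Toy feature values from a test scan and a retest scan of the same subjects
test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
retest = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

reproducible = ccc(test, retest) > 0.90  # the threshold used in [4]
print(round(ccc(test, retest), 3), reproducible)
```

The same skeleton applies to ICC or R-squared criteria; only the statistic changes.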
We summarize the previous reproducibility studies in Table 5.2, which further demonstrate that scanning parameters significantly affect radiomic features, making statistical analysis difficult. As illustrated in Table 5.2, the reproducibility of radiomic features ranges from 8.0% to 63.3%, indicating weak stability of unharmonized image-derived radiomic features. Among these variables, the reconstruction kernel used during CT scans has a distinct effect on radiomics reproducibility, which cannot be eliminated by unifying the reconstruction kernels, as different kernels are used to meet different clinical demands. For example, when using different kernels (soft and sharp, respectively) during
the reconstruction, only 15.2% of radiomic features are reproducible [14]. While strict standard protocols can reduce non-biological variances, radiologists often require specific acquisition protocols to ensure personalized, centre-based image quality considerations. For instance, radiologists may adjust the spacings (voxel sizes) on a case-by-case basis to assist the diagnosis. This heterogeneity in acquisition protocols is therefore unavoidable and requires a general solution to harmonize these data.

Table 5.2 Summary of the reproducibility/repeatability studies

| Reference | Reproducibility | Definition | Variables | Object | Modality |
|---|---|---|---|---|---|
| Jha et al. [5], 2021 | 30.7% (332/1080) | ICC > 0.90 | Slice thickness | Phantoms | CT |
| Emaminejad et al. [6], 2021 | 8.0% (18/226) | CCC > 0.90 | R-Kernel | Human | CT |
| Emaminejad et al. [6], 2021 | 7.5% (17/226) | CCC > 0.90 | Radiation dose | Human | CT |
| Kim et al. [7], 2021 | 11.0% (112/1020) | CCC > 0.85 | Acceleration factors | Human | MRI |
| Saeedi et al. [8], 2019 | 20.5% (8/39) | CoV < 5% | Tube voltage | Phantoms | CT |
| Saeedi et al. [8], 2019 | 30% (13/39) | CoV < 5% | Tube current | Phantoms | CT |
| Meyer et al. [9], 2019 | 20.8% (22/106) | R2 > 0.95 | Radiation dose | Human | CT |
| Meyer et al. [9], 2019 | 52.8% (56/106) | R2 > 0.95 | R-Kernel | Human | CT |
| Meyer et al. [9], 2019 | 39.6% (42/106) | R2 > 0.95 | R-Kernel | Human | CT |
| Meyer et al. [9], 2019 | 12.3% (13/106) | R2 > 0.95 | Slice thickness | Human | CT |
| Perrin et al. [10], 2018 | 24.8% (63/254) | CCC > 0.90 | Injection rates | Human | CECT |
| Perrin et al. [10], 2018 | 13.4% (34/254) | CCC > 0.90 | Resolution | Human | CECT |
| Midya et al. [11], 2018 | 11.7% (29/248) | CCC > 0.90 | Tube current | Phantoms | CT |
| Midya et al. [11], 2018 | 19.8% (49/248) | CCC > 0.90 | Noise | Phantoms | CT |
| Midya et al. [11], 2018 | 63.3% (157/248) | CCC > 0.90 | R-Kernel | Human | CT |
| Altazi et al. [12], 2017 | 21.5% (17/79) | L1 < 25% | R-Kernel | Human | PET |
| Zhao et al. [13], 2016 | 11.2% (10/89) | CCC > 0.90 | R-Kernel | Human | CT |
| Choe et al. [14], 2019 | 15.2% (107/702) | CCC > 0.85 | R-Kernel | Human | CT |

R2 R-squared coefficient, CCC concordance correlation coefficient, ICC intraclass correlation coefficient, CoV coefficient of variation, CECT consecutive contrast-enhanced computed tomography, PET positron emission tomography, L1 mean difference score, R-Kernel reconstruction kernel

5.2 Data Harmonization

5.2.1 Data Harmonization in Radiomics Studies

Data harmonization refers to the process of integrating data from multiple sources to facilitate analysis and comparison. Collecting
data using different methods, storing it in different formats, or
measuring it on different scales can make it challenging to inte-
grate and compare data across sources. To overcome these chal-
lenges, data harmonization methods are used to standardize,
match, transform, aggregate, or clean the data. The choice of
method depends on the nature of the data being harmonized and
the objectives of the project, with each approach having its own
advantages and disadvantages. Factors such as data quality, avail-
able resources, and research topics are considered when choosing
the appropriate method.
Standardization involves creating a common set of data ele-
ments and ensuring that they are consistently defined and mea-
sured across different sources. This may involve developing a data
dictionary that includes definitions of each data element, as well
as instructions for how to collect and record data in a consistent
manner. Standardization is often used in cases where multiple
data sources need to be integrated, such as in the case of clinical
trials where data may be collected from multiple sites.
Matching involves identifying corresponding data elements in
different sources and reconciling any differences between them.
Matching may involve comparing data elements based on a set of
pre-defined criteria, such as matching on patient ID or other
demographic information. Once corresponding data elements
have been identified, any differences between them may be
resolved through manual review or automated algorithms.
Transformation involves converting data from one format or
measurement scale to another, so that it can be integrated with
other data. This may involve converting data from one type of unit
(e.g., pounds to kilograms) or from one type of measurement
(e.g., self-reported data to objectively measured data).
Transformation may also involve data normalization, which
involves adjusting data values so that they are on a common scale,
often by dividing each value by a baseline value or standard devi-
ation.
Aggregation involves combining data from multiple sources
into a single dataset, often by creating summary statistics or
aggregating individual records. Aggregation may involve summa-
rizing data by specific categories, such as age or geographic loca-
tion, or by creating overall summary statistics, such as means or
medians.
Cleaning involves identifying and correcting errors or incon-
sistencies in the data, such as misspellings, duplicate entries, or
outliers. Data cleaning may involve manual review or automated
algorithms to identify and correct errors. Once errors have been
identified and corrected, the data can be harmonized across mul-
tiple sources.
It is of note that many data harmonization approaches rely on
pre-defined criteria to guide the process. For instance, standard-
ization usually involves creating a set of pre-defined data elements
with clear definitions and measurement scales. To address the non-biological variances in radiomics studies, smart harmonization approaches have been developed to integrate image data.

5.2.2 Automatic Harmonization Schemes

In radiomics studies, large-scale analysis of multicentre datasets has become increasingly important for improving the generalizability of radiomics models and for gaining more insight into
complex disease processes. To increase the efficiency of data har-
monization and to alleviate human workload, automatic data har-
monization has been proposed. There are two main schemes of smart data harmonization: samplewise and featurewise.
Featurewise harmonization (scheme shown in blue colour in
Fig. 5.3) aims to reduce the bias of extracted features by fusing
the extracted features to eliminate cohort variances. In this work-
flow, models are developed separately for each data source (one model per source). Then,
multicentre features are extracted following the same feature
extraction criteria, followed by the featurewise harmonization
techniques to eliminate the non-biological variances. While this
approach could improve data consistency and comparability, it
can be more complex than samplewise harmonization, as it often
requires multiple models to extract features of interest.
Additionally, when the number of samples in each cohort is small,
it can be challenging to develop corresponding models due to lim-
ited training samples.
Samplewise harmonization (scheme shown in orange colour
in Fig. 5.3) is typically performed before modelling which
involves reducing the cohort variance of all training samples.
Normally, different sources of datasets are first pre-processed
under the same criteria, followed by the harmonization model to
merge all these data together. This process is achieved through various techniques, such as image processing, synthesis, and invariant feature learning. By harmonizing the data in this way, multicentre samples can be fused into a single dataset, allowing for a more robust and accurate model. Based on these harmonized data, a single model is trained to extract features of interest for clinical analysis. This is also known as image-domain harmonization.

Fig. 5.3 Two typical ways (featurewise and samplewise) of automatic data harmonization
Task-driven harmonization is different from the samplewise
and featurewise harmonization approaches which harmonize the
data or the feature for further analysis. The task-driven harmoni-
zation is designed to learn cohort-invariant features from multiple
data sources, then applies these features to the primary task (e.g.,
segmentation, classification, regression). The concept behind
task-driven harmonization approaches is that if a sparse diction-
ary/mapping can be constructed from the data of various cohorts,
then these learned representations will not contain intra/inter
cohort variability. It focuses more on the development of robust
computational models instead of harmonizing raw data or
extracted features.

5.2.3 Automatic Harmonization Approaches

In this section, we summarize harmonization approaches into dif-


ferent groups based on the techniques behind them, including
location and scale, clustering, matching, synthesis, and invariant
feature learning. Among these approaches, location and scale and clustering methods can be used for both samplewise and featurewise harmonization, while matching and synthesis can only be applied to samplewise harmonization (Fig. 5.4).

Location and Scale Methods
Location and scale methods are statistical techniques utilized to
estimate the distribution of a dataset. These methods are com-
monly used in descriptive statistics and can be used to summarize
the distribution of a dataset in a few key measures.
Fig. 5.4 Harmonization approaches. Blue blocks represent methods that can be used for samplewise harmonization only, while orange blocks correspond to methods that can be used for both harmonization schemes. The yellow block indicates the invariant representation learning approach, which is mainly used to develop harmonized models

The location method is used to describe the centre of the data distribution. Among the different estimation approaches, the mean (average) value is the most commonly used measure.
Another helpful measure of location is the median value, which is
the middle value when the samples are arranged in order. The
median is a useful measure of location when the dataset includes
some outliers, which may skew the mean value.
The scale method is used to describe the variation of the data
distribution. The common measure of scale is the standard devia-
tion, which measures how much the values in the dataset deviate
from the mean value. A low standard deviation indicates that the
values are tightly clustered around the mean, while a high stan-
dard deviation indicates that the values are more spread out. In
addition to the standard deviation, the range and the interquartile range (the difference between the 75th and 25th percentiles) are also used as measures.
Based on these location and scale parameters, data collected from different sites are aligned towards the same location and scale values. One intuitive way is normalization (also called standardization), which rescales the samples to the same range. Given the mean μ and standard deviation σ of a sample x, the commonly used z-score normalization and max-min normalization can be given by

x' = (x - μ) / σ,  (5.1)

x' = (x - min(x)) / (max(x) - min(x)),  (5.2)

respectively.
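Equations (5.1) and (5.2) translate directly into code. A short numpy illustration on a toy feature vector:

```python
import numpy as np

def zscore(x):
    """Eq. (5.1): centre on the mean, scale by the standard deviation."""
    return (x - x.mean()) / x.std()

def minmax(x):
    """Eq. (5.2): rescale values into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 12.0, 14.0, 18.0, 26.0])  # a toy radiomic feature vector
print(zscore(x))  # mean ~0, standard deviation ~1
print(minmax(x))  # all values within [0, 1]
```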
In addition to normalization/standardization, the ComBat
algorithm, as described in [15, 16], was proposed for featurewise
harmonization. For instance, researchers used ComBat to harmo-
nize the image-derived features from multicentre MRI datasets
[16]. It utilized empirical Bayes shrinkage to accurately estimate
the mean and variance for each batch of data. These estimates
were then used to harmonize the data across cohorts. The first step
is standardizing the data to ensure similar overall mean and vari-
ance, followed by the empirical Bayes estimation with parametric
empirical priors. The resulting adjusted bias estimators were then
used in the location-scale model-based functions to harmonize the
data.
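For intuition, the location-scale core of this idea can be sketched without the empirical Bayes shrinkage step. The function below is a simplified illustration, not the reference ComBat implementation of [15, 16]:

```python
import numpy as np

def location_scale_harmonize(features, batches):
    """Align each site's per-feature mean/std to the pooled mean/std.
    `features`: (n_samples, n_features); `batches`: (n_samples,) site labels.
    ComBat additionally shrinks the per-site estimates with empirical Bayes
    and can preserve biological covariates; both are omitted for brevity."""
    features = np.asarray(features, float)
    out = np.empty_like(features)
    grand_mean = features.mean(axis=0)
    grand_std = features.std(axis=0)
    for b in np.unique(batches):
        idx = batches == b
        mu = features[idx].mean(axis=0)
        sd = features[idx].std(axis=0)
        out[idx] = (features[idx] - mu) / sd * grand_std + grand_mean
    return out

rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=(50, 3))  # site A features
site_b = rng.normal(5.0, 2.0, size=(50, 3))  # site B: shifted and rescaled
X = np.vstack([site_a, site_b])
sites = np.array(["A"] * 50 + ["B"] * 50)
Xh = location_scale_harmonize(X, sites)
# After harmonization both sites share the pooled location and scale
print(Xh[sites == "A"].mean(axis=0).round(2), Xh[sites == "B"].mean(axis=0).round(2))
```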
Another type of location and scale method is based on the alignment of data distributions, using cumulative distribution functions (CDFs) or probability density functions (pdfs). For instance, Wrobel et al. [17] proposed a
method to harmonize MRI multicentre data, which aligns the
voxel intensities of the source dataset with the target cumulative
distribution functions by estimating a non-linear intensity trans-
formation. In another study [18], the empirical density was esti-
mated and the distance between probability density functions was
calculated. Common features from different datasets were selected
first, and then their probability density functions were estimated
to determine the most suitable matching offsets. The harmonized
data was obtained by subtracting the estimated offsets from the
source cohorts.
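A simplified, numpy-only version of this distribution-alignment idea is classic quantile matching: map each source value to the target value at the same empirical quantile. This is a toy stand-in for the estimated non-linear intensity transformation of [17]:

```python
import numpy as np

def match_distribution(source, target):
    """Map each source value to the target value occupying the same
    empirical quantile, aligning the two cumulative distributions."""
    source = np.asarray(source, float)
    ranks = np.argsort(np.argsort(source))   # rank of each source value
    quantiles = (ranks + 0.5) / source.size  # its empirical CDF position
    return np.quantile(np.asarray(target, float), quantiles)

src = np.random.default_rng(1).normal(100.0, 15.0, 1000)  # source scanner intensities
tgt = np.random.default_rng(2).normal(40.0, 5.0, 1000)    # target scanner intensities
matched = match_distribution(src, tgt)
print(round(matched.mean(), 1), round(matched.std(), 1))  # close to the target cohort
```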

Clustering Methods
Clustering methods are commonly used in data harmonization to
group data samples based on their distances. In clustering, dis-
tance measures the similarity or dissimilarity between pairs of
data samples. The distance between two observations is typically
calculated based on the values of their attributes or features. The
aim of clustering is to create subsets or clusters of samples that are
more similar to each other than they are to those in other clusters. This grouping can help to harmonize the data by creating a more uniform representation of the samples that can be used for subsequent analysis. Figure 5.5 illustrates the steps when using clustering methods for data harmonization.

Fig. 5.5 Steps of using clustering methods for data harmonization
Several factors, such as the choice of clustering algorithm, the
selection of distance metrics, the pre-processing of data, and the
determination of the number of clusters, can influence the quality
of harmonization obtained through clustering methods.
Clustering algorithm: The selection of clustering algorithm
can affect the quality of harmonization. Different algorithms have
different assumptions and properties and may perform differently on various types of data. For instance, k-means assumes spherical clusters and is sensitive to initialization, while hierarchical clustering can handle non-spherical clusters but incurs higher computational costs.
Distance metrics: The choice of distance metric can also
impact the quality of harmonization. Different distance metrics
lead to different clustering results, as the similarity or dissimilar-
ity between samples is calculated in different ways.
Data pre-processing: Pre-processing steps such as scaling,
normalization, or handling missing values can impact the distance
between observations, and therefore the clustering results. It is
important to carefully apply appropriate pre-processing steps
before clustering.
The number of clusters: Defining an appropriate number of clusters can be challenging; it can be addressed by introducing prior knowledge (when the number of data sources is known) or by assessment approaches such as silhouette scores and gap statistics. Choosing an inappropriate number of clusters can lead to poor harmonization performance.
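The silhouette score mentioned above can be computed in a few lines; this sketch follows the standard definition (scikit-learn's `silhouette_score` is an equivalent, optimized alternative). It assumes every cluster has at least two members:

```python
import numpy as np

def silhouette_score(x, labels):
    """Mean silhouette over all samples: (b - a) / max(a, b), where a is a
    sample's mean distance to its own cluster and b is its smallest mean
    distance to any other cluster."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    n = len(x)
    scores = []
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        a = d[i, own].mean()
        b = min(d[i, labels == l].mean() for l in np.unique(labels) if l != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
good = np.array([0] * 20 + [1] * 20)  # matches the true two-blob structure
bad = np.arange(40) % 2               # alternating labels, ignores structure
print(silhouette_score(x, good), silhouette_score(x, bad))
```

Higher scores indicate a better-separated clustering, which is why the measure can guide the choice of k.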
Here we introduce some clustering methods for harmoniza-
tion.
Nearest neighbours methods: these methods first identify the
pairs of mutual nearest neighbours and then estimate the bias cor-
rection vectors between paired samples. These vectors are then
subtracted from the source cohort. The differences in NN methods
primarily relate to the way in which the mutual nearest pairs are
located within the geometric space [19–23]. For instance, MNN
[19] identified nearest neighbours between different datasets and
used them as reference points to calculate cohort bias. It employed
cosine normalization to pre-normalize the data, and then esti-
mated the bias correction vector by computing the Euclidean dis-
tances between paired samples. The bias correction vector was
subsequently applied to all samples, not just the paired samples.
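A toy version of this mutual-nearest-neighbour correction might look as follows; unlike the full MNN method [19], it skips cosine normalization and applies a single global correction vector:

```python
import numpy as np

def mnn_correct(source, target):
    """Shift the source cohort by the mean difference over mutual nearest
    neighbour pairs. One global correction vector is used for brevity;
    MNN proper smooths locally weighted per-pair vectors."""
    d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    nn_s2t = d.argmin(axis=1)  # nearest target for every source sample
    nn_t2s = d.argmin(axis=0)  # nearest source for every target sample
    pairs = [(i, j) for i, j in enumerate(nn_s2t) if nn_t2s[j] == i]
    correction = np.mean([target[j] - source[i] for i, j in pairs], axis=0)
    return source + correction

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(40, 2))
source = rng.normal(0.0, 1.0, size=(40, 2)) + np.array([3.0, -2.0])  # batch shift
corrected = mnn_correct(source, target)
print(np.linalg.norm(corrected.mean(axis=0) - target.mean(axis=0)))
```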
Iterative clustering methods: Iterative clustering methods
aim to address cohort bias by conducting multiple bias correction
iterations through repeated clustering procedures. Typically, these
methods (1) cluster all samples from different cohorts, and (2)
determine the correction vectors for harmonization based on the
cluster centroids. Harmony [24] first utilized principal component
analysis (PCA) to reduce the dimensionality of all samples and
then divided them into multiple groups, with one centroid per
group, by using k-means clustering. These centroids were then
used to calculate the correction factors for harmonization. The
clustering and correction steps were repeated until convergence
was achieved.
Matching Methods
Matching methods in data harmonization are used to align data
collected from different sources that may have different formats
or structures. Resampling (also known as resizing) is the most common matching method used for automatic data harmonization; it alters the dimensions or resolution of images or signals to match those of other datasets. This
method can be used to harmonize data collected from different
sources with varying resolutions or image sizes. In radiomic studies, the reproducibility of radiomic features is heavily affected by the voxel/pixel size (the physical length of a single pixel in the CT/MRI image).
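As an illustration, resampling a CT volume to a common voxel spacing can be sketched with nearest-neighbour interpolation in plain numpy; production pipelines would typically use trilinear or B-spline interpolation (e.g., via SimpleITK or scipy):

```python
import numpy as np

def resample_nearest(volume, spacing, new_spacing):
    """Resample a 3-D volume from `spacing` (mm per voxel, z/y/x order)
    to `new_spacing` using nearest-neighbour interpolation."""
    spacing = np.asarray(spacing, float)
    new_spacing = np.asarray(new_spacing, float)
    new_shape = np.round(np.array(volume.shape) * spacing / new_spacing).astype(int)
    # For each output voxel, find the nearest input voxel along each axis
    idx = [np.minimum((np.arange(n) * new_spacing[d] / spacing[d]).round().astype(int),
                      volume.shape[d] - 1)
           for d, n in enumerate(new_shape)]
    return volume[np.ix_(*idx)]

# A toy CT volume with 5 mm slices and 1 mm in-plane pixels
ct = np.random.default_rng(0).integers(-1000, 400, size=(10, 64, 64))
iso = resample_nearest(ct, spacing=(5.0, 1.0, 1.0), new_spacing=(1.0, 1.0, 1.0))
print(ct.shape, "->", iso.shape)  # (10, 64, 64) -> (50, 64, 64)
```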

Synthesis Methods
Synthesis is a method used to generate samples that belong to a
specific modality or domain, effectively harmonizing multi-cohort
datasets. This approach simplifies the task of data harmonization
by considering each cohort as a distinct style and transferring all
samples to a common style. Synthesis techniques can be divided
into paired synthesis and unpaired synthesis, depending on the
features of the training sample. Paired synthesis is used when cor-
responding samples from different cohorts are available, while
unpaired synthesis is used when such correspondence is absent (in
Fig. 5.6).
Paired synthesis approaches are trained on paired samples that
originate from the same object but are obtained using different
protocols (e.g., CT scans collected from the same patient with different scanners). These techniques are developed to learn how to
transform data between the source and reference cohorts. For
example, Park et al. proposed "deep harmonics" for CT slice thickness harmonization, by introducing an end-to-end deep neural network to generate CT scans. However, in clinical practice, paired data is difficult and costly to acquire.

Fig. 5.6 Workflow of paired synthesis and unpaired synthesis models
Unpaired synthesis mainly refers to cycle-GAN and condi-
tional VAE approaches (Fig. 5.6), which can be well trained with
sufficient unpaired samples. The cycle-GAN based approaches
[25, 26] are trained in a cycle-consistent manner, including for-
ward translation, backward translation, and cycle consistency.
The training procedure continues iteratively until the synthetic images are close to the target domain images. Unlike cycle-GAN based methods, a VAE applies an encoder to compress
the input (high-dimensional data) into data representations (low
dimensional vector), and a decoder to reconstruct the raw data
using these data representations. Conditional VAE changes the
decoder to a conditional one, which maps the data representations back to harmonized data conditioned on the cohort prior knowledge. By integrating the conditional VAE with an adversarial
module, the cohort transfer can be performed without paired
training samples. For instance, Moyer et al. [27] proposed a con-
ditional VAE to provide cohort-invariant representations by intro-
ducing spherical harmonics coefficients as inputs and outputs.
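The cycle-consistency constraint itself is simple: translating a scan to the other cohort's style and back should reproduce the original. In this sketch the two "generators" are toy affine maps standing in for trained networks:

```python
import numpy as np

# Toy "generators": in a real cycle-GAN these are trained neural networks
def a_to_b(x):   # translate cohort A style -> cohort B style
    return 1.1 * x + 0.2

def b_to_a(x):   # translate cohort B style -> cohort A style (exact inverse here)
    return (x - 0.2) / 1.1

def cycle_consistency_loss(x_a, x_b):
    """L1 penalty on round-trip translation in both directions."""
    loss_a = np.abs(b_to_a(a_to_b(x_a)) - x_a).mean()
    loss_b = np.abs(a_to_b(b_to_a(x_b)) - x_b).mean()
    return loss_a + loss_b

x_a = np.random.default_rng(0).random((4, 8, 8))  # toy "scans" from cohort A
x_b = np.random.default_rng(1).random((4, 8, 8))  # toy "scans" from cohort B
print(cycle_consistency_loss(x_a, x_b))  # ~0 because the toy maps invert each other
```

During training this term is minimized alongside the adversarial losses of the two discriminators.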

Invariant Representation Learning Methods
Invariant feature learning techniques aim to identify features that
are consistent across different sets of data, and then use these fea-
tures to perform specific tasks such as segmentation, classifica-
tion, or regression. The idea behind representation learning
methods for harmonization is that by creating a concise dictionary
or mapping from diverse data sources, the resulting representa-
tions will not include any variability that is specific to a particular
dataset or cohort.
There are mainly two schemes of invariant feature learning (in
Fig. 5.7). Approaches such as “Deep unlearning harmonics” [28]
are usually applied with an adversarial module, or domain classi-
fier, to aid the encoder in identifying features that are consistent
across various cohorts of data. This is accomplished by maximiz-
ing the adversarial loss L_adv while simultaneously minimizing the
main task loss L_Task. To obtain accurate representations of the features, methods such as the normalization autoencoder [29] introduce a decoder that reconstructs the original input data, thereby minimizing the reconstruction loss L_Rec. By incorporating these optimization functions, these methods can ensure stable performance when working with data from multiple cohorts.

Fig. 5.7 Invariant feature learning schemes

5.3 Challenges for Data Harmonization

For years, computational data harmonization has been proposed as a solution to mitigate the data inconsistency issue in digital
healthcare research studies. However, applying this concept to
real-world multicentre, multimodal, and multi-scanner medical
practice and clinical trials poses a significant challenge. Although
transfer/federated/multitask learning approaches have shown
promising results, their success depends on ideal conditions and
may fail when working across different data sources, requiring
effective data harmonization. Unfortunately, there is limited con-
sensus on which approaches and metrics are best suited for deal-
ing with multimodal datasets [1]. In addition, the lack of a
standardized stepwise design methodology makes it difficult to
reproduce existing studies, hindering progress in the field. The challenges of data harmonization in radiomics span several aspects.
Firstly, the location and scale approaches present several issues.
Most distribution-based methods require refined feature vectors
that depend on prior knowledge of the regions of interest, while
accurate prediction of regions of interest cannot be achieved with-
out proper data harmonization. Furthermore, although some
distribution-based methods, such as ComBat, can remove cohort
bias while preserving differences between radiomic features on
phantoms, they are not well-suited to images or high-dimensional
signals due to their demanding computational complexity.
Additionally, when new data is added, data harmonization needs
to be performed on the entire dataset again, and some pairwise
approaches require a complex training procedure, such as repeated
training, when applied to multicentre datasets with more than two
cohorts.
Secondly, despite the significant progress made in deep
learning-­based synthesis methods, their reproducibility and gen-
eralizability are still a concern. These methods face clear limita-
tions such as (1) being mainly built on existing multicentre
datasets, lacking evaluations on new datasets; (2) being based on
GAN models that are unstable and may introduce unrealistic
changes or hallucinations; and (3) requiring a large amount of
training data for all cohorts, which may not be feasible for clinical
studies. To address these issues, researchers should report the per-
formance of data harmonization on new datasets that were not
involved in the model development; improve the stability of data
synthesis; and develop data harmonization strategies that require
less training data.
Moreover, while extracting invariant features across cohorts is
a promising approach to address the limitations of synthesis meth-
ods, it also has its challenges. Specifically, it can only extract
invariant features for analysis and cannot generate harmonized
data. Therefore, future research should aim to develop methods
that can generate harmonized data using the extracted invariant
features.
An unexplored research area in data harmonization is the use
of explainable artificial intelligence (XAI) methods [30]. XAI
techniques can provide insight into the possible reasons for incon-
sistent data representations that contribute to bias in data-based
models. By analysing these insights, researchers can determine
whether the biasing artifacts are due to inadequate data harmoni-
zation before the learning phase. Additionally, local explanatory
methods can identify out-of-distribution examples that may relate
to data harmonization issues, such as equipment miscalibration or
changes in data capture protocols. Improved data harmonization
can benefit XAI by standardizing all data and eliminating cohort
biases [31, 32]. In summary, we anticipate an exciting cross-­
disciplinary research area at the intersection of harmonization and
XAI.

References
1. Nan Y et al (2022) Data harmonization for information fusion in digital
healthcare: a state-of-the-art systematic review, meta-analysis and future
research directions. Inf Fusion 82:99
2. Berenguer R et al (2018) Radiomics of CT features may be nonreproduc-
ible and redundant: influence of CT acquisition parameters. Radiology
288(2):407–415
3. Sunderland JJ, Christian PE (2015) Quantitative PET/CT scanner perfor-
mance characterization based upon the society of nuclear medicine and
molecular imaging clinical trials network oncology clinical simulator
phantom. J Nucl Med 56(1):145–152
4. Yamashita R et al (2020) Radiomic feature reproducibility in contrast-­
enhanced CT of the pancreas is affected by variabilities in scan parame-
ters and manual segmentation. Eur Radiol 30(1):195–205
5. Jha A et al (2021) Repeatability and reproducibility study of radiomic
features on a phantom and human cohort. Sci Rep 11(1):1–12
6. Emaminejad N, Wahi-Anwar MW, Kim GHJ, Hsu W, Brown M, McNitt-­
Gray M (2021) Reproducibility of lung nodule radiomic features: multi-
variable and univariable investigations that account for interactions
between CT acquisition and reconstruction parameters. Med Phys
48:2906
7. Kim M, Jung SC, Park JE, Park SY, Lee H, Choi KM (2021)
Reproducibility of radiomic features in SENSE and compressed SENSE:
impact of acceleration factors. Eur Radiol 31:1–14
8. Saeedi E et al (2019) Radiomic feature robustness and reproducibility in
quantitative bone radiography: a study on radiologic parameter changes.
J Clin Densitom 22(2):203–213
114 Y. Nan et al.

9. Meyer M et al (2019) Reproducibility of CT radiomic features within the same patient: influence of radiation dose and CT reconstruction settings. Radiology 293(3):583–591
10. Perrin T et al (2018) Short-term reproducibility of radiomic features in
liver parenchyma and liver malignancies on contrast-enhanced CT imag-
ing. Abdom Radiol 43(12):3271–3278
11. Midya A, Chakraborty J, Gönen M, Do RK, Simpson AL (2018) Influence
of CT acquisition and reconstruction parameters on radiomic feature
reproducibility. J Med Imaging 5(1):011020
12. Altazi BA et al (2017) Reproducibility of F18-FDG PET radiomic fea-
tures for different cervical tumor segmentation methods, gray-level dis-
cretization, and reconstruction algorithms. J Appl Clin Med Phys
18(6):32–48
13. Zhao B et al (2016) Reproducibility of radiomics for deciphering tumor
phenotype with imaging. Sci Rep 6(1):1–7
14. Choe J et al (2019) Deep learning–based image conversion of CT recon-
struction kernels improves radiomics reproducibility for pulmonary nod-
ules or masses. Radiology 292(2):365–373
15. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in micro-
array expression data using empirical Bayes methods. Biostatistics
8(1):118–127
16. Whitney HM, Li H, Ji Y, Liu P, Giger ML (2020) Harmonization of
radiomic features of breast lesions across international DCE-MRI datas-
ets. J Med Imaging 7(1):012707
17. Wrobel J et al (2020) Intensity warping for multisite MRI harmonization.
NeuroImage 223:117242
18. Lazar C et al (2013) GENESHIFT: a nonparametric approach for inte-
grating microarray gene expression data based on the inner product as a
distance measure between the distributions of genes. IEEE/ACM Trans
Comput Biol Bioinform 10(2):383–392
19. Haghverdi L, Lun AT, Morgan MD, Marioni JC (2018) Batch effects in
single-cell RNA-sequencing data are corrected by matching mutual near-
est neighbors. Nat Biotechnol 36(5):421–427
20. Hie B, Bryson B, Berger B (2019) Efficient integration of heterogeneous
single-cell transcriptomes using Scanorama. Nat Biotechnol 37(6):685–
691
21. Wolf FA, Angerer P, Theis FJ (2018) SCANPY: large-scale single-cell
gene expression data analysis. Genome Biol 19(1):1–5
22. Polański K, Young MD, Miao Z, Meyer KB, Teichmann SA, Park J-E
(2020) BBKNN: fast batch alignment of single cell transcriptomes.
Bioinformatics 36(3):964–965
23. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R (2018) Integrating
single-cell transcriptomic data across different conditions, technologies,
and species. Nat Biotechnol 36(5):411–420
24. Korsunsky I et al (2019) Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16(12):1289–1296
25. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image
translation using cycle-consistent adversarial networks. In: Proceedings
of the IEEE international conference on computer vision, pp 2223–2232
26. Zhao F et al (2019) Harmonization of infant cortical thickness using
surface-­to-surface cycle-consistent adversarial networks. In: International
conference on medical image computing and computer-assisted interven-
tion. Springer, pp 475–483
27. Moyer D, Ver Steeg G, Tax CM, Thompson PM (2020) Scanner invariant
representations for diffusion MRI harmonization. Magn Reson Med
84(4):2174–2189
28. Dinsdale NK, Jenkinson M, Namburete AI (2021) Deep learning-based
unlearning of dataset bias for MRI harmonization and confound removal.
NeuroImage 228:117689
29. Rong Z et al (2020) NormAE: deep adversarial learning model to remove
batch effects in liquid chromatography mass spectrometry-based metabo-
lomics data. Anal Chem 92(7):5082–5090
30. Arrieta AB et al (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115
31. Yang G, Ye Q, Xia J (2022) Unbox the black-box for the medical explain-
able ai via multi-modal and multi-centre data fusion: a mini-review, two
showcases and beyond. Inf Fusion 77:29–52
32. Holzinger A et al (2022) Information fusion as an integrative cross-­
cutting enabler to achieve robust, explainable, and trustworthy medical
artificial intelligence. Inf Fusion 79:263–278
6 Harmonization in the Image Domain
F. Garcia-Castro and E. Ibor-Crespo

6.1 The Need for Image Harmonization

Medical imaging presents variability that depends on several factors [1]. The scanner model and manufacturer, acquisition proto-
col or patient preparation are factors that affect image quality.
When describing image quality, two main aspects must be consid-
ered within the field of medical imaging: diagnostic quality and
technical quality. Although both quality definitions focus on dif-
ferent aspects to assess whether an image is satisfactory or not,
they are not independent or unrelated. Obtaining an adequate
diagnostic quality requires the images in the study to capture the
necessary attributes and characteristics that represent physiology
and pathology, facilitating the clinician or radiologist to identify
the relevant findings with a correct interpretation of the image.
Technical quality should be defined by the parameters used to
acquire, reconstruct or generate the images ensuring, for example,
that the images of the study will be free of artifacts, with sufficient

F. Garcia-Castro (*)
Department of Technology and Innovation, Quibim, Valencia, Spain
e-mail: [email protected]
E. Ibor-Crespo
Department of AI Research, Quibim, Valencia, Spain

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_6
spatial resolution, or with satisfactory contrast. Technical quality has, of course, a large effect on diagnostic quality. A low signal-to-noise ratio (SNR) can lead to inaccurate representation of tissues and organs, as a high noise level can render features indis-
tinguishable [2], thus decreasing the diagnostic quality. However,
a high technical quality does not automatically produce an image
of high diagnostic quality as, for instance, the image focus could
be completely out of the scope of the tissues or organs that need
to be properly represented on the image.
In an ideal scenario, images acquired under the same condi-
tions should present the same technical quality with minimal
deviations. However, in a real-world scenario, we will find differ-
ent manufacturers and scanner models in clinical practice. These
variations will cause shifts in the signal intensities of the image
voxels1 even with similar acquisition protocols. This situation is
accentuated when dealing with multi-centric real-world data
(RWD), as a high degree of variability affects images coming
from the different imaging sites [3]. Different manufacturers,
scanners or acquisition protocols introduce image variability that
can affect the possibility of developing generalizable AI models,
while also affecting the reproducibility in the calculation of quan-
titative imaging biomarkers (QIBs) [4], these being paramount to
successfully develop and apply radiomic techniques and method-
ologies. These sources of variability are especially pronounced
when dealing with magnetic resonance (MR) images, since the
intensity values are not normalized to standard units as in com-
puted tomography (CT) scans, where intensity values can be stan-
dardized to Hounsfield units values due to the intrinsic physics
principles of X-ray attenuation.
To reduce this variability, two large groups of techniques can
be used: standardization and harmonization.
Standardization is intended to ensure that processes are always carried out in the most similar way possible to
reduce variability in image quality and in the quantification of
imaging biomarkers. For example, always using the same acquisi-

1. Since we will be discussing mainly tomographic images, we will refer to voxels instead of pixels.
tion protocol [5, 6] or performing patient preparation in a strict and reproducible manner [7]. Also, when it comes to quantifying
imaging biomarkers, standardization should seek to perform the
measurements in a homogeneous way in all cases. Initiatives such
as the Quantitative Imaging Biomarkers Alliance (QIBA) from
the Radiological Society of North America (RSNA) or the
European Imaging Biomarkers Alliance (EIBALL) from the
European Society of Radiology (ESR) seek standardization in the
quantification of imaging biomarkers by setting profiles for spe-
cific methodologies. These profiles include recommendations that
depict not only image acquisition protocols in detail, but also the
specific methodologies to be followed for the calculation of a
given quantitative imaging biomarker.
The objective of harmonization processes in medical imaging,
and specifically of image harmonization techniques (IHTs), is to
bring the images to a common space, space referring to the set of
characteristics that define the appearance of the image: contrast,
resolution, dynamic range, etc. Harmonization is an important
process to ensure better generalization in artificial intelligence
(AI) models [8]. Supervised AI models based on convolutional
neural networks (CNNs) focused on segmentation, classification
or object detection learn from a finite data set. This dataset will
contain images with a specific contrast range determined by the
acquisition protocols and characteristics of the scanners used. The
trained network will have learned, therefore, to recognize findings
or structures in images that present these characteristics and the
generalization that the network is capable of reaching will be lim-
ited by the particularities of the training dataset.
Applying IHTs before using a CNN to segment a specific anat-
omy can indirectly help in the generalization of the model, as the
objective of the IHT would be to bring the image contrast to the
general vicinity of the images used for training, therefore ensur-
ing a better performance of the model than when using a com-
pletely foreign contrast. A model to predict patient relapse based on radiomic features would also greatly benefit from the application of IHTs, since it would ensure that radiomic features are
always calculated from images with similar contrast and signal
intensities.
The field of radiomics has gained significant attention in recent years, with researchers seeking to develop models that can
extract useful information from medical images to aid in diagnosis,
treatment planning, and patient outcomes. In this context, IHTs
have emerged as a crucial tool for improving the performance of
radiomic models [9].
A wide range of IHTs has been investigated to address the
various sources of image variability that can impact the accuracy
and reliability of radiomic features. These techniques range from
traditional computer vision algorithms for image normalization to
more advanced solutions based on AI, such as generative adver-
sarial networks (GANs) and autoencoders.
In this chapter, we will describe several of these IHTs, while
also considering the various sources of image variability, includ-
ing factors that affect signal intensity, contrast, and spatial resolu-
tion, as well as issues related to image artifacts and noise. By
gaining a better understanding of these topics, we can develop
more effective strategies for optimizing radiomics performance
and advancing the field of medical imaging in research and clini-
cal practice.

6.2 Image Variability Sources

Variability sources in medical imaging are usually inherent to each modality. While these sources can be grouped into general cat-
egories, such as patient preparation, acquisition protocol, foreign
bodies, etc., all of them have concrete root causes that can be tied
to the intrinsic characteristics of each modality. Patient prepara-
tion is one of the sources that induces variability across all modal-
ities, but with very specific issues arising depending on the
modality. The effect of patient preparation is especially relevant
and concerning, as if the process was not performed appropri-
ately, the imaging study might be rendered completely useless,
not only for radiomic model development, but also for reading
purposes in some scenarios.
Lack of patient preparation can create a wide array of issues
with many of them affecting the possibility of developing AI
models. If, when acquiring a lung CT, the patient does not perform
the full inspiration cycle properly, the study will not be adequate
to perform any kind of volumetric analysis [10]. In multiparametric
prostate MRI (mpMRI), if the patient underwent suboptimal rec-
tal preparation, the rectum could show an excessive amount of
gas, creating susceptibility artifacts that especially affect the dif-
fusion weighted imaging (DWI) sequence with a large degree of
unwanted deformation [11], affecting image registration or organ
segmentation. Or if during the acquisition of a FDG PET/CT (flu-
orodeoxyglucose positron emission tomography + computed
tomography) scan the patient is not placed with the arms raised
above the head but alongside the body, beam hardening artifacts
and field-of-view (FOV) truncation artifacts might appear [12].
Poor patient preparation, in general, cannot be solved by har-
monization techniques. Standardization of patient preparation for
each type of imaging study is the best way to ensure reducing
variability in the cases where this procedure is needed. However,
there are no consensus patient preparation protocols defined across all reporting guidelines. For instance, the PI-RADS v2.1 prostate mpMRI reading guidelines of the American College of Radiology (ACR) specifically state that no consensus patient preparation has been demonstrated to improve diagnostic accuracy.
Nevertheless, artifacts introduced by lack of patient preparation
will significantly hinder the performance of AI models and tradi-
tional computer vision algorithms, as anatomy can suffer heavy
deformations or other unwanted modifications.
While IHTs may not be the ideal solution for addressing this
kind of issues, they are particularly useful when it comes to deal-
ing with variability sources that directly impact the signal inten-
sity and contrast of an image. Factors such as the vendor of the
equipment, the specific scanner model, the firmware version, the
reconstruction algorithms employed, and the acquisition proto-
cols utilized can all affect image contrast in different ways
depending on the modality.
For instance, the tube voltage in CT scans and the repetition
time (TR) in MRI scans can fall within acceptable ranges for diag-
nostic purposes. However, the differences in SNR and image con-
trast that they introduce can hinder the performance of AI
algorithms, rendering them less effective. In such cases, employing IHTs to standardize the contrast of the images can signifi-
cantly enhance the accuracy and reliability of AI models [13].
It is important to note that the choice of IHT will depend on the
specific imaging modality and the particular variability sources
that need to be addressed. Furthermore, factors such as the computational resources required and the impact of the IHT on the final image quality should also be considered when selecting the appropriate IHT for a given application.

6.2.1 Image Acquisition

Several aspects influence image SNR and contrast across different modalities. Due to the physical principles that each modality is
based on, the characteristics that affect the image are different.
In CT, the SNR is affected by many different aspects, from
detector, collimator, tube current, slice thickness or the recon-
struction algorithm. All these features affect the number of pho-
tons detected, which in turn affects the appearance of the image.
Tube current, or dose, is one of the most closely related aspects of
SNR variability. Increasing the current by a factor of 2 means
potentially having twice the signal. However, since CT is based
on the emission of ionizing radiation, the trend in clinical practice
is to follow the principle “as low as reasonably achievable”
(ALARA) [14]. The ALARA principle is based on the fact that
with ionizing radiation there is no safe dose, no matter how small,
and therefore it must be kept as low as possible. This principle has
led to the fact that, for good reason, most CT acquisitions in clin-
ical practice are low-dose CT scans.
In MRI, one of the main factors affecting the SNR is the inten-
sity of the magnetic field. SNR is directly proportional to magnetic field strength, hence introducing variability in image quality depending on whether the study was acquired with a 1.5 Tesla (T) or a 3 T
scanner. TR and echo time (TE) also affect SNR in different ways.
Long TRs will increase SNR as longitudinal magnetization will get
closer to its maximum. However, excessively long TRs in
T1-weighted (T1w) images will result in contrast loss among tissues. TE behaves contrary to TR, as decreasing TE will increase the
SNR. Short TEs ensure that the transverse magnetization is high,
resulting in high signal. However, for T2-weighted (T2w) images,
greatly decreasing TE will result in suboptimal image contrast.
Many other acquisition-related factors will affect SNR, and
hence contrast, in MRI. Flip angle, slice thickness, spacing
between slices, matrix size, and field-of-view (FOV) are among
the key factors that influence image contrast. Each of these param-
eters can be adjusted in different ways depending on the specific
MRI imaging protocol and the scanner configuration, resulting in
significant variability in image contrast.
The radiofrequency coil used for signal reception is another
important factor that can impact the contrast of an MRI sequence.
Different types of coils have varying sensitivity to different tissue
types and magnetic field strengths and selecting the appropriate
coil for a particular imaging task is crucial for obtaining high-­
quality images with consistent contrast. In addition to these
acquisition-­related factors, the chosen k-space filling technique
can also affect image contrast. K-space is a mathematical repre-
sentation of the raw MRI data, and the way that it is sampled and
filled can impact image contrast and resolution.
Figure 6.1 shows the effect on image contrast of TR and TE
combinations on different prostate T2w image series acquired in
clinical practice. The contrast differences shown can easily be
appreciated and might introduce a degree of variability that an AI
model cannot simply overcome without proper application of
IHTs.

Fig. 6.1 Effect of TR and TE on prostate T2w images


Due to this complexity, without the use of IHTs, an AI model trained on a finite dataset of MRI images could perform poorly on
new, unseen images [15] due to the variability in image acquisi-
tion parameters. However, by incorporating IHTs into the prepro-
cessing pipeline, the AI model can learn to recognize important
features and patterns in the images despite the differences in
acquisition parameters, leading to more accurate and robust pre-
dictions.

6.3 Harmonization Techniques

The scope of image harmonization is broad and encompasses a variety of techniques. However, for the purposes of this chapter,
we will focus on IHTs that aim to normalize the intensity of vox-
els across multiple images. This means that the techniques we will
discuss are those that adjust the intensity values of the voxels to a
similar range, regardless of their specific implementation. As a
result, we will not be discussing techniques such as blurring fil-
ters, inhomogeneity filters, or other image correction methods in
this chapter. Instead, we will concentrate on methods that are
designed to bring image intensity into alignment across different
images. Application-specific techniques will not be depicted in
detail either. These techniques make use of particular characteris-
tics of the acquired images, usually specific tissues, in order to
achieve a certain degree of harmonization. Using the values of
healthy liver parenchyma in order to normalize the standardized
uptake values (SUV) on a PET scan [16] or the WhiteStripe [17]
method to normalize of the tissues on brain MRI are examples of
such techniques.
In the field of image harmonization, we can classify IHTs into
two main categories from an implementation point of view: those
that employ conventional computer vision algorithms and those
that utilize AI methodologies. While non-AI methods are appli-
cable in many different situations, they often lack robustness and
overall performance compared to AI-based techniques. This is
where CNNs have emerged as a more robust alternative, learning
from the complexities and nuances of images, resulting in a more
accurate and effective image harmonization. Despite the many advantages of AI-based IHTs, they also have some limitations that
must be addressed to ensure their successful implementation in
certain scenarios. Nonetheless, incorporating IHTs into the pre-
processing pipeline can significantly improve an AI model’s abil-
ity to recognize important features and patterns in images, even in
the face of variability in image acquisition parameters.

6.3.1 Non-AI Methods

Image harmonization techniques that do not rely on AI are based on conventional image processing techniques. These techniques
have been in use for several years and rely on traditional computer
vision algorithms. The primary goal of these techniques, as with
AI-based IHTs, is to adjust the intensity and contrast of the images
so that they appear similar, regardless of differences in the imag-
ing conditions or equipment used to acquire them.
The most commonly used image harmonization techniques are
based on intensity scaling, histogram matching, or normalization,
with the purpose of adjusting the intensity values of each voxel in
the image so that they have a similar distribution across different
images.
These techniques are used extensively in the medical imaging
field to improve the quality of the images and reduce the variabil-
ity among images acquired using different equipment or proto-
cols. The aim is to produce images that are visually comparable
and that can be used to facilitate diagnosis or further analysis.
One of the most significant advantages of non-AI image har-
monization techniques is that they are generally straightforward
to implement and can be applied to a broad range of imaging
modalities. They do not require any specific hardware or software
and can be implemented on a standard computer. This makes
them cost-effective and accessible to a broad range of users,
including those with limited computational resources.
While non-AI techniques have been widely used for several
years, their limitations include a lack of robustness and the inabil-
ity to handle complex variations in image characteristics. These
limitations can result in suboptimal performance, especially when dealing with images acquired using different protocols or equipment. However, despite these limitations, non-AI image har-
monization techniques remain a valuable tool in the field of med-
ical imaging, even as part of preprocessing pipelines for the
implementation of AI models.
In the following sections, some of these techniques will be
described.

Intensity Scaling
Intensity scaling is a technique used in image processing to adjust
the contrast and brightness of an image by scaling the range of
voxel intensity values. In other words, it involves mapping the
original intensity values of an image to a new range of values.
The scaling process of a grayscale digital image typically
involves two steps: normalization and rescaling. In the normaliza-
tion step, the minimum and maximum intensity values in the
image are identified. Then, the intensity values of all the voxels in
the image are shifted and scaled to be in the range of [0,1] using
Eq. 6.1,

Ni = (Ii - Imin) / (Imax - Imin)    (6.1)
where Ii is an image voxel, Imin and Imax are the image minimum
and maximum intensity, respectively, and Ni is the normalized
voxel. After normalization, the image is rescaled to a new range of
intensity values. This is usually done to enhance the contrast of
the image by stretching the intensity range to occupy the full
available range of values. The new intensity values are obtained
using Eq. 6.2,

Ri = (Ni * (Nmax - Nmin)) + Nmin    (6.2)
where Ri is a rescaled voxel and Nmin and Nmax are the minimum and maximum of the desired output intensity range, respectively.
Other intensity scaling approaches can be used depending on
the application, ranging from simply dividing by the maximum
intensity of the image to more complex solutions.
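As an illustration, the two-step scaling of Eqs. 6.1 and 6.2 can be written in a few lines of NumPy. This is a minimal sketch: the function name and the default target range of [0, 255] are our own illustrative choices, not a standard API.

```python
import numpy as np

def rescale_intensity(image, new_min=0.0, new_max=255.0):
    """Min-max normalize to [0, 1] (Eq. 6.1), then rescale to the
    target range [new_min, new_max] (Eq. 6.2)."""
    image = image.astype(np.float64)
    i_min, i_max = image.min(), image.max()
    if i_max == i_min:              # constant image: avoid dividing by zero
        return np.full_like(image, new_min)
    normalized = (image - i_min) / (i_max - i_min)       # Eq. 6.1
    return normalized * (new_max - new_min) + new_min    # Eq. 6.2

# Toy example: intensities 100..400 mapped onto the 8-bit range
slice_2d = np.array([[100.0, 200.0], [300.0, 400.0]])
print(rescale_intensity(slice_2d))  # corners map to 0.0 and 255.0
```

Note that the minimum and maximum of a single extreme voxel drive the whole mapping, which is why clipping percentiles before scaling is a common refinement in practice.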
Z-Score Normalization
Z-score is a statistical measure that is used to evaluate how many
standard deviations a data point is from the mean of a dataset. In
the context of image processing, Z-score normalization is a tech-
nique that is used to normalize the intensity of voxels in an image.
It is a linear transformation method that scales the voxel values to
have a mean of zero and a standard deviation of one.
Z-score normalization is applied to images by computing the
mean and standard deviation of the intensity values of all the vox-
els in the image. The mean is subtracted from each voxel value,
and the result is divided by the standard deviation (Eq. 6.3). This
transforms the voxel values such that the mean of the image is
zero and the standard deviation is one.

Zi = (Ii - μ) / σ    (6.3)
where Zi is a Z-score normalized voxel and μ and σ are the mean
and standard deviation of all the image voxels, respectively.
As a standalone method for image harmonization, the Z-score
might not be applicable in all scenarios, as it assumes that the
distribution of voxel intensities in an image is Gaussian. If the
distribution is non-Gaussian, the normalization may not be appro-
priate. Additionally, Z-score normalization can only adjust the
overall brightness and contrast of an image and may not be able to
correct for more complex artifacts or variability in image acquisi-
tion. However, it has been used in recent research in different situ-
ations, such as a normalization method in MRI of head and neck
cancer [18] or as part of normalization strategies for radiomic
pipelines [19]. It can also be of great help as part of the prepro-
cessing pipeline of an AI model training, as data with an average
close to zero can help speed up convergence in specific scenarios
[20].
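A minimal NumPy sketch of Eq. 6.3 follows. In practice the statistics are often computed over a region of interest (e.g., excluding background air) rather than the whole volume; that refinement is omitted here.

```python
import numpy as np

def zscore_normalize(image):
    """Shift to zero mean and scale to unit standard deviation (Eq. 6.3)."""
    image = image.astype(np.float64)
    mu, sigma = image.mean(), image.std()
    if sigma == 0:                  # constant image: only the shift applies
        return image - mu
    return (image - mu) / sigma

# Synthetic "volume" with arbitrary mean and spread
volume = np.random.default_rng(0).normal(500.0, 50.0, size=(4, 64, 64))
z = zscore_normalize(volume)
print(round(z.mean(), 6), round(z.std(), 6))  # 0 and 1 up to float error
```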

Histogram Equalization
Histogram equalization (HE) is a technique used in image pro-
cessing to enhance the contrast of an image by redistributing
voxel values in the image’s histogram. It works by increasing the
global contrast of the image, which can reveal hidden details and
improve the overall quality of the image. While HE does not guar-
antee obtaining the same contrast across a dataset, it will create a
similar effect on all of them due to the flattening of the histogram.
HE works by transforming the original image’s voxel intensi-
ties to a new set of intensities such that the cumulative distribution
function (CDF) of the resulting image is as flat as possible. The
CDF is a measure of the distribution of voxel intensities in the
image.
The histogram equalization algorithm is a two-step process.
The first step is to calculate the histogram of the input image,
which is a plot of the frequency of occurrence of each gray level
in the image. The second step is to calculate the cumulative distri-
bution function of the histogram, which represents the number of
voxels with intensity levels less than or equal to a given level. The
image is then transformed by mapping the original voxel intensi-
ties to their new values in a way that equalizes the CDF as seen in
Eq. 6.4.

Ei = round(((L - 1) / M) * CDF(Ii))    (6.4)
where Ei is the new voxel intensity value, Ii is the original voxel
intensity value, M is the total number of voxels in the image and
L is the number of possible voxel intensity levels. Figure 6.2
shows the effect of HE on a T2w prostate MRI slice.
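Eq. 6.4 can be implemented directly with NumPy for integer-valued images. The assumption of L = 256 gray levels corresponds to an 8-bit example and is illustrative only:

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Histogram equalization via the cumulative histogram (Eq. 6.4)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist)               # voxels with intensity <= each level
    m = image.size                      # total number of voxels, M in Eq. 6.4
    lut = np.round((levels - 1) / m * cdf).astype(image.dtype)
    return lut[image]                   # apply the mapping voxel by voxel

# Low-contrast toy image with intensities squeezed into [100, 120]
img = np.random.default_rng(1).integers(100, 121, size=(64, 64), dtype=np.uint8)
eq = equalize_histogram(img)
print(img.min(), img.max(), "->", eq.min(), eq.max())
```

The lookup table (lut) is computed once per image, so the mapping is cheap even for large volumes.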
Histogram equalization may not work well for images with a
bimodal or multimodal histogram, where there are several peaks
in the histogram. In such cases, adaptive histogram equalization
techniques such as contrast limited adaptive histogram equaliza-
tion (CLAHE) may be used [21].
CLAHE is a modified version of the traditional HE technique
which overcomes the limitations of HE by dividing the image into
small rectangular regions called tiles, and then applying the HE
technique to each tile separately. The size of the tiles is usually
chosen based on the size of the features of interest in the image.
For example, for medical images such as MRI scans, smaller tiles
can be used to capture the fine details of the image.
Fig. 6.2 Effect of HE on a T2w prostate MRI slice. Top left, original slice.
Top right, equalized slice. Bottom left, original histogram. Bottom right,
equalized histogram

To prevent over-enhancement, CLAHE limits the maximum amount of contrast enhancement that can be applied to each tile.
This limit is determined by the contrast distribution of the sur-
rounding tiles. In other words, the maximum enhancement that
can be applied to a tile is based on the contrast distribution of the
neighboring tiles. This approach ensures that the contrast enhance-
ment is adaptive to the local features of the image and prevents the
formation of artifacts or noise. CLAHE has applications as part of
machine and deep learning preprocessing pipelines [22], improv-
ing contrast enhancement before model training or inference.
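A minimal sketch of the tiling-plus-clipping idea is shown below in NumPy. It clips each tile's histogram and redistributes the excess before equalizing, but omits the bilinear interpolation between neighboring tile mappings that production implementations (e.g. scikit-image's `equalize_adapthist`) use to avoid block artifacts; all names and parameter values are illustrative:

```python
import numpy as np

def clahe_simplified(img, tile=(32, 32), clip_limit=0.02, n_levels=256):
    """Tile-wise clipped histogram equalization (a CLAHE sketch)."""
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(0, h, tile[0]):
        for x in range(0, w, tile[1]):
            block = img[y:y + tile[0], x:x + tile[1]]
            hist = np.bincount(block.ravel(), minlength=n_levels).astype(float)
            # clip the tile histogram and redistribute the excess uniformly,
            # which caps the slope of the mapping (i.e. the local contrast gain)
            limit = max(clip_limit * block.size, 1.0)
            excess = np.clip(hist - limit, 0.0, None).sum()
            hist = np.minimum(hist, limit) + excess / n_levels
            cdf = np.cumsum(hist)
            lut = np.round((n_levels - 1) * cdf / cdf[-1]).astype(img.dtype)
            out[y:y + tile[0], x:x + tile[1]] = lut[block]
    return out

rng = np.random.default_rng(1)
img = np.clip(rng.normal(120, 10, (64, 64)), 0, 255).astype(np.uint8)
out = clahe_simplified(img)
```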

Histogram Matching
Histogram matching, also known as histogram specification, is a
technique used to match the histogram of one image to another,
typically a reference image. It has been applied as a normalization
technique of medical images for many years [23]. The goal of histogram matching is to adjust the intensity values of an input image such that it matches the intensity distribution of a reference image.
130 F. Garcia-Castro and E. Ibor-Crespo
To obtain the new histogram, the histograms of the input image
and the reference image are computed. After calculating the
cumulative distribution function (CDF) of both histograms they
are normalized to be in the range of 0–1. The inverse CDF of the
reference histogram is calculated, and the intensity values of the
input image are mapped to the corresponding values in the inverse
CDF. Figure 6.3 shows the effect of histogram matching on a slice
of a brain FLAIR MRI.
The resulting image has a histogram that matches the histo-
gram of the reference image. By matching the histograms of two
images, it is possible to transfer the statistical properties of the
reference image to the input image.
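The CDF-inversion procedure described above can be prototyped with `np.interp`, which evaluates the inverse reference CDF at each source quantile. This single-channel sketch (names illustrative) is similar in spirit to library routines such as scikit-image's `match_histograms`:

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities so their distribution follows the reference."""
    s_vals, s_counts = np.unique(source.ravel(), return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size      # normalized input CDF
    r_cdf = np.cumsum(r_counts) / reference.size   # normalized reference CDF
    # evaluate the inverse reference CDF at each source quantile
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    idx = np.searchsorted(s_vals, source.ravel())  # position of each voxel's value
    return mapped[idx].reshape(source.shape)

rng = np.random.default_rng(2)
src = rng.integers(0, 100, (32, 32))      # dark input image
ref = rng.integers(100, 200, (32, 32))    # brighter reference image
out = match_histogram(src, ref)
```

Because `np.interp` clamps to the endpoints of the reference CDF, the output intensities always stay within the reference intensity range.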

Fig. 6.3 Effect of histogram matching on a FLAIR MRI slice. Top left, input
FLAIR image. Top middle, reference FLAIR image. Top right, result image
after histogram matching. Bottom left, input histogram and CDF. Bottom
middle, reference histogram and CDF. Bottom right, result histogram and
CDF after histogram matching. Note that the result image CDF matches the
shape of the reference image CDF
Compared to histogram equalization, histogram matching provides more control over the output histogram, since it uses a reference histogram to determine the mapping function. Histogram
matching can be used to match the histogram of an image to a
specific desired histogram, whereas histogram equalization
attempts to spread the intensity values evenly over the entire
dynamic range of the image. Histogram equalization can be considered a particular case of histogram matching, with a uniform target histogram. Histogram
matching can also preserve the spatial structure of the image, as it
does not rely on global operations, whereas histogram equaliza-
tion can produce undesirable artifacts due to the global nature of
the operation.
However, histogram matching can also introduce artifacts, par-
ticularly in regions where the reference image has very few vox-
els. In such cases, the mapping function may be non-monotonic,
resulting in non-linearities and distortion in the output image. To
mitigate these artifacts, various modifications to the standard his-
togram matching algorithm have been proposed, such as adaptive
histogram matching (AHM).
AHM divides the image into small regions or blocks and works
independently on each block. The size of the block and the num-
ber of bins used in the histogram can be adjusted depending on the
characteristics of the image. This approach ensures that the contrast and brightness of different regions of the image are adjusted independently, while preserving the local details and
avoiding the over-enhancement of noise. One advantage of AHM,
over other histogram-based techniques, is its ability to enhance
images with varying illumination conditions or local contrast
variations, such as medical images.
Piecewise linear histogram matching (PLHM) introduces a dif-
ferent approach to histogram matching. Unlike regular histogram
matching, which applies a global transformation to the entire
image, PLHM performs piecewise linear transformations on the
image histogram, allowing for more precise and fine-grained
adjustments.
PLHM first divides the image histogram into several equal
intervals or bins. It then computes the CDF of both the source
image and the target histogram. Next, it divides the CDF of the
source image into the same number of segments as the histogram bins and fits a linear function to each segment. The slope of each
linear function represents the degree of contrast enhancement or
attenuation for that segment. Finally, it applies the piecewise lin-
ear transformation to the image, mapping the intensities of each
voxel to the corresponding intensities of the target histogram.
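Under the simplifying assumption that the segment boundaries sit at equally spaced quantiles, the piecewise linear map can be sketched with `np.quantile` and `np.interp` (all names invented; as the text notes, real implementations expose the bin count and placement as tunable parameters):

```python
import numpy as np

def plhm(source, reference, n_segments=8):
    """Piecewise linear histogram matching sketch.

    Matches a small set of quantiles (the segment boundaries) exactly and
    maps each source segment linearly onto the corresponding reference
    segment, instead of matching the full CDF point by point.
    """
    qs = np.linspace(0.0, 1.0, n_segments + 1)
    s_knots = np.quantile(source, qs)      # segment boundaries in the source
    r_knots = np.quantile(reference, qs)   # matching reference intensities
    return np.interp(source, s_knots, r_knots)

rng = np.random.default_rng(3)
src = rng.normal(50, 5, (64, 64))      # e.g. scanner A intensity scale
ref = rng.normal(120, 20, (64, 64))    # e.g. scanner B intensity scale
out = plhm(src, ref)
```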
Compared to regular histogram matching, PLHM offers sev-
eral advantages. First, it preserves the local contrast of the image,
which can be important for preserving the fine details and tex-
tures. Second, it can handle non-linear mappings between the
source and target histograms, which can occur when the histo-
grams have different shapes. Third, it can be used to selectively
enhance or attenuate specific regions of the image, by adjusting
the slopes of the linear functions in different parts of the histo-
gram. However, PLHM also has some limitations. One major
issue is the potential for introducing artifacts or discontinuities in
the image, particularly at the boundaries between adjacent histo-
gram segments. Another issue is the requirement for careful selec-
tion of the number of histogram bins and the placement of the
linear functions, as poorly chosen parameters can lead to subopti-
mal results. PLHM has been used in recent research for harmoniz-
ing image quality in multi-center radiomic studies [24].

6.3.2 AI Methods

As with non-AI methodologies, IHTs based on AI are designed to reduce variability among images that come from different sources
or scanners. AI approaches have been shown to be particularly
effective in achieving this goal. Specifically, image harmonization
techniques based on deep learning methods have been found to
be more computationally advanced and capable of generating
much more satisfactory results than traditional techniques [25],
greatly reducing the variability in the resulting images. Compared
to traditional techniques, AI-based methods can effectively cap-
ture and model the underlying distribution of the data, leading to
more accurate and robust results.
In this section, we will explore two techniques for image harmonization: autoencoders and GANs. Both methods are based on
deep learning and have shown promising results in reducing vari-
ability among images from different sources or scanners.

Autoencoders
Autoencoders are a type of CNN that can be trained to learn a
compressed representation of input images by encoding them into
a low-dimensional latent space. The encoded information is then
formatted to solve a particular task, which depends on the archi-
tecture design and loss function, among other factors.
In medical imaging, autoencoders have been used for various
tasks, such as denoising [26] or segmentation [27]. Recently,
autoencoders have also been explored for image harmonization
purposes [28]. The basic idea behind image harmonization using
autoencoders is to train the network to learn the underlying fea-
tures of a set of medical images and then use this knowledge to
generate new images that have similar features, but with reduced
variability in appearance.
To achieve image harmonization using autoencoders, the net-
work is first trained using a set of input medical images. The
encoder part of the network is used to extract features from the
images, which are then compressed into a lower-dimensional
latent space. The decoder part of the network then takes this com-
pressed representation and reconstructs an output image that is as
close as possible to the original input image.
Once the autoencoder has been trained, it can be used to gener-
ate new images that are similar to the original input images but
with reduced variability. To do this, a new image is first fed
through the encoder to generate its latent representation. This
latent representation is then fed into the decoder to generate a new
image that is similar in appearance to the original input image but
has been harmonized to match the features of the training set.
Figure 6.4 shows a generic diagram of an autoencoder architec-
ture.
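A deep autoencoder is beyond a short snippet, but the encode-bottleneck-decode idea can be illustrated with its linear cousin, a truncated SVD (PCA) projection. This toy (all names and sizes invented) compresses flattened patches to an 8-dimensional latent code and reconstructs them:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))        # 200 flattened "image patches", 64 voxels each
X = X - X.mean(axis=0)                # center the data

# linear encoder/decoder built from the top-k right singular vectors
_, _, Vt = np.linalg.svd(X, full_matrices=False)
k = 8                                 # latent-space dimensionality

def encode(x):
    return x @ Vt[:k].T               # 64 -> 8 latent code

def decode(z):
    return z @ Vt[:k]                 # 8 -> 64 reconstruction

X_hat = decode(encode(X))
recon_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

A trained convolutional autoencoder replaces these two linear maps with learned nonlinear networks, but the workflow is the same: compress to the latent space, then reconstruct from the code.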
There are some limitations that should be considered. One
limitation is that the quality of the harmonized image is heavily
dependent on the quality and quantity of the training data. If the
Fig. 6.4 Generic autoencoder architecture

training dataset is small or unrepresentative of the target population, the performance of the model may be limited.
Additionally, autoencoders may struggle with image harmoniza-
tion tasks that require complex transformations or adjustments,
such as correcting for geometric distortions or registration errors.
Another limitation of autoencoders for image harmonization is
their sensitivity to noise and artifacts in the input images. This can
be particularly problematic in medical imaging, where images
may have low signal-to-noise ratios or other types of artifacts.
Furthermore, autoencoders may struggle with images that have a
very significant variability in appearance. In such cases, GANs
may be more effective for image harmonization.

Generative Adversarial Networks (GANs)
Generative adversarial networks (GANs) have emerged as a pow-
erful tool for image synthesis and manipulation in medical imag-
ing [29]. They were introduced as a means of creating synthetic
images in an unsupervised manner. GANs have the ability to
learn complex and high-dimensional data distributions, making
them suitable for many different applications. Unlike autoencod-
ers, which learn a compressed representation of input images,
GANs learn to generate new images that are similar to a target
distribution. This makes GANs well-suited for harmonizing
images [30] from different sources or with different acquisition
protocols, as they can learn to generate new images that match the
target distribution.
Fig. 6.5 Generic GAN architecture

The GAN architecture consists of two neural networks: a generator and a discriminator. The generator network learns to generate new images that match the target distribution, while the
discriminator network learns to distinguish between real images
and generated images. The two networks are trained simultane-
ously in an adversarial manner, where the generator tries to fool
the discriminator, and the discriminator tries to correctly classify
the images as real or generated. Figure 6.5 shows a generic GAN
architecture.
In the context of medical image harmonization, the generator
network can be trained to generate new images that match the
distribution of a target dataset, such as a dataset of images acquired
with a specific scanner or protocol. This can help to reduce the
variability in image appearance and make the images more com-
parable across different sources or protocols.
One approach for using GANs for medical image harmoniza-
tion is to use a CycleGAN [31] architecture, which can learn map-
pings between two different domains of images without the need
for paired training data. In the context of medical images, this
means that the CycleGAN can learn to map images from one
acquisition protocol to another, without the need for images that
are acquired with both protocols.
Another approach is to use a conditional GAN (cGAN) [32],
which can learn to generate new images conditioned on a specific
input, such as an image acquired with a specific scanner or proto-
col. This can help to generate new images that match the target
distribution, even if the input image is from a different source or protocol.
Some limitations should be taken into account when using
GANs for the development of IHTs. One of the main limitations
is the lack of control over the generated images. GANs generate
images by learning the distribution of the training data, but controlling the features of the generated images can be challenging,
leading to unrealistic or undesired characteristics. “Hallucinations”
can occur in both GANs and autoencoders, but they are more
common in GANs due to their generative nature. In the context of
image harmonization, “hallucinations” refer to the generation of
unrealistic or implausible details in the output images. GANs are
designed to generate new images by learning the distribution of
the training data, but this can lead to the generation of images with
unwanted features. To prevent “hallucinations” in image harmoni-
zation, it is important to use a diverse and representative dataset
during training. Additionally, regularization techniques such as
dropout and batch normalization can help prevent overfitting and
improve generalization.
Another limitation is the difficulty in training GANs. Large
amounts of training data are required and choosing the appropri-
ate hyperparameters can be challenging. Moreover, GANs can be
unstable during training and prone to mode collapse, where the
generator produces a limited set of similar images, ignoring the
diversity of the training data. Also, GANs are designed to gener-
ate images that are similar to the training data and may not gener-
alize well to new, unseen data.

Applications and Other Approaches
The applications of AI methodologies for image harmonization
are varied and can be very problem specific. The effectiveness of
AI-based harmonization techniques largely depends on the char-
acteristics of the imaging data being analyzed and the specific
requirements of the application. For example, the types of harmo-
nization required for X-ray or CT scans may be very different
from those required for MRI or ultrasound images.
In CT, image harmonization is most commonly used for the
mitigation of the acquisition protocol induced variability, and
Fig. 6.6 High-dose CT image reconstructed from low-dose CT image. Left, reconstructed high-dose CT image. Right, original low-dose CT image

more specifically for converting low-dose images into high-dose images (Fig. 6.6). This approach involves the generation of high-quality medical images from low-quality, low-dose images,
which has significant implications for reducing radiation expo-
sure to patients. To achieve this goal, GANs have been utilized to
generate high-dose CT images from low-dose CT images [33].
GANs have the ability to learn and mimic the image characteris-
tics of high-dose images based on the low-dose images and can
generate realistic and high-quality images with fine details.
In MRI, the range of applications is much wider, from image
harmonization for a specific sequence, such as T1w, T2w or FLAIR,
to style transfer to convert contrast from a T1w sequence to a T2w
sequence. However, image harmonization in MRI is mainly used
to reduce the variability introduced by the different acquisition
protocols. This is particularly important for multi-center studies
where imaging data are acquired from different sites. Several
research studies have been focused on the use of GANs for image
harmonization, comparing the performance of GANs for T1w
harmonization to traditional histogram matching techniques
applied to development of radiomic prediction models [25].
To avoid the need for paired data, style-blind autoencoders
can also be used for T1w harmonization from previously unseen
scanners [34].
Besides GANs and autoencoders, other deep learning architectures might be suitable for image harmonization. U-Net has been
extensively used for anatomy and pathology segmentation, but it
is also being researched as an effective method for image harmo-
nization [35]. As some types of autoencoders and GANs do,
U-Net requires paired data for training, with one scan acting as
the ground truth and the other as the image to be adapted to this
ground truth.

6.4 Conclusions

Radiomics has emerged as a promising field for personalized medicine, where medical images are used to extract quantitative
features for diagnosis and treatment planning. However, the accu-
racy and reproducibility of radiomic studies can be affected by
variations in image acquisition protocols, leading to inconsisten-
cies in the extracted features. Medical image harmonization tech-
niques have been developed to address these issues by
standardizing images acquired from different scanners or at dif-
ferent timepoints.
Conventional image processing techniques have been used for
image harmonization, such as histogram matching or intensity
scaling. However, these methods may not capture the complex
inter-voxel relationships in medical images, leading to limited
success in harmonizing images across different modalities or
scanners.
Recent advances in deep learning have shown great potential in
medical image harmonization using GANs and autoencoders.
These methods can learn the complex mappings between images
from different sources. GAN-based methods can generate realis-
tic images with high fidelity, while autoencoder-based methods
can preserve the structural information of the original image.
Autoencoders have been shown to be effective in harmonizing
medical images with and without the need for paired data, but
they require careful design and training to ensure the quality of
the generated images. GANs, on the other hand, have shown to be
highly effective in generating realistic images that can be used for
medical image harmonization, but some types of GANs require paired data and are prone to instability during training.
Overall, both conventional image processing techniques and
AI methods have their strengths and limitations in medical image
harmonization. While conventional image processing techniques
can provide some level of harmonization, AI methods based on
GANs and autoencoders have shown greater success in capturing
the complex relationships in medical images. The choice of
approach depends on the specific application, available data, and
resources.
In conclusion, medical image harmonization is a crucial step in
radiomic studies to ensure the accuracy and reproducibility of
extracted features. Future research in this area should focus on
developing more robust and efficient AI methods for medical
image harmonization, as well as validating their clinical impact
on diagnosis and treatment planning.

References
1. Smith TB, Zhang S, Erkanli A, Frush D, Samei E (2021) Variability in
image quality and radiation dose within and across 97 medical facilities.
J Med Imaging (Bellingham) 8(5):052105. https://doi.org/10.1117/1.
JMI.8.5.052105. Epub 2021 May 8. PMID: 33977114; PMCID:
PMC8105613
2. Smith NB, Webb A (2010) Introduction to medical imaging: physics,
engineering and clinical applications. Cambridge University Press
3. Yan W, Huang L, Xia L, Gu S, Yan F, Wang Y, Tao Q (2020) MRI manu-
facturer shift and adaptation: increasing the generalizability of deep
learning segmentation for MR images acquired with different scanners.
Radiol Artif Intell 2(4):e190195. https://doi.org/10.1148/
ryai.2020190195. PMID: 33937833; PMCID: PMC8082399
4. Shukla-Dave A, Obuchowski NA, Chenevert TL, Jambawalikar S,
Schwartz LH, Malyarenko D, Huang W, Noworolski SM, Young RJ,
Shiroishi MS, Kim H, Coolens C, Laue H, Chung C, Rosen M, Boss M,
Jackson EF (2019) Quantitative imaging biomarkers alliance (QIBA) rec-
ommendations for improved precision of DWI and DCE-MRI derived
biomarkers in multicenter oncology trials. J Magn Reson Imaging
49(7):e101–e121. https://doi.org/10.1002/jmri.26518
5. Schellinger PD, Jansen O, Fiebach JB, Hacke W, Sartor K (1999) A stan-
dardized MRI stroke protocol: comparison with CT in hyperacute intracerebral hemorrhage. Stroke 30(4):765–768. https://doi.org/10.1161/01.str.30.4.765. PMID: 10187876
6. Purysko AS, Baroni RH, Giganti F, Costa D, Renard-Penna R, Kim CK,
Raman SS (2021) PI-RADS version 2.1: a critical review, from the AJR
special series on radiology reporting and data systems. AJR Am J
Roentgenol 216(1):20–32. https://doi.org/10.2214/AJR.20.24495. Epub
2020 Nov 19. PMID: 32997518
7. Sheikh-Sarraf M, Nougaret S, Forstner R, Kubik-Huch RA (2020) Patient
preparation and image quality in female pelvic MRI: recommendations
revisited. Eur Radiol 30(10):5374–5383. https://doi.org/10.1007/s00330-020-06869-8. Epub 2020 Apr 30. PMID: 32356160
8. Bashyam VM, Doshi J, Erus G, Srinivasan D, Abdulkadir A, Habes M,
Fan Y, Masters CL, Maruff P, Zhuo C, Völzke H, Johnson SC, Fripp J,
Koutsouleris N, Satterthwaite TD, Wolf DH, Gur RE, Gur RC, Morris JC,
Albert MS, Grabe HJ, Resnick SM, Bryan RN, Wolk DA, Shou H,
Nasrallah IM, Davatzikos C (2020) Medical image harmonization using
deep learning based canonical mapping: toward robust and generalizable
learning in imaging. arXiv preprint arXiv:2010.05355
9. Isaksson LJ, Raimondi S, Botta F, Pepa M, Gugliandolo SG, De Angelis
SP, Marvaso G, Petralia G, De Cobelli O, Gandini S, Cremonesi M,
Cattani F, Summers P, Jereczek-Fossa BA (2020) Effects of MRI image
normalization techniques in prostate cancer radiomics. Phys Med 71:7–
13. https://doi.org/10.1016/j.ejmp.2020.02.007. Epub 2020 Feb 18
PMID: 32086149
10. Petersen J, Wille MM, Rakêt LL, Feragen A, Pedersen JH, Nielsen M,
Dirksen A, de Bruijne M (2014) Effect of inspiration on airway dimen-
sions measured in maximal inspiration CT images of subjects without
airflow limitation. Eur Radiol 24(9):2319–2325. https://doi.org/10.1007/s00330-014-3261-3. Epub 2014 Jun 6. PMID: 24903230
11. Plodeck V, Radosa CG, Hübner HM, Baldus C, Borkowetz A, Thomas C,
Kühn JP, Laniado M, Hoffmann RT, Platzek I (2020) Rectal gas-induced
susceptibility artefacts on prostate diffusion-weighted MRI with epi read-out at 3.0 T: does a preparatory micro-enema improve image quality?
Abdom Radiol (NY) 45(12):4244–4251. https://doi.org/10.1007/s00261-020-02600-9. Epub 2020 Jun 4. Erratum in: Abdom Radiol (NY). 2021
Nov;46(11):5450. PMID: 32500236; PMCID: PMC8260527
12. Boellaard R, Delgado-Bolton R, Oyen WJ, Giammarile F, Tatsch K,
Eschner W, Verzijlbergen FJ, Barrington SF, Pike LC, Weber WA,
Stroobants S, Delbeke D, Donohoe KJ, Holbrook S, Graham MM,
Testanera G, Hoekstra OS, Zijlstra J, Visser E, Hoekstra CJ, Pruim J,
Willemsen A, Arends B, Kotzerke J, Bockisch A, Beyer T, Chiti A, Krause
BJ, European Association of Nuclear Medicine (EANM) (2015) FDG
PET/CT: EANM procedure guidelines for tumour imaging: version 2.0.
Eur J Nucl Med Mol Imaging 42(2):328–354. https://doi.org/10.1007/s00259-014-2961-x. Epub 2014 Dec 2. PMID: 25452219; PMCID: PMC4315529
13. Shao M, Zuo L, Carass A, Zhuo J, Gullapalli RP, Prince JL (2022)
Evaluating the impact of MR image harmonization on thalamus deep net-
work segmentation. Proc SPIE Int Soc Opt Eng 12032:120320H. https://
doi.org/10.1117/12.2613159. Epub 2022 Apr 4. PMID: 35514535;
PMCID: PMC9070007
14. Krishnamoorthi R, Ramarajan N, Wang NE, Newman B, Rubesova E,
Mueller CM, Barth RA (2011) Effectiveness of a staged US and CT pro-
tocol for the diagnosis of pediatric appendicitis: reducing radiation expo-
sure in the age of ALARA. Radiology 259(1):231–239. https://doi.
org/10.1148/radiol.10100984. Epub 2011 Jan 28 PMID: 21324843
15. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K,
Cui C, Corrado G, Thrun S, Dean J (2019) A guide to deep learning in
healthcare. Nat Med 25(1):24–29. https://doi.org/10.1038/s41591-018-0316-z. Epub 2019 Jan 7. PMID: 30617335
16. Kuhnert G, Boellaard R, Sterzer S, Kahraman D, Scheffler M, Wolf J,
Dietlein M, Drzezga A, Kobe C (2016) Impact of PET/CT image recon-
struction methods and liver uptake normalization strategies on quantita-
tive image analysis. Eur J Nucl Med Mol Imaging 43(2):249–258. https://
doi.org/10.1007/s00259-015-3165-8. Epub 2015 Aug 18. Erratum in: Eur
J Nucl Med Mol Imaging. 2015 Oct 19; PMID: 26280981
17. Shinohara RT, Sweeney EM, Goldsmith J, Shiee N, Mateen FJ, Calabresi
PA, Jarso S, Pham DL, Reich DS, Crainiceanu CM, Australian Imaging
Biomarkers Lifestyle Flagship Study of Ageing, Alzheimer’s Disease
Neuroimaging Initiative (2014) Statistical normalization techniques for
magnetic resonance imaging. Neuroimage Clin 6:9–19. https://doi.
org/10.1016/j.nicl.2014.08.008. Erratum in: Neuroimage Clin.
2015;7:848. PMID: 25379412; PMCID: PMC4215426
18. Wahid KA, He R, McDonald BA, Anderson BM, Salzillo T, Mulder S,
Wang J, Sharafi CS, McCoy LA, Naser MA, Ahmed S, Sanders KL,
Mohamed ASR, Ding Y, Wang J, Hutcheson K, Lai SY, Fuller CD, van
Dijk LV (2021) Intensity standardization methods in magnetic resonance
imaging of head and neck cancer. Phys Imaging Radiat Oncol 20:88–93.
https://doi.org/10.1016/j.phro.2021.11.001. PMID: 34849414; PMCID:
PMC8607477
19. Carré A, Klausner G, Edjlali M, Lerousseau M, Briend-Diop J, Sun R,
Ammari S, Reuzé S, Alvarez Andres E, Estienne T, Niyoteka S, Battistella
E, Vakalopoulou M, Dhermain F, Paragios N, Deutsch E, Oppenheim C,
Pallud J, Robert C (2020) Standardization of brain MR images across
machines and protocols: bridging the gap for MRI-based radiomics. Sci
Rep 10(1):12340. https://doi.org/10.1038/s41598-020-69298-z. PMID:
32704007; PMCID: PMC7378556
20. LeCun YA, Bottou L, Orr GB, Müller KR (2012) Efficient BackProp. In:
Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade. Lecture notes in computer science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_3
21. Pizer SM, Eberly DH, Fritsch DS, Yushkevich PA (1987) Adaptive histo-
gram equalization and its variations. Comput Vis Graph Image Process
39(3):355–368. https://doi.org/10.1016/s0734-189x(87)80186-x. PMID:
11538358
22. Alghamedy FH, Shafiq M, Liu L, Yasin A, Khan RA, Mohammed HS
(2022) Machine learning-based multimodel computing for medical imag-
ing for classification and detection of Alzheimer disease. Comput Intell
Neurosci 2022:9211477. https://doi.org/10.1155/2022/9211477. PMID:
35990121; PMCID: PMC9391119
23. Wang L, Lai HM, Barker GJ, Miller DH, Tofts PS (1998) Correction for
variations in MRI scanner sensitivity in brain studies with histogram
matching. Magn Reson Med 39(2):322–327. https://doi.org/10.1002/
mrm.1910390222. PMID: 9469718
24. Campello VM, Martín-Isla C, Izquierdo C, Guala A, Palomares JFR,
Viladés D, Descalzo ML, Karakas M, Çavuş E, Raisi-Estabragh Z,
Petersen SE, Escalera S, Seguí S, Lekadir K (2022) Minimising multi-centre radiomics variability through image normalisation: a pilot study. Sci Rep 12(1):12532. https://doi.org/10.1038/s41598-022-16375-0.
PMID: 35869125; PMCID: PMC9307565
25. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021)
Evaluation of conventional and deep learning based image harmonization
methods in radiomics studies. Phys Med Biol 66(24). https://doi.
org/10.1088/1361-6560/ac39e5. PMID: 34781280
26. Nishio M, Nagashima C, Hirabayashi S, Ohnishi A, Sasaki K, Sagawa T,
Hamada M, Yamashita T (2017) Convolutional auto-encoder for image
denoising of ultra-low-dose CT. Heliyon 3(8):e00393. https://doi.
org/10.1016/j.heliyon.2017.e00393. PMID: 28920094; PMCID:
PMC5577435
27. Baur C, Denner S, Wiestler B, Navab N, Albarqouni S (2021)
Autoencoders for unsupervised anomaly segmentation in brain MR
images: a comparative study. Med Image Anal 69:101952. https://doi.
org/10.1016/j.media.2020.101952. Epub 2021 Jan 2. PMID: 33454602
28. An L, Chen J, Chen P, Zhang C, He T, Chen C, Zhou JH, Yeo BTT (2022)
Alzheimer’s disease neuroimaging initiative; Australian imaging bio-
markers and lifestyle study of aging. Goal-specific brain MRI harmoniza-
tion. Neuroimage 263:119570. https://doi.org/10.1016/j.
neuroimage.2022.119570. Epub ahead of print. PMID: 35987490
29. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair
S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural
Inf Process Syst 27:2672–2680
30. Bashyam VM, Doshi J, Erus G, Srinivasan D, Abdulkadir A, Singh A,
Habes M, Fan Y, Masters CL, Maruff P, Zhuo C, Völzke H, Johnson SC,
Fripp J, Koutsouleris N, Satterthwaite TD, Wolf DH, Gur RE, Gur RC,
Morris JC, Albert MS, Grabe HJ, Resnick SM, Bryan NR, Wittfeld K,
Bülow R, Wolk DA, Shou H, Nasrallah IM, Davatzikos C, iSTAGING
and PHENOM Consortia (2022) Deep generative medical image harmo-
nization for improving cross-site generalization in deep learning predic-
tors. J Magn Reson Imaging 55(3):908–916. https://doi.org/10.1002/
jmri.27908. Epub 2021 Sep 25. PMID: 34564904; PMCID: PMC8844038
31. Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image trans-
lation using cycle-consistent adversarial networks. In: IEEE international
conference on computer vision (ICCV), pp 2242–2251
32. Mirza M, Osindero S (2014) Conditional generative adversarial nets.
arXiv preprint arXiv:1411.1784
33. Chen J, Wee L, Dekker A, Bermejo I (2022) Improving reproducibility
and performance of radiomics in low-dose CT using cycle GANs. J Appl
Clin Med Phys 23(10):e13739. https://doi.org/10.1002/acm2.13739.
Epub 2022 Jul 30. PMID: 35906893; PMCID: PMC9588275
34. Fatania K, Clark A, Frood R, Scarsbrook A, Al-Qaisieh B, Currie S, Nix
M (2022) Harmonisation of scanner-dependent contrast variations in
magnetic resonance imaging for radiation oncology, using style-blind
auto-encoders. Phys Imaging Radiat Oncol 22:115–122. https://doi.
org/10.1016/j.phro.2022.05.005. PMID: 35619643; PMCID:
PMC9127401
35. Dewey BE, Zhao C, Reinhold JC, Carass A, Fitzgerald KC, Sotirchos ES,
Saidha S, Oh J, Pham DL, Calabresi PA, van Zijl PCM, Prince JL (2019)
DeepHarmony: a deep learning approach to contrast harmonization
across scanner changes. Magn Reson Imaging 64:160–170. https://doi.
org/10.1016/j.mri.2019.05.041. Epub 2019 Jul 10. PMID: 31301354;
PMCID: PMC6874910
7 Harmonization in the Features Domain
J. Lozano-Montoya and A. Jimenez-Pastor

7.1 Introduction

Harmonization of radiomic features is the process of standardizing the extraction and quantification of imaging features from
medical images by establishing guidelines and protocols for each
step of the radiomics workflow, to ensure that the extracted fea-
tures are consistent and comparable across different imaging plat-
forms, institutions, and studies.
The harmonization methods, which fall under the feature
domain category, are applied either after or during feature extrac-
tion to ensure consistency among the extracted radiomic features
once the image has been processed. Under the feature domain,
there are mainly two differentiated approaches. On the one hand,
some methods seek to identify radiomic variables that are more
stable under the type of image, modifications in the acquisition
parameters, or the center effect. On the other hand, there are dif-
ferent methods based on normalization techniques, which use statistical or deep learning approaches for variable standardization or scaling.
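As a concrete example of the second approach, per-feature z-score standardization across a cohort is a common baseline. The feature matrix below is invented for illustration; dedicated methods such as ComBat additionally model the scanner or center effect explicitly:

```python
import numpy as np

# rows: patients; columns: radiomic features on very different scales
features = np.array([[1.0, 200.0],
                     [2.0, 220.0],
                     [3.0, 180.0],
                     [4.0, 240.0]])

mu = features.mean(axis=0)
sigma = features.std(axis=0)
z = (features - mu) / sigma   # every feature now has mean 0 and unit variance
```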

J. Lozano-Montoya (*) · A. Jimenez-Pastor


Department of AI Research, Quibim, Valencia, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image
Processing, Imaging Informatics for Healthcare Professionals,
https://doi.org/10.1007/978-3-031-48446-9_7

7.2 Reproducibility of Radiomic Features

The selection of reproducible features is a task performed in
radiomic studies that seeks to ensure their robustness to imaging
variability, identifying a subset of features that are insensitive to
variations in imaging protocols and acquisition settings.
One of the challenges in radiomics is that the reproducibility of
radiomic features is often not generalizable to different sites,
modalities, or scanners. This is due in part to the retrospective
nature of many radiomic studies and the lack of standardization
and variability in imaging protocols [1]. Additionally, radiomic
feature values are also influenced by patient variabilities, such as
weight and height or patient’s movements, which can impact the
levels of noise and presence of artifacts in the image [2].
Moreover, when assessing the reproducibility of radiomics, it
is important to be aware that cut-offs for correlation coefficients
are often arbitrarily chosen, and the number of “robust” features
depends on the number of subjects involved. Furthermore, it is
also important to note that the information from studies that have
assessed the impact of imaging settings on radiomics is often not
directly helpful to future studies, as the reproducibility of radiomic
features is not necessarily generalizable to different disease sites,
modalities, or scanners [3].
In order to overcome these challenges, several methods and
considerations can be applied at different reproducibility stages to
ensure reliability, see Fig. 7.1. These methods aim to minimize
the impact of the variability of imaging protocols and patient
characteristics on the extracted features.

Fig. 7.1 Reproducibility in radiomic features can be affected by different


aspects: imaging data, region of interest (ROI) segmentation, post-processing
and feature extraction algorithms, and research reproducibility

7.2.1 Imaging Data Reproducibility

Imaging data reproducibility refers to the ability to consistently
obtain similar results from imaging studies. It encompasses two
aspects: repeatability and reproducibility. Repeatability refers to
the consistency of results obtained from repeated measurements
under identical or near-identical conditions, using the same equip-
ment and procedures. Reproducibility, on the other hand, refers to
the consistency of results obtained from measurements taken in
different settings, using different equipment or operators [4]. The
methodology for conducting these analyses differs depending on
the type of study [5].

Image Acquisition and Reconstruction Parameters


Image acquisition and reconstruction parameters play an impor-
tant role in the extraction of reproducible and robust radiomic fea-
tures. These parameters include things like the imaging modality
(e.g., computed tomography [CT], magnetic resonance imaging
[MRI]), the imaging protocol (e.g., slice thickness, field of view),
and the reconstruction algorithm (e.g., filtered back projection,
iterative reconstruction) [6].
Ideally, the same imaging parameters must be used across dif-
ferent scans of the same patient, or different patients with similar
characteristics. This ensures that the features extracted from the
images are directly comparable and that any changes in the fea-
tures can be attributed to changes in the underlying disease rather
than variations in the imaging. However, when dealing with retro-
spective multi-centric real world data (RWD), it becomes difficult
to ensure the same acquisition protocol across scanners.
Furthermore, images should be acquired and reconstructed in a
way that minimizes noise and artifacts. This can be achieved
through the use of high-quality imaging equipment and standard
imaging protocols, as well as the use of advanced reconstruction
algorithms that can reduce noise and improve the signal-to-noise
ratio of the images.
Several studies have investigated the repeatability and repro-
ducibility of radiomic features using CT, MRI, and positron

emission tomography (PET) with strategies such as test-retest or
phantom (specialized object utilized in medical imaging for qual-
ity control, equipment calibration, dosimetry, and education)
studies to ensure that radiomic features are robust and reproduc-
ible. Although these studies reduce the exposure to patients, it is
important to note that phantom studies do not fully replicate the
complexity and heterogeneity of human tissues, and are com-
monly used for equipment calibration.

CT Scans
One of the most studied factors influencing reconstruction in CT
scans is the voxel size. However, some studies also investigate the
impact of image discretization on radiomic features [1]. When
different acquisition modes and image reconstructions were
applied to CT, most features were found to be redundant, with
only 30% of them being reproducible across test-retest, with a
concordance correlation coefficient (CCC) of at least 0.90 [7].
When a phantom with 177 features was used and the pitch factor
and reconstruction kernel were modified, it was found that
between 76 and 151 of the features were reproducible [8]. This
highlights the importance of carefully considering factors such as
the voxel size, image discretization, and the reconstruction kernel
when analyzing CT-based datasets.
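The CCC used in these comparisons is Lin's concordance correlation coefficient. As a minimal sketch (in Python with NumPy, using population variances; the toy values are invented for illustration):

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient (CCC) between two
    sets of measurements of the same features (e.g., test vs. retest).

    CCC = 1 only for perfect agreement; unlike Pearson's r, it also
    penalizes systematic shifts and scale differences.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                  # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

test = [10.2, 8.7, 15.1, 9.9, 12.4]            # invented feature values
retest_shifted = [v + 1.0 for v in test]       # systematic offset
print(concordance_ccc(test, test))             # perfect agreement -> 1.0
print(concordance_ccc(test, retest_shifted))   # penalized below 1
```

Because the mean difference enters the denominator, a constant scanner offset lowers the CCC even when the rank order of the features is preserved.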

PET Scans
Many studies have been conducted to assess the reproducibility of
radiomic features in PET scans, but most of them only examine
the impact of variability in scanner and imaging parameters and
do not provide specific methods for achieving reproducible fea-
tures. The full-width half maximum (FWHM) of the Gaussian
filter is the most frequently investigated reconstruction factor in
this context [1].
An important study evaluates the impact of various image
reconstruction settings in PET/CT scans using data from a phan-
tom and a patient dataset from two different scanners [9]. The
study grouped the radiomic features into intensity-based,
geometry-based, and texture-based features, and their

reproducibility and variability were measured using the coefficient
of variation (COV). The results from both the phantom and
patient studies showed that 47% of all radiomic features were
reproducible. Another study [10] investigated whether radiomic
models developed using PET/CT images could be transferred to
PET/MRI images by assessing the reproducibility of radiomic
features under different test-retest and attenuation correction vari-
ability. The results of this study showed that intensity-based and
geometry-­based features were also reproducible.

MRI Sequences
The impact of test-retest, acquisition, and reconstruction settings
in MRI has been explored less extensively than for PET and CT. A
recent study investigated the robustness of radiomic features
across different MRI scanning protocols and scanners using a
phantom [11]. The results showed that the robustness of the fea-
tures varied depending on the feature, with most intensity-based
and gray-level co-occurrence matrix (GLCM) features showing
intermediate or small variation, while most neighborhood gray-
tone difference (NGTD) features showed high variation. In the
GLCM features, variance, cluster shade, cluster tendency, and
cluster prominence had poor robustness. However, these features
had high reproducibility if the scanning parameters were kept the
same, making them useful for intrascanner studies. Nevertheless,
the study had limitations, including the effect of subject move-
ment and uncertainty in lesion segmentation.

Intra-individual Test-Retest Repeatability


Intra-individual test-retest repeatability studies involve measuring
the same individual multiple times. Radiomic features in test-
retest repeatability assessments may be influenced by a variety of
factors, including variations in patient positioning, respiration
phase, contrast enhancement, and acquisition and processing
parameters [5]. Studies have shown that respiration can have a
significant impact on the reproducibility of radiomic features in
CT images of lung cancer patients during test-retest assessments
[12]. An MRI study performed a test-retest analysis for three

acquisitions showing that only 37% of radiomic features were
found to be reproducible with a CCC > 0.8 [13].

Multi-scanner Reproducibility
Multi-machine reproducibility studies involve measuring the
same image on different scanners. A recent study based on the
reproducibility of radiomic features across several MRI scanners
and acquisition protocol parameters using both phantom and
patient data with a test-retest strategy, revealed very little differ-
ence in variability between the filtering and normalization effects
used for preprocessing [11]. Moreover, the intra-class
correlation coefficient (ICC) measurements showed higher repro-
ducibility for the phantom data than for the patient data; however,
the study was unable to mitigate the impact of patients' move-
ments, despite simulating movement during scanning. A similar
study also extracted stable MRI radiomic features with a mini-
mum CCC of 0.85 between data derived from 61 patients’ test and
retest apparent diffusion coefficient (ADC) maps across various
MRI systems, tissues, and vendors [14].
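The ICC reported in such multi-scanner comparisons can take several forms; the sketch below assumes the two-way mixed, consistency, single-measurement variant ICC(3,1), which is not necessarily the exact variant used in the cited studies:

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    ratings has shape (n_subjects, k_scanners): one feature value per
    subject (or lesion) and per scanner/session.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)           # per-subject means
    col_means = ratings.mean(axis=0)           # per-scanner means
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# A constant offset between two scanners does not hurt consistency:
print(icc_3_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))  # -> 1.0
```

Other ICC forms (e.g., absolute agreement) would penalize the constant scanner offset shown in the example, so the chosen variant should always be reported.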

7.2.2 Segmentation Reproducibility

Segmentation is a challenging and contentious aspect, particularly
in the context of oncology research, where tumors often have
complex borders and there is a high degree of inter-reader vari-
ability in manually contouring tumors. While normal structures
can now be segmented with full automation, diseases such as can-
cer require operator input due to the inter- and intra-reader mor-
phologic and contrast heterogeneity at the initial examination.
There is an ongoing debate over the optimal methods for seg-
mentation, including the use of manual or automatic techniques
and the pursuit of ground truth or reproducibility. One solution to
this problem is the use of semiautomatic segmentation, which is
more reproducible than manual segmentation [15]. However, even
with semiautomatic segmentation, reproducibility is not ideal,
and researchers continue exploring automatic segmentation meth-
ods. One study in MRI brain exams that were segmented with

four different methods found that deep learning-based approaches
had higher accuracy in predictive models, but also noted that sub-
tle differences in the segmentation methods can affect the radiomic
features obtained [16]. The reproducibility of segmentation can
also vary depending on the type of tumor being studied. For
example, a study of CT scans from patients with head and neck
cancer, pleural mesothelioma, and non-small cell lung cancer
found that the ROIs and radiomic features were most reproducible
in lung cancer [17].
However, despite these challenges, a consensus is emerging
that the optimal approach to segmentation is a combination of
computer-aided contouring followed by manual curation [18].

7.2.3 Post-processing and Feature Extraction

The process of feature extraction is complex and can be influ-
enced by several factors, such as outlier control, setting ranges of
intensity, and the number of bins used to discretize an image (i.e.,
for GLCM matrix calculation). An MRI study tested 33 different
combinations of variations using different voxel sizes, four gray-
level discretizations, and three quantization methods [19].
However, it did not find a strong CCC across the combinations.
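Gray-level discretization, one of the factors varied in that study, can be sketched as follows; the two schemes (fixed bin number and fixed bin size) follow common IBSI-style conventions, with bin counts and widths chosen arbitrarily for illustration:

```python
import numpy as np

def discretize_fixed_bin_number(roi, n_bins=32):
    """Rescale ROI intensities into n_bins gray levels (1..n_bins).

    Assumes a non-constant ROI (max > min).
    """
    roi = np.asarray(roi, dtype=float)
    lo, hi = roi.min(), roi.max()
    levels = np.floor(n_bins * (roi - lo) / (hi - lo)).astype(int) + 1
    levels[roi == hi] = n_bins                 # close the top bin
    return levels

def discretize_fixed_bin_size(roi, bin_width=25.0):
    """Assign gray levels using bins of constant intensity width."""
    roi = np.asarray(roi, dtype=float)
    return np.floor((roi - roi.min()) / bin_width).astype(int) + 1

roi = np.array([0.0, 10.0, 20.0, 30.0])        # toy ROI intensities
print(discretize_fixed_bin_number(roi, n_bins=4))   # -> [1 2 3 4]
```

With a fixed bin number, the resulting gray-level range is identical for every image, whereas a fixed bin size keeps the physical meaning of intensity differences; the choice directly changes texture matrices such as the GLCM.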
Furthermore, to ensure the reproducibility and generalizability
of results across studies, it is important to use consistent methods
for discretization and quantification, as the IBSI (Image
Biomarker Standardisation Initiative) manual proposes [20].
Moreover, the use of different software packages for radiomic
extraction can lead to increased variability in feature values and
can negatively impact the reliability and prognostication ability of
radiomic models. Thus, one way to improve the reproducibility
and generalizability of results is by using a standardized and
open-source platform for feature extraction that follows IBSI
guidelines to obtain comparable features. PyRadiomics [21] is
the platform that is commonly used in this field and is publicly
available, with source code, documentation, and examples.
Recently, a study investigated the reproducibility of radiomic fea-
tures with two widely used radiomic software packages (IBEX

and MaZda) in comparison to an IBSI-compliant software
(PyRadiomics) [22]. The non-compliant packages obtained sig-
nificantly fewer reproducible features compared with the IBSI-com-
pliant ones; however, both options had similar predictive power in
a model of response to radiotherapy for head and neck cancer.

7.2.4 Reporting Reproducibility

Open-source data plays an important role in the improvement and
reproducibility of radiomics. The availability of open datasets,
like the RIDER dataset, and of public phantoms can aid the under-
standing of the effects of different factors on radiomics and help
to further assess the influence of acquisition settings.
Ensuring consistency and transparency in radiomic studies is
crucial, and it is necessary to provide detailed reporting on pre-
processing steps to enhance reproducibility and repeatability.
Fortunately, recent developments in the field aim to improve the
quality of radiomic studies by providing several guidelines that
facilitate their execution. Initiatives such as the IBSI [20], RQS
(radiomics quality score) [23] or TRIPOD (transparent reporting
of a multivariable prediction model for individual prognosis or diagnosis)
[24] are recommended to improve the final quality and reproduc-
ibility of the studies.
Furthermore, to increase the potential for clinically relevant
and valuable radiomic studies, some authors [1] recommend
assessing whether the following questions can be answered affir-
matively before starting a new study:

–– Is there an actual clinical need which could potentially be
answered with the help of radiomics?
–– Is there enough expertise in the research team to ensure high
quality of the study and potential of clinical implementation?
–– Is there access to enough data to support the conclusions with
sufficient power, including external validation datasets?
–– Is it possible to retrieve all other non-imaging data that is
known to be relevant for the research question?

–– Is information on the acquisition and reconstruction of the
images available?
–– Are the imaging protocols standardized and if not, is there a
solution to harmonize images or to ensure minimal influence of
varying settings on the modeling?

7.3 Normalization Techniques

Normalization techniques are statistical approaches that are used
to account for variations in image intensity, brightness, and con-
trast. These methods are designed to standardize the features by
making them comparable across different imaging modalities,
scanners, and centers, improving the reliability and comparability
of radiomic features and making it possible to use them in clinical
practice.
Data normalization methods are crucial for radiomic features,
as they are often characterized by differences in scale, range, and
statistical distributions. Without normalization, features may
exhibit high levels of skewness, which can artificially result in
lower p-values in statistical analysis [25]. Neglecting feature nor-
malization or using inappropriate methods can also lead to indi-
vidual features being over- or underrepresented and introduce bias
into developed models.
Normalization techniques are further divided into several sub-
types: statistical normalization, ComBat methods, and normaliza-
tion with deep learning. It is important to note that while image
preprocessing normalization steps are important to reduce
technical variability across images, additional feature normaliza-
tion steps are still necessary and should not be overlooked [26].

7.3.1 Statistical Normalization

Normalization techniques are used to correct biases and differ-
ences in radiomic features that may be caused by variations in
imaging devices, acquisition protocols, or reconstruction param-
eters, and several studies have specifically evaluated the benefit of
normalization for this purpose. Various methods of statistical nor-
malization can be applied, but some of the most commonly used
include the following:

• Z-score normalization, which scales feature values to have a
mean of 0 and a standard deviation of 1. A variation of this
method is the robust Z-score, which uses median and absolute
deviation from the median instead of mean and standard devia-
tion to account for outliers.
• Min–max normalization, which performs a linear transforma-
tion to scale feature values to a common range of 0–1. This
preserves relationships among the original data but can sup-
press the effect of outliers due to the bounded range.
• Square root or log transformations can be used to decrease the
skewness of distributions. However, these transformations can
only be used for positive values and can sometimes make the
distribution more skewed than the raw data.
• Upper quartile normalization divides each read count by the
75th percentile of read counts in the sample [27].
• Quantile normalization, which transforms the original data to
remove undesirable technical variation by forcing observed
distributions to be the same. This method can work well in
practice but can wipe out important information and artificially
induce features that are not statistically different across sam-
ples [28].
• Whitening normalization using principal component analysis
(PCA), which is based on a linear transformation that converts
a vector of random variables with a known covariance matrix
into a set of new variables whose covariance is the identity
matrix [29]. This technique can make a more substantial nor-
malization of the features but can also exaggerate noise in the
data [30].

It is important to keep in mind that different normalization
methods have their own specific advantages and disadvantages,
and it may be necessary to experiment with multiple methods to find
the best approach for a given set of radiomic features.
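The first three techniques above can be written in a few lines; this is a minimal NumPy sketch with an invented feature vector containing one outlier:

```python
import numpy as np

def z_score(x):
    # Center to mean 0 and scale to standard deviation 1.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def robust_z_score(x):
    # Median / median-absolute-deviation variant, less sensitive to outliers.
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad

def min_max(x):
    # Linear rescaling of the values to the [0, 1] range.
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

feature = np.array([12.0, 15.0, 14.0, 10.0, 60.0])  # note the outlier
print(np.round(z_score(feature), 2))
print(np.round(robust_z_score(feature), 2))
print(np.round(min_max(feature), 2))
```

Running this on the toy vector makes the trade-offs visible: the outlier compresses the min–max output of the other values toward 0, while the robust z-score keeps their spread intact.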

A recent study compared different normalization methods to
standardize the radiomic features extracted from CT images of
non-small cell lung cancer (NSCLC) patients. The study found
that using a z-score normalization resulted in the best prediction
radiomic model with an AUC of 0.789 when compared to min–max
normalization (0.725 AUC) and PCA (0.785 AUC) [30].
Another study also evaluated the effect of several normaliza-
tion techniques, including no normalization, z-score, robust
z-score, log-transformation, upper quartile, and quantile on pre-
dicting several clinical phenotypes using a machine learning pipe-
line [26]. The results from the correlation analysis showed that all
radiomic features were perfectly correlated with non-normalized
radiomic features when using scaling, z-score, robust z-score, and
upper quartile normalization methods. These methods were found
to help reduce bias and not alter the information. However, log-­
transformation, quantile, and whitening methods showed a poor
correlation value with non-normalized radiomic features.

7.3.2 ComBat

ComBat is a statistical method that was originally developed to
harmonize gene expression arrays and correct “batch effects” in
genomic studies [31] but can also be applied on radiomics to
remove discrepancies introduced by technical differences in the
images. It is a data-driven post-processing technique that employs
empirical Bayes methods to estimate the differences in feature
values due to a batch effect. Moreover, it can provide satisfactory
results even for small datasets depending on the representative-
ness of the samples available for each site [32].
ComBat standardizes radiomic features by centering them to
the overall mean of all samples. This process shifts the data into a
new location that is different from the original centers; conse-
quently, features lose their physical meaning [33]. In cases where
there is high heterogeneity and the number of variables to use
with ComBat would be too high for the number of patients, unsu-
pervised clustering can be used to identify potential labels for har-
monization. One of the main limitations of the ComBat method is

its inability to harmonize new data that comes from a different
source than the data used during the feature transformation phase.
This means that if new data is added to the analysis, ComBat
needs to be reapplied to the entire dataset to ensure reliability.
A detailed manual was recently provided to explain the correct
application of ComBat in multi-site research. The guide illustrates
and clarifies under what conditions ComBat can be utilized to
standardize image-based biomarkers [34]:

1. The distributions of the features to be realigned must be simi-
lar except for shift (additive factor) and spread (multiplicative
factor) effects.
2. Any covariates that might explain different distributions at the
two sites must be identified and considered.
3. The different sets of feature values to be realigned must be
independent.
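Under those conditions, the core location/scale step of ComBat can be illustrated as below. This is a deliberately simplified single-feature sketch: it omits the empirical Bayes shrinkage and covariate modeling of the full method, and it assumes every batch contains several non-constant values; the entropy values and scanner labels are invented.

```python
import numpy as np

def locscale_harmonize(values, batches):
    """Simplified ComBat-style correction of one radiomic feature.

    Removes the per-batch additive (shift) and multiplicative (spread)
    effects after standardizing against the pooled data, then maps the
    result back to the pooled location and scale.
    """
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    grand_mean, grand_std = values.mean(), values.std()
    standardized = (values - grand_mean) / grand_std
    out = np.empty_like(standardized)
    for b in np.unique(batches):
        mask = batches == b
        gamma = standardized[mask].mean()      # estimated batch shift
        delta = standardized[mask].std()       # estimated batch spread
        out[mask] = (standardized[mask] - gamma) / delta
    return out * grand_std + grand_mean

entropy = np.array([10.0, 12.0, 14.0, 20.0, 24.0, 28.0])
scanner = np.array(["A", "A", "A", "B", "B", "B"])
harmonized = locscale_harmonize(entropy, scanner)
# After correction, the two scanner groups share the same mean and spread.
```

The full ComBat additionally shrinks the per-batch estimates toward a common prior, which is what makes it usable with the small per-site sample sizes mentioned above.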

An extensive comparison of ComBat with the previous nor-
malization techniques in the context of radiomics has not yet been
carried out, although previous comparisons between ComBat and
similar techniques for batch effect correction in different fields
indicated the superiority of ComBat [35]. Several modifications
of ComBat have been proposed to improve its performance and to
solve its limitations, however, some variations still require valida-
tion for certain image modalities [36]:

• M-ComBat addresses the problem of features losing their phys-
ical meaning after harmonization by allowing the selection of
a reference center to align other centers with no loss of perfor-
mance.
• B-ComBat and BM-ComBat are modifications that use boot-
strapping to improve the accuracy of the estimates, as they
account for the uncertainty in the data. The initial estimates
obtained are resampled a specified number of times (B) with
replacement. The resamples are then fitted to obtain new esti-
mates of the coefficients and finally, they are calculated using
the Monte Carlo method by taking the mean of all of them. A
study reported an increase in the radiomic models’ perfor-
mance using these variations compared with standard ComBat
and M-ComBat [33].
• Transfer learning ComBat addresses the limitation of ComBat’s
inability to harmonize new unseen data by coupling the method
with a transfer learning technique, allowing the use of previ-
ously learned harmonization transforms on new data [37].
• Nested ComBat provides a sequential radiomic features har-
monization workflow to compensate for multicenter heteroge-
neity caused by multiple batch effects.
• NestedD was introduced to handle bimodal feature distribu-
tions instead of Gaussian distributions and is recommended for
high-dimensional datasets [38].
• GMM ComBat is a modification that uses the Gaussian mix-
ture model split method to handle bimodality coming from
unknown factors, providing an alternative to ComBat for
confronting potential issues arising from the assumption that
all imaging factors are known [38].
• Longitudinal ComBat expands ComBat’s applicability to the
longitudinal domain by eliminating additive and multiplicative
scanner effects [39].

Figure 7.2 shows an example of the application of ComBat on
two radiomic variables (median gray-level intensity and entropy)
extracted from non-small cell lung cancer (NSCLC) lesions to cor-
rect for the manufacturer effect (GE, Canon, Siemens, and
Philips). In this case, the GE manufacturer was used as a reference
to standardize the other distributions.
A study used ComBat to transform MRI-based radiomic fea-
tures from T1 phantom images to T1-weighted brain tumors [40].
The study found that ComBat eliminated the scanner effect and
increased the number of statistically significant features that could
be used to differentiate between low and intermediate/high-risk
scores. Additionally, a radiomic model based on linear discrimi-
nant analysis was implemented, which achieved a higher Youden
Index when ComBat was used (0.43) compared to when it was not
used (0.12). Another study evaluated its use for harmonizing
radiomic features in a combined PET and MRI radiomic model

Fig. 7.2 Application of ComBat for two radiomic features extracted from
NSCLC lesions to correct for manufacturer’s batch effect: median gray-level
intensity (top) and entropy values (bottom). On the left, original density dis-
tributions and, on the right, density distributions after harmonization with
ComBat

using ADC parametric maps to predict the recurrence in advanced
cervical cancer patients [41]. In this case, ComBat improved the accu-
racy when the model was validated externally from two different
cohorts: 82–85% accuracy without harmonization and 90% after
ComBat. In another study [42], most radiomic features were signifi-
cantly affected by differences in acquisition and reconstruction
parameters in CT scans; the authors reported an improvement in
the performance of the developed radiomic signatures with the
ComBat harmonization, where all the features could be used after
the process.

Nevertheless, harmonizing the distribution without paying
attention to individual value and rank is not expected to be benefi-
cial for the generalizability of radiomic signatures. The effective
use of ComBat ideally requires the evaluation of the consistency
of radiomic features after applying ComBat on samples that lack
biological variability, such as phantoms. Then, radiomic features
extracted from patients’ scans obtained with the same imaging
settings can be transformed using the location/scale parameters
determined through the application of ComBat on the phantom
data. Therefore, a framework that guides the use of ComBat in
radiomic analyses has been published, however, this procedure
could be also applied to other feature harmonization methods
[43]. The workflow starts by collecting imaging datasets and
extracting the imaging acquisition and reconstruction parameters.
A phantom is then scanned with the different acquisition and
reconstruction parameters used for acquiring the scans in the
patient’s imaging dataset. Radiomic features are extracted from
the phantom scans and their reproducibility is assessed using the
CCC. Features that obtain a CCC > 0.9 are considered reproduc-
ible and further used for modeling. Finally, to assess the perfor-
mance of the feature harmonization method, it is applied on the
phantom’s scans and the robust features are obtained again
(CCC > 0.9). The combination of the identified stable and harmo-
nizable features should be used for further analysis.
One study applied this framework with ComBat to 13 scans of
a phantom using different imaging protocols and vendors [44]. By
investigating the reproducibility of radiomic features in a pairwise
manner, the study found a wide range of reproducible features,
between 9 and 78. The harmonization did not have a uniform
impact on radiomic features and the number of features that could
be used following harmonization varied widely. Therefore, the
impact of ComBat harmonization should be carefully analyzed
depending on the data under analysis.
In summary, ComBat seems a promising and straightforward
method for standardizing radiomic features, as long as there is a
sufficient number of labels and the sources of variations are iden-
tified.

7.3.3 Deep Learning Approaches

Deep learning has been widely used in image domain harmoniza-
tion, but recent studies have shown that it can also be an effective
alternative for batch effect removal in the feature domain. A study
[45] trained a deep neural network to standardize radiomic and
deep features across different scanner models, acquisition, and
reconstruction settings, using a publicly available texture phan-
tom dataset. The idea behind this approach was to use a neural
network to learn a nonlinear normalization transformation that
reduces intra-scan clustering while maintaining informative and
discriminative features. The generalization to unknown textures
and scans was demonstrated through a series of experiments using
a publicly available phantom CT texture dataset scanned with
various imaging devices and parameters.
Another approach that has been explored is the use of domain
adversarial neural networks (DANNs). DANNs use a label predic-
tor and a domain classifier to optimize the features and make them
discriminative for the main task, but not discriminative between
domains. A study used this method with an iterative update
approach to generate harmonized features of MRI images and
evaluated their performance in segmentation, resulting in a
decreased influence of scanner variation on predictions [46]. The
method was tested on a multi-centric dataset, making it a more
suitable approach for feature harmonization.
Finally, other studies have also achieved good results in reduc-
ing divergence between source and target data feature distribu-
tions in other fields, but these methods tend to be less successful
with medical imaging data. Overall, deep learning has proven to
be a promising alternative for feature harmonization in medical
imaging, but more research is needed to fully understand its
potential and limitations.

7.4 Strategies Overview

Radiomics is a powerful tool for characterizing and predicting
diseases. However, multiple factors can influence the feature val-
ues, including scanner and patient variability, image acquisition
and reconstruction settings, and image preprocessing. Table 7.1
illustrates a summary of the technical factors that influence
radiomics stability in the radiomics workflow. Thus, radiomics
harmonization is an important step to ensure robust and stable
features that can enable reproducibility and generalizability in
radiomics modeling.
One common approach to deal with variability is to eliminate
radiomic features that are not robust against these factors. This is
typically done by evaluating the variability of radiomic features
across different scanners and protocols using metrics such as the
ICC and the COV. The challenge is to find the ideal threshold to
select the stable radiomic features because potentially relevant
information could be removed [36]. To ensure that radiomic fea-
tures are reproducible and robust, it is recommended to follow a
standard protocol for image acquisition and feature extraction and
to perform appropriate quality control measures. Figure 7.3 shows
an overview of the strategies described during the chapter to
ensure the harmonization of the radiomic features.
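A COV-based robustness filter of the kind described above can be sketched as follows; the 10% threshold and the toy scan values are arbitrary choices for illustration:

```python
import numpy as np

def stable_feature_mask(feature_matrix, max_cov=0.10):
    """Flag features whose coefficient of variation (COV) across
    repeated scans of the same object stays below max_cov.

    feature_matrix: shape (n_repeats, n_features), e.g. one row per
    phantom scan. Assumes feature means are non-zero.
    """
    fm = np.asarray(feature_matrix, dtype=float)
    cov = np.abs(fm.std(axis=0) / fm.mean(axis=0))
    return cov < max_cov

scans = np.array([
    [100.0, 5.0],    # feature 1 is stable across repeats,
    [101.0, 9.0],    # feature 2 fluctuates strongly
    [99.0, 2.0],
])
print(stable_feature_mask(scans))  # -> [ True False]
```

The threshold encodes the trade-off noted above: a stricter cut keeps only highly stable features but risks discarding potentially relevant information.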
In conclusion, harmonization techniques in the feature domain are important for ensuring that imaging data are consistent across different studies. Two well-studied approaches are deep learning and statistical methods. While deep learning is well suited to detecting nonlinear patterns in imaging data, it can be more difficult to apply than statistical methods, and more research is needed in this field. One statistical method that has been extensively studied is ComBat, which has been shown to offer better results than other methods. However, it is important to consider the number of samples available for analysis when choosing a harmonization technique: for example, ComBat requires a minimum of 20–30 patients per batch, whereas other methods, such as Z-score or White-Stripe normalization, place no restriction on dataset size.
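To make the location/scale idea behind these methods concrete, here is a deliberately simplified, ComBat-style adjustment that realigns each batch's per-feature mean and variance to the pooled statistics. It omits the empirical Bayes shrinkage of the actual ComBat estimator [31, 34], which is what makes small batches problematic, and it ignores biological covariates, so it is a sketch of the principle rather than a substitute for a validated implementation.

```python
import numpy as np

def combat_like_adjust(features, batch_labels):
    """Simplified location/scale harmonization in the spirit of ComBat.

    features: (n_samples, n_features) array of radiomic feature values.
    batch_labels: length-n_samples sequence of scanner/site identifiers.
    Each batch's per-feature mean and variance are mapped onto the pooled
    ("grand") statistics, removing additive and multiplicative site offsets.
    """
    x = np.asarray(features, dtype=float)
    labels = np.asarray(batch_labels)
    grand_mu = x.mean(axis=0)
    grand_sd = x.std(axis=0)
    grand_sd[grand_sd == 0] = 1.0  # guard constant features
    out = x.copy()
    for b in np.unique(labels):
        idx = labels == b
        mu, sd = x[idx].mean(axis=0), x[idx].std(axis=0)
        sd[sd == 0] = 1.0
        # Standardize within the batch, then rescale to the pooled statistics.
        out[idx] = (x[idx] - mu) / sd * grand_sd + grand_mu
    return out
```

In practice one would use a maintained ComBat implementation, which additionally shrinks the per-batch estimates and can preserve covariates of interest.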
Table 7.1 Factors influencing radiomics stability and posterior reproducibility. Adapted from: J. E. van Timmeren, D. Cester, S. Tanadini-Lang, H. Alkadhi, and B. Baessler, "Radiomics in medical imaging—"how-to" guide and critical reflection", Insights Imaging, vol. 11, no. 1, p. 91, Dec. 2020, doi: https://doi.org/10.1186/s13244-020-00887-2. Licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)

MRI
• Image acquisition: field strength; sequence design; acquired matrix size; field of view; slice thickness; acceleration techniques; vendor; contrast timing; movement
• Reconstruction parameters: reconstructed matrix size; reconstruction technique

CT
• Image acquisition: tube voltage; milliamperage; pitch; field of view/pixel spacing; slice thickness; acquisition mode; vendor; contrast timing; movement
• Reconstruction parameters: reconstruction matrix; slice thickness; reconstruction kernel; reconstruction technique

PET
• Image acquisition: field of view/pixel spacing; slice thickness; injected activity; acquisition time; scan timing; duty cycle; vendor; movement
• Reconstruction parameters: reconstruction matrix; slice thickness; reconstruction technique; attenuation correction

All modalities
• Segmentation: manual 2D; manual 3D; semi-automated 2D; semi-automated 3D; automated 2D; automated 3D; size of the ROI
• Post-processing: image interpolation; intensity discretization; normalization
• Feature extraction: mathematical formula; package

Fig. 7.3 Summary of strategies for harmonization of radiomic features

References
1. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B (2020) Radiomics in medical imaging—"how-to" guide and critical reflection. Insights Imaging 11(1):91. https://doi.org/10.1186/s13244-020-00887-2
2. Mühlberg A et al (2020) The technome—a predictive internal calibration approach for quantitative imaging biomarker research. Sci Rep 10(1):1103. https://doi.org/10.1038/s41598-019-57325-7
3. van Timmeren JE et al (2016) Test–retest data for radiomics feature stability analysis: generalizable or study-specific? Tomography 2(4):361–365. https://doi.org/10.18383/j.tom.2016.00208
4. Kessler LG et al (2015) The emerging science of quantitative imaging biomarkers terminology and definitions for scientific studies and regulatory submissions. Stat Methods Med Res 24(1):9–26. https://doi.org/10.1177/0962280214537333
5. Park JE, Park SY, Kim HJ, Kim HS (2019) Reproducibility and generalizability in radiomics modeling: possible strategies in radiologic and statistical perspectives. Korean J Radiol 20(7):1124. https://doi.org/10.3348/kjr.2018.0070
6. Midya A, Chakraborty J, Gönen M, Do RKG, Simpson AL (2018) Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J Med Imaging 5(01):1. https://doi.org/10.1117/1.JMI.5.1.011020
164 J. Lozano-Montoya and A. Jimenez-Pastor

7. Balagurunathan Y et al (2014) Test–retest reproducibility analysis of lung CT image features. J Digit Imaging 27(6):805–823. https://doi.org/10.1007/s10278-014-9716-x
8. Berenguer R et al (2018) Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. Radiology 288(2):407–415. https://doi.org/10.1148/radiol.2018172361
9. Shiri I, Rahmim A, Ghaffarian P, Geramifar P, Abdollahi H, Bitarafan-Rajabi A (2017) The impact of image reconstruction settings on 18F-FDG PET radiomic features: multi-scanner phantom and patient studies. Eur Radiol 27(11):4498–4509. https://doi.org/10.1007/s00330-017-4859-z
10. Vuong D et al (2019) Interchangeability of radiomic features between [18F]-FDG PET/CT and [18F]-FDG PET/MR. Med Phys 46(4):1677–1685. https://doi.org/10.1002/mp.13422
11. Lee J et al (2021) Radiomics feature robustness as measured using an MRI phantom. Sci Rep 11(1):3973. https://doi.org/10.1038/s41598-021-83593-3
12. Hunter LA et al (2013) High quality machine-robust image features: identification in nonsmall cell lung cancer computed tomography images. Med Phys 40(12):121916. https://doi.org/10.1118/1.4829514
13. Kickingereder P et al (2018) Radiomic subtyping improves disease stratification beyond key molecular, clinical, and standard imaging characteristics in patients with glioblastoma. Neuro Oncol 20(6):848–857. https://doi.org/10.1093/neuonc/nox188
14. Peerlings J et al (2019) Stability of radiomics features in apparent diffusion coefficient maps from a multi-centre test-retest trial. Sci Rep 9(1):4800. https://doi.org/10.1038/s41598-019-41344-5
15. Parmar C et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9(7):e102107. https://doi.org/10.1371/journal.pone.0102107
16. Poirot MG et al (2022) Robustness of radiomics to variations in segmentation methods in multimodal brain MRI. Sci Rep 12(1):16712. https://doi.org/10.1038/s41598-022-20703-9
17. Pavic M et al (2018) Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol 57(8):1070–1074. https://doi.org/10.1080/0284186X.2018.1445283
18. Gillies RJ, Kinahan PE, Hricak H (2016) Radiomics: images are more than pictures, they are data. Radiology 278(2):563–577. https://doi.org/10.1148/radiol.2015151169
19. Li Q et al (2017) A fully-automatic multiparametric radiomics model: towards reproducible and prognostic imaging signature for prediction of overall survival in glioblastoma multiforme. Sci Rep 7(1):14331. https://doi.org/10.1038/s41598-017-14753-7
20. Zwanenburg A, Leger S, Vallières M, Löck S (2020) Image biomarker standardisation initiative. Radiology 295(2):328–338. https://doi.org/10.1148/radiol.2020191145
21. van Griethuysen JJM et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Res 77(21):e104–e107. https://doi.org/10.1158/0008-5472.CAN-17-0339
22. Korte JC et al (2021) Radiomics feature stability of open-source software evaluated on apparent diffusion coefficient maps in head and neck cancer. Sci Rep 11(1):17633. https://doi.org/10.1038/s41598-021-96600-4
23. Lambin P et al (2017) Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 14(12):749–762. https://doi.org/10.1038/nrclinonc.2017.141
24. Collins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 350:g7594. https://doi.org/10.1136/bmj.g7594
25. Parmar C, Barry JD, Hosny A, Quackenbush J, Aerts HJWL (2018) Data analysis strategies in medical imaging. Clin Cancer Res 24(15):3492–3499. https://doi.org/10.1158/1078-0432.CCR-18-0385
26. Castaldo R, Pane K, Nicolai E, Salvatore M, Franzese M (2020) The impact of normalization approaches to automatically detect radiogenomic phenotypes characterizing breast cancer receptors status. Cancers 12(2):518. https://doi.org/10.3390/cancers12020518
27. Bullard JH, Purdom E, Hansen KD, Dudoit S (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11(1):94. https://doi.org/10.1186/1471-2105-11-94
28. Hicks SC, Irizarry RA (2014) When to use quantile normalization? Preprint. https://doi.org/10.1101/012203
29. Kessy A, Lewin A, Strimmer K (2018) Optimal whitening and decorrelation. Am Stat 72(4):309–314. https://doi.org/10.1080/00031305.2016.1277159
30. Haga A et al (2019) Standardization of imaging features for radiomics analysis. J Med Invest 66(1.2):35–37. https://doi.org/10.2152/jmi.66.35
31. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127. https://doi.org/10.1093/biostatistics/kxj037
32. Goh WWB, Wang W, Wong L (2017) Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol 35(6):498–507. https://doi.org/10.1016/j.tibtech.2017.02.012
33. Da-ano R et al (2020) Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies. Sci Rep 10(1):10248. https://doi.org/10.1038/s41598-020-66110-w
34. Orlhac F et al (2022) A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J Nucl Med 63(2):172–179. https://doi.org/10.2967/jnumed.121.262464
35. Papadimitroulas P et al (2021) Artificial intelligence: deep learning in oncological radiomics and challenges of interpretability and data harmonization. Phys Med 83:108–121. https://doi.org/10.1016/j.ejmp.2021.03.009
36. Stamoulou E et al (2022) Harmonization strategies in multicenter MRI-based radiomics. J Imaging 8(11):303. https://doi.org/10.3390/jimaging8110303
37. Da-ano R et al (2021) A transfer learning approach to facilitate ComBat-based harmonization of multicentre radiomic features in new datasets. PLoS One 16(7):e0253653. https://doi.org/10.1371/journal.pone.0253653
38. Horng H et al (2022) Generalized ComBat harmonization methods for radiomic features with multi-modal distributions and multiple batch effects. Sci Rep 12(1):4493. https://doi.org/10.1038/s41598-022-08412-9
39. Beer JC et al (2020) Longitudinal ComBat: a method for harmonizing longitudinal multi-scanner imaging data. NeuroImage 220:117129. https://doi.org/10.1016/j.neuroimage.2020.117129
40. Orlhac F et al (2021) How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol 31(4):2272–2280. https://doi.org/10.1007/s00330-020-07284-9
41. Lucia F et al (2019) External validation of a combined PET and MRI radiomics model for prediction of recurrence in cervical cancer patients treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging 46(4):864–877. https://doi.org/10.1007/s00259-018-4231-9
42. Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I (2019) Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 291(1):53–59. https://doi.org/10.1148/radiol.2019182023
43. Ibrahim A et al (2021) Radiomics for precision medicine: current challenges, future prospects, and the proposal of a new framework. Methods 188:20–29. https://doi.org/10.1016/j.ymeth.2020.05.022
44. Ibrahim A et al (2021) The application of a workflow integrating the variable reproducibility and harmonizability of radiomic features on a phantom dataset. PLoS One 16(5):e0251147. https://doi.org/10.1371/journal.pone.0251147
45. Andrearczyk V, Depeursinge A, Müller H (2019) Neural network training for cross-protocol radiomic feature standardization in computed tomography. J Med Imaging 6(02):1. https://doi.org/10.1117/1.JMI.6.2.024008
46. Dinsdale NK, Jenkinson M, Namburete AIL (2021) Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal. NeuroImage 228:117689. https://doi.org/10.1016/j.neuroimage.2020.117689
