Deep Learning-Based MR-to-CT Synthesis: The Influence of Varying Gradient Echo-Based MR Images As Input Channels

Seminar Report On
Deep learning–based MR-to-CT synthesis: The influence of varying gradient echo–based MR images as input channels
Submitted in partial fulfillment of the requirements for
the award of the degree of
Bachelor of Technology
in
Computer Science & Engineering
Submitted by
DELLA DOMINIC
RET17CS072
Under the guidance of
Ms. MEHARBAN M.S.
Department of Computer Science & Engineering

Rajagiri School of Engineering and Technology
Rajagiri Valley, Kakkanad, Kochi, 682039
November 2020
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
RAJAGIRI SCHOOL OF ENGINEERING AND TECHNOLOGY
RAJAGIRI VALLEY, KAKKANAD, KOCHI, 682039
CERTIFICATE
This is to certify that the report entitled "Deep learning–based MR-to-CT synthesis: The influence of varying gradient echo–based MR images as input channels" is a bonafide report of the seminar presented during the
seventh semester by Della Dominic, RET17CS072, to the APJ Abdul
Kalam Technological University in partial fulfillment of the requirements for
the award of the degree of Bachelor of Technology (B.Tech) in Computer
Science & Engineering during the academic year 2020-2021.
Dr. Dhanya P M, Head of Department, Dept. of CSE, RSET
Mr. Harikrishnan M, Seminar Coordinator, Asst. Professor, Dept. of CSE, RSET
Ms. Meharban MS, Seminar Guide, Asst. Professor, Dept. of CSE, RSET
ACKNOWLEDGEMENT
I wish to express my sincere gratitude towards Dr. P. S. Sreejith, Principal of RSET, and Dr. Dhanya P. M., Head of the Department of Computer Science & Engineering, for providing me the opportunity to present the seminar "Deep learning–based MR-to-CT synthesis: The influence of varying gradient echo–based MR images as input channels".
I am highly indebted to my seminar coordinators, Mr. Harikrishnan M, Assistant Professor, Department of CSE, Ms. Meenu Mathew, Assistant Professor, Department of CSE, and Dr. Sminu Izudheen, Associate Professor, Department of CSE, for their valuable support.
It is indeed my pleasure and a moment of satisfaction for me to express my sincere thanks to my seminar guide, Ms. Meharban MS, for her patience, her invaluable advice and all the wisdom she has shared with me.
Last but not least, I would like to express my sincere gratitude towards all other teachers and friends for their continuous support and constructive ideas.
Della Dominic
ABSTRACT
Magnetic resonance imaging (MRI) and computed tomography (CT) are two well-known modalities in medical imaging. MRI provides clear soft-tissue contrast, while CT combines many X-ray measurements taken from different angles to produce cross-sectional images of specific areas; CT is used for detecting bone and joint problems and conditions such as cancer, heart disease and internal injuries. Unfortunately, CT imaging exposes patients to potentially harmful ionizing radiation. To circumvent this problem, and additionally to save the cost and time of acquiring multiple imaging modalities, many techniques that generate a CT surrogate, referred to as synthetic CT (sCT), have been developed in recent years.
Recent studies have shown that deep learning–based models generate sCTs that are equivalent to or better than those of more conventional methods. However, most deep learning studies exploit only a single MR contrast. Because the performance of deep learning models depends on the data on which they are trained and evaluated, it was hypothesized that combinations of MR images obtained from gradient-echo experiments could provide the neural network with additional information about tissue composition, including proton density, water and fat fractions, as well as magnetic properties. The paper studies the influence of gradient echo–based contrasts as input channels to a 3D patch-based neural network trained for sCT generation in canine and human populations.
Contents
Acknowledgement
Abstract
List of Figures
List of Tables
1 Objective and Motivation
1.1 Objective
1.2 Motivation
2 Introduction
2.1 MRI Sequence
2.2 Influence of different MR contrasts as input channel
3 Methods
3.1 Data Collection
3.2 Input configurations
3.3 Constraints
4 Architecture
4.1 U-Net Architecture
4.1.1 Different parts of the U-Net Architecture
4.1.2 Advantages of U-Net
4.2 Training
4.3 Replications
5 Image Preprocessing
5.1 Image Registration
5.2 Masking
6 Evaluation
6.1 Metrics
6.1.1 Mean Error (ME)
6.1.2 Mean Absolute Error (MAE)
6.1.3 Dice Similarity Coefficient (DSC)
6.1.4 Surface Distance
6.1.5 Peak Signal-to-Noise Ratio (PSNR)
6.2 Repeatability
6.3 Statistical analysis
7 Results
7.1 Per subject results
7.2 Per Model results
7.3 Bone reconstruction
8 Discussions
9 Applications and Future Scope
9.1 Applications
9.2 Future Scope
10 Conclusion
References
Appendix
CO–PO and CO–PSO Mapping
List of Figures
1 Different MR contrasts
2 Transverse slices of the canine and human data sets
3 Six different input configurations
4 U-Net Architecture
5 Variation of mean absolute error (MAE) per subject
6 Computed tomography and synthesized CTs (sCTs) generated per input configuration for 3 subjects
List of Tables
1 Performance obtained for each model per data set: MAEbody (HU), MEbody (HU), MAEbone (HU)
2 Performance obtained for each model per data set: DSCbone, Surface Distance (mm), PSNR (dB)
3 Repeatability obtained for each model per data set, averaged across replicates and subjects
The influence of varying gradient echo–based MR images as input channels 1
1 Objective and Motivation
1.1 Objective
The prime objective is to study the influence of gradient echo–based contrasts
as input channels to a 3D patch-based neural network trained for synthetic CT
(sCT) generation in canine and human populations.
1.2 Motivation
Computed tomography is a medical imaging modality that combines many X-ray measurements taken from different angles to produce cross-sectional images of specific areas. Unfortunately, CT imaging exposes patients to potentially harmful ionizing radiation. The probability that absorbed X-rays induce cancer or heritable mutations (which can affect offspring) is small, but it does exist. Reactions to any contrast dye used can also cause side effects.
To circumvent these problems, and additionally to save the cost and time of acquiring multiple imaging modalities, alternative solutions are needed. In deep learning approaches, combinations of MR images obtained from gradient-echo experiments could provide the neural network with additional information about tissue composition, including proton density, water and fat fractions, as well as magnetic properties such as relaxation constants (T1, T2) and susceptibility.

2 Introduction
2.1 MRI Sequence

An MRI sequence [3] is a particular combination of radiofrequency pulses and gradients that results in a set of images with a particular appearance. Common MRI sequences include proton density (PD) weighted, T1 weighted, T2 weighted, diffusion weighted and flow sensitive sequences.
Each of these MR contrasts has different properties and practical applications. T1 weighted sequences are part of almost all MRI protocols and are best thought of as the most 'anatomical' images, because they most closely approximate the macroscopic appearance of tissues, although even this is a gross simplification. The dominant signal intensities of different tissues are:
• fluid (e.g. urine, CSF): low signal intensity (black)
• muscle: intermediate signal intensity (grey)
• fat: high signal intensity (white)
• brain
– grey matter: intermediate signal intensity (grey)
– white matter: hyperintense compared to grey matter (white-ish)
Fig. 1: Different MR contrasts

[4]

T2 weighted sequences are also part of almost all MRI protocols. Without modification, the dominant signal intensities of different tissues are:
• fluid (e.g. urine, CSF): high signal intensity (white)
• muscle: intermediate signal intensity (grey)
• fat: high signal intensity (white)
• brain
– grey matter: intermediate signal intensity (grey)
– white matter: hypointense compared to grey matter (dark-ish)
2.2 Influence of different MR contrasts as input channel

Many different techniques to generate a CT surrogate, referred to as synthetic CT (sCT), have been developed in recent years; a recent study has even applied sCT generation to the lower arm for orthopedic purposes. These techniques can roughly be categorized as atlas-based methods or voxel-based methods, the latter including deep learning methods. They use different images as inputs, including acquired T1-weighted, T2-weighted, ultrashort-TE (UTE) or zero-TE (ZTE) images, reconstructed Dixon images, or combinations of these.
In parallel to the use of specialized and combined MR images, processing techniques have also expanded to include statistical and machine learning models, with recent advances in deep learning. Recent studies have shown that deep learning–based models generate sCTs that are equivalent to or better than those of more conventional methods. Among recent deep learning techniques, GANs (Generative Adversarial Networks) appear to be a promising approach and have proved effective for image-to-image translation.
The performance of deep learning models is influenced by the data on which they are trained and evaluated. It was hypothesized that combinations of MR images obtained from gradient-echo experiments could provide the neural network with additional information about tissue composition, including proton density, water and fat fractions, as well as magnetic properties such as relaxation constants (T1, T2) and susceptibility. These properties may be particularly useful for the task of mapping MR data to HU values, which motivated this study.

3 Methods
Magnetic resonance images and CT scans of human and canine pelvic regions
were acquired and paired using nonrigid registration. Magnitude MR images
and Dixon reconstructed water, fat, in-phase and opposed-phase images were
obtained from a single T1-weighted multi-echo gradient-echo acquisition. From
this set, 6 input configurations were defined, each containing 1 to 4 MR images
regarded as input channels. For each configuration, a UNet-derived[7] deep
learning model was trained for synthetic CT generation.[12] Reconstructed
Hounsfield unit maps were evaluated with peak SNR, mean absolute error,
and mean error. Dice similarity coefficient and surface distance maps assessed
the geometric fidelity of bones. Repeatability was estimated by replicating the
training up to 10 times.
3.1 Data Collection

The reported study was performed on images of the pelvic region of canine (ex
vivo) and human (in vivo) populations.
The canine data set consisted of 18 domestic dogs (5 males and 13 females) that had died of natural causes. The MR images were acquired in a fixed supine position in a 1.5T scanner (Ingenia; Philips Healthcare, Best, Netherlands) using a 3D RF-spoiled T1-weighted multiple gradient-echo (T1w-MGE) sequence.
The human data set consisted of 27 male patients diagnosed with prostate cancer who underwent intensity-modulated radiotherapy. The T1w-MGE MR images were acquired in a head-first supine position on a flat table in a 3T scanner (Ingenia; Philips Healthcare).
The CT scans (Brilliance CT Big Bore; Philips Healthcare) were acquired
with a slice spacing of 0.7 mm and a pixel spacing ranging from 0.3 mm to
0.7 mm in the canine data set. Tube current ranged from 30 mA to 66 mA.
For the human subjects, the CT slice spacing was 3 mm, and the pixel spacing
ranged from 0.8 mm to 1.12 mm. Tube current ranged from 62 mA to 154
mA. For both data sets, tube voltage was 120 kV.
In addition, the in-phase and opposed-phase images were acquired at different TEs in the 2 data sets. Working on these 2 data sets made it possible to confirm the robustness of the findings to variations in acquisition parameters and physiology.

Fig. 2: Transverse slices of the canine and human data sets (paired in-phase
MR images and CT scans). As compared with the human data set, the canine
data set showed substantial inter-subject variability in morphology (e.g., shape,
size).
[4]
3.2 Input configurations

Six input configurations were investigated in the study, each containing 1, 2,
or 4 channels. Here a channel refers to an MR image obtained from the single
T1w-MGE acquisition. In particular, the first 2 acquired echoes (aIP and aOP)
were investigated along with the Dixon reconstructed[11] IP, OP, W, and F
images. These images were combined into 6 input configurations called aIP,
aOP, Dual, IPOP, WF, and Dixon[11], which are defined in Figure 3.
Because each image within an input was considered as a channel, input
configurations containing more than 1 image are referred to as multichannel as
opposed to single-channel input configurations. For our statistical comparison,
we preferred the use of the aIP configuration as a reference, because it is a
single-channel configuration with a higher correlation between MR intensities
and CT HU than the aOP configuration.
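To make the channel layout concrete, here is a minimal NumPy sketch that stacks the MR images of one configuration into a C × D × H × W array. The channel membership below (Dual = aIP + aOP, IPOP = reconstructed IP + OP, WF = W + F, Dixon = IP + OP + W + F) is an assumption based on the configuration names and on the 1, 2 or 4 channel counts; Figure 3 is the authoritative definition. The names `CONFIGS` and `build_input` are hypothetical.

```python
import numpy as np

# Assumed channel membership of the six input configurations (see Fig. 3).
CONFIGS = {
    "aIP": ["aIP"],
    "aOP": ["aOP"],
    "Dual": ["aIP", "aOP"],
    "IPOP": ["IP", "OP"],
    "WF": ["W", "F"],
    "Dixon": ["IP", "OP", "W", "F"],
}


def build_input(images, config):
    """Stack the 3D MR volumes named by `config` along a new leading channel axis.

    images: dict mapping an image name ("aIP", "aOP", "IP", "OP", "W", "F")
            to a 3D array; all arrays share the same shape (D, H, W).
    Returns an array of shape (C, D, H, W).
    """
    return np.stack([images[name] for name in CONFIGS[config]], axis=0)


# Usage with random placeholder volumes standing in for one subject.
rng = np.random.default_rng(0)
vols = {n: rng.random((8, 8, 8)) for n in ["aIP", "aOP", "IP", "OP", "W", "F"]}
for cfg in CONFIGS:
    print(cfg, build_input(vols, cfg).shape)
```

Because every configuration is derived from the same acquisition, the volumes are already co-registered, so stacking them voxel by voxel is meaningful.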

Fig. 3: Description of how MR images from the T1-weighted multiple gradient-echo (T1w-MGE) acquisition were combined to form all 6 input configurations.
[4]
3.3 Constraints
To perform a fair comparative study, a single network architecture was used to process the heterogeneous data sets and the variable input configurations. Notably, the choices defining the neural network architecture and its hyperparameters were constrained by 2 requirements:
• The neural network needed to handle inputs containing different numbers of images without increasing the number of trainable parameters (fixed complexity).
• The neural network needed to make repeatable predictions, so that input-related changes in the reconstructed sCTs could be identified easily.
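Why fixed complexity is a real constraint: in a standard convolutional network, only the first layer sees the input channels, and its weight count grows linearly with C. The sketch below counts the parameters of a plain 3D convolution (k × k × k kernel plus one bias per output channel). It only illustrates the arithmetic; how exactly the study kept the parameter count constant is not described in this report, and the function name as well as the 3 × 3 × 3 kernel with 32 output filters are assumptions.

```python
def conv3d_params(in_channels, out_channels, kernel=3, bias=True):
    """Number of trainable parameters in a plain 3D convolution layer."""
    weights = in_channels * out_channels * kernel ** 3
    return weights + (out_channels if bias else 0)


# A hypothetical first layer with 32 filters of size 3x3x3:
for c in (1, 2, 4):  # channel counts used by the six input configurations
    print(c, conv3d_params(c, 32))
```

A naively built network would therefore have 896, 1,760 or 3,488 parameters in its first layer depending on the configuration, so a comparison across configurations would mix the effect of the input with the effect of model size.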

4 Architecture
To generate sCT images from MR inputs, a patch-based convolutional neural network, a 3D extension of the widely used U-Net, was used [6]. This architecture took as input 4D MR images with 3 spatial dimensions and a channel dimension. The size of the patches was C × 24 × 24 × 24 voxels, where C is the number of channels in the input configuration.
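A minimal NumPy sketch of cutting one C × 24 × 24 × 24 training patch out of a multichannel MR volume together with the matching 24³ CT patch. Uniformly random patch corners are an assumption made here for simplicity (the study balances bone and soft-tissue sampling, Section 4.2); `extract_patch` and `random_corner` are hypothetical names.

```python
import numpy as np

PATCH = 24  # patch edge length in voxels


def extract_patch(mr, ct, corner, size=PATCH):
    """Cut a (C, size, size, size) MR patch and the aligned (size, size, size) CT patch.

    mr: array of shape (C, D, H, W); ct: array of shape (D, H, W), registered to mr.
    corner: (z, y, x) index of the patch's first voxel.
    """
    z, y, x = corner
    mr_patch = mr[:, z:z + size, y:y + size, x:x + size]
    ct_patch = ct[z:z + size, y:y + size, x:x + size]
    return mr_patch, ct_patch


def random_corner(shape, size=PATCH, rng=None):
    """Uniformly random corner such that the patch lies fully inside a volume of `shape`."""
    if rng is None:
        rng = np.random.default_rng()
    # integers(low, high) excludes `high`, so the largest corner is s - size.
    return tuple(int(rng.integers(0, s - size + 1)) for s in shape)


rng = np.random.default_rng(0)
mr = rng.random((4, 64, 80, 80))   # e.g. a 4-channel (Dixon) input
ct = rng.random((64, 80, 80))
mr_p, ct_p = extract_patch(mr, ct, random_corner(ct.shape, rng=rng))
print(mr_p.shape, ct_p.shape)  # (4, 24, 24, 24) (24, 24, 24)
```

Only the channel dimension C changes between input configurations; the spatial patch size stays fixed.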
4.1 U-Net Architecture

The U-Net [7] architecture is built upon the Fully Convolutional Network (FCN). FCNs are built only from locally connected layers, such as convolution, pooling and upsampling, and do not contain any dense layers as traditional CNNs do. Instead, they use 1x1 convolutions to perform the task of fully connected layers. U-Net [7] extends this architecture so that it works with very few training images and yields more precise segmentations.
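The claim that 1x1 convolutions play the role of fully connected layers can be checked numerically: a 1 × 1 (× 1) convolution applies the same dense weight matrix independently at every voxel. A minimal NumPy sketch, with illustrative shapes (16 feature maps mapped to 1 output map):

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out = 16, 1                                # e.g. 16 feature maps -> 1 output map
features = rng.standard_normal((c_in, 6, 6, 6))    # (C, D, H, W) feature maps
weight = rng.standard_normal((c_out, c_in))        # 1x1x1 kernel == dense weight matrix
bias = rng.standard_normal(c_out)

# 1x1x1 convolution: at each voxel, mix the channel vector with the same weights.
conv_out = np.einsum("oc,cdhw->odhw", weight, features) + bias[:, None, None, None]

# Equivalent dense layer applied voxel by voxel.
dense_out = np.empty_like(conv_out)
for d in range(6):
    for h in range(6):
        for w in range(6):
            dense_out[:, d, h, w] = weight @ features[:, d, h, w] + bias

print(np.allclose(conv_out, dense_out))  # True
```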
4.1.1 Different parts of the U-Net Architecture
The U-Net architecture consists of 3 parts:
• The contracting or downsampling path.
• Bottleneck.
• The expanding or upsampling path.
The contracting or downsampling path: The contracting path is composed of 4 blocks. Each block is composed of
• 3x3 Convolution Layer + activation function (with batch normalization)
• 3x3 Convolution Layer + activation function (with batch normalization)
• 2x2 Max Pooling
Note that the number of feature maps doubles at each pooling step, starting with 64 feature maps for the first block, 128 for the second, and so on. The purpose of the contracting path is to capture the context of the input image. The feature maps of each block are also transferred to the upsampling path by means of skip connections, which carry the high-resolution spatial information that is lost during pooling.

Bottleneck: This part of the network lies between the contracting and expanding paths. The bottleneck is simply built from 2 convolutional layers (with batch normalization), with dropout.
The expanding or upsampling path:
The expanding path is also composed of 4 blocks. Each of these blocks is composed of
• Deconvolution (transposed convolution) layer with stride 2.
• Concatenation with the corresponding cropped feature map from the contracting path.
• 3x3 Convolution layer + activation function (with batch normalization).
• 3x3 Convolution layer + activation function (with batch normalization).
The purpose of the expanding path is to enable precise localization combined with contextual information from the contracting path.
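To see how the three parts fit together, the sketch below traces feature-map size and channel count through the original 2D U-Net layout [7] (unpadded 3 × 3 convolutions, 4 pooling steps, 64 base feature maps), which maps a 572 × 572 input to a 388 × 388 output. The cropping step in the expanding path is needed precisely because unpadded convolutions shrink the maps. This illustrates the generic architecture only; it is not the exact 3D network of the study, which works on 24³ patches and may use padded convolutions.

```python
def unet_shapes(size=572, depth=4, base=64, k=3):
    """Trace (stage, level, spatial size, channels) through a 2D U-Net built from
    unpadded k x k convolutions, 2x2 max pooling and 2x2 up-convolutions."""
    trim = 2 * (k - 1)  # two unpadded k x k convolutions per block
    skips, trace = [], []
    ch = base
    # Contracting path: conv, conv (output kept for the skip connection), pool.
    for level in range(1, depth + 1):
        size -= trim
        skips.append((size, ch))
        trace.append(("down", level, size, ch))
        size //= 2  # 2x2 max pooling halves the spatial size
        ch *= 2     # feature maps double at each pooling step
    # Bottleneck: two convolutions at the coarsest resolution.
    size -= trim
    trace.append(("bottleneck", depth + 1, size, ch))
    # Expanding path: up-convolution, crop-and-concatenate, conv, conv.
    for level in range(depth, 0, -1):
        size *= 2   # stride-2 up-convolution doubles the size ...
        ch //= 2    # ... and halves the number of feature maps
        skip_size, _ = skips[level - 1]
        assert skip_size >= size, "skip map must be cropped down to `size`"
        # Concatenation gives 2 * ch channels; the two convolutions reduce them to ch.
        size -= trim
        trace.append(("up", level, size, ch))
    return trace


for row in unet_shapes():
    print(row)
```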
Fig. 4: U-Net Architecture

[7]

4.1.2 Advantages of U-Net
U-Net has various advantages:
• The U-Net combines the localization information from the downsampling path with the contextual information in the upsampling path, which is necessary to predict a good segmentation map.
• It has no dense layers, so images of different sizes can be used as input (the only parameters learned by the convolution layers are the kernels, and the kernel size is independent of the input image's size).
• It is designed to be trained with extensive data augmentation, which is important in domains like biomedical segmentation, where the number of annotated samples is usually limited.
4.2 Training
An experiment consisted of training a neural network with 1 input configuration. Because the canine and human data sets are distinct, models were trained independently on each population. For each data set, a 3-fold cross-validation procedure was applied to synthesize sCTs, and the models trained on 1 data set were only evaluated on that data set.
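A minimal sketch of a subject-level 3-fold split. Assigning subjects to folds at random is an assumption (the report does not say how folds were formed); splitting by subject rather than by patch keeps every patch of a test subject out of training. The function name, the subject labels and the seed are hypothetical.

```python
import numpy as np


def subject_folds(subject_ids, n_folds=3, seed=0):
    """Split subjects (not patches) into n_folds disjoint test groups.

    Yields (train_ids, test_ids) per fold; every subject is tested exactly once,
    by a model that never saw it during training.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subject_ids)
    groups = np.array_split(shuffled, n_folds)
    for k in range(n_folds):
        test = groups[k]
        train = np.concatenate([g for j, g in enumerate(groups) if j != k])
        yield train, test


human = [f"H{i:02d}" for i in range(24)]  # the 24 included human subjects
for train, test in subject_folds(human):
    print(len(train), len(test))  # 16 8 for each fold
```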
Models were trained with the Nadam optimizer to minimize the L1 loss between the CT and the sCT. The learning rate was constant and set to 10^-4. With the L1 loss function, the optimization process was robust to outliers produced by noise, artifacts, or imperfect matching between MR and CT. It also resulted in sharper images than the L2 norm. The patches used for each optimization step were randomly extracted from the MR images but were balanced between soft tissue and bone by using a weighted probability map that resulted in equal sampling of bone and nonbone voxels. Because 6 input configurations were tested, 6 independent models were optimized per fold.
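The balanced sampling can be sketched with a per-voxel probability map in which bone voxels and non-bone body voxels each receive half of the total probability mass. This is a minimal NumPy illustration under assumptions: the bone mask follows the 200 HU definition of Section 5.2, the sampled voxels would serve as patch centres, and the function names are hypothetical.

```python
import numpy as np


def balanced_probability_map(bone_mask, body_mask):
    """Per-voxel sampling probabilities giving bone and non-bone voxels equal total mass.

    bone_mask, body_mask: boolean arrays of the same shape. Voxels outside the body
    get probability 0. Both classes must be non-empty.
    """
    bone = bone_mask & body_mask
    soft = body_mask & ~bone_mask
    prob = np.zeros(bone_mask.shape, dtype=float)
    prob[bone] = 0.5 / bone.sum()   # half of the mass spread over bone voxels
    prob[soft] = 0.5 / soft.sum()   # the other half over non-bone voxels
    return prob


def sample_centres(prob, n, rng):
    """Draw n voxel indices, returned as (z, y, x) rows, according to the map."""
    flat = rng.choice(prob.size, size=n, p=prob.ravel())
    return np.stack(np.unravel_index(flat, prob.shape), axis=1)


rng = np.random.default_rng(0)
body = np.ones((20, 20, 20), dtype=bool)
bone = np.zeros_like(body)
bone[8:12, 8:12, 8:12] = True  # bone is only 64 of 8000 voxels
centres = sample_centres(balanced_probability_map(bone, body), 10_000, rng)
frac_bone = bone[tuple(centres.T)].mean()
print(round(frac_bone, 2))  # close to 0.5 although bone covers < 1% of the volume
```

Without such a map, uniformly sampled patches would be dominated by soft tissue, and bone, the hardest class to reconstruct, would be rarely seen during training.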

4.3 Replications
Because the training of the neural network contained random factors, the
training of each model was replicated multiple times. This evaluated the
repeatability of sCT generation and corroborated the statistical significance
of the findings. Experiments were repeated 5 times on the human data set
and 10 times on the canine data set, because the latter presented a larger
anatomical variability.

5 Image Preprocessing
Data preprocessing is the conversion of data into a usable and desired form. Here it refers to image processing, i.e., performing operations on an image in order to obtain an enhanced image or to extract useful information from it. The main preprocessing step performed here is image registration using the Elastix registration toolbox [9].
5.1 Image Registration

Image registration [1] is the process of mapping one image onto another image or a similar object by means of a spatial transformation. Registration aims to fuse the data from two or more images. Of the two images, one is the reference image (also called the original or fixed image), which is kept unchanged; the other is the sensed image (or template image), which is transformed to align with the reference image.
Medical image registration [9] is an important task in medical image processing. It refers to aligning data sets acquired with different modalities (e.g., magnetic resonance and computed tomography), at different time points, or from different subjects.
Here, the CT scans were registered to the magnitude of the aIP image using the Elastix registration toolbox [9] to create a voxel-wise MRI-CT matching for training and evaluation. The registration was a composition of a rigid Euler transform and a nonrigid B-spline transform [8], both optimized with an adaptive stochastic gradient descent procedure with mutual information as the similarity metric. A rigidity penalty was applied to the entire volume during deformable registration. In addition, the CT scans were resampled to match the resolution of the MR images using third-order B-spline interpolation.
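Mutual information suits MR-to-CT registration because the two modalities encode the same tissue with different, not even linearly related, intensities. A minimal histogram-based estimate in NumPy is sketched below; Elastix uses its own estimator internally (e.g. a Parzen-window based Mattes variant), so this is only an illustration, and the bin count is an assumption.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Histogram-based mutual information (in nats) between two images of equal shape."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()             # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal distribution of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal distribution of image b
    nz = pxy > 0                          # empty bins contribute 0 (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


rng = np.random.default_rng(0)
mr = rng.random((32, 32, 32))
ct_aligned = 1000 * (1 - mr)                  # inverted intensities, but perfectly aligned
ct_shifted = np.roll(ct_aligned, 5, axis=0)   # the same image, misaligned
print(mutual_information(mr, ct_aligned) > mutual_information(mr, ct_shifted))  # True
```

The aligned pair scores high even though bright MR voxels are dark in the "CT", which is exactly why mutual information, rather than an intensity difference, is used for multimodal registration.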
To facilitate the training of each model, a per-subject linear normalization [10] was applied to the MR images and CT scans. The normalization mapped intensities to [−1, 1] using

$$ I_{\text{norm}} = 2\,\frac{I - \text{shift}}{\text{scale}} - 1 \qquad (1) $$

For the MR images, the shift and scale were derived from the minimum and maximum intensities (shift = minimum, scale = maximum − minimum). For the CT scans, these values were kept constant to preserve the quantitative nature of HU and were set to −1000 HU for the shift and 4000 HU for the scale, so that [−1000, 3000] HU maps onto [−1, 1].
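A minimal NumPy sketch of Eq. (1) and its inverse, which would be needed to turn a network output back into HU. The CT constants are read as shift = −1000 HU and scale = 4000 HU (the minus sign appears to have been lost in extraction; only this choice maps [−1000, 3000] HU onto [−1, 1]). Function names are hypothetical.

```python
import numpy as np

CT_SHIFT, CT_SCALE = -1000.0, 4000.0  # HU constants for the CT normalization


def normalize(img, shift, scale):
    """Linear map of Eq. (1): `shift` -> -1 and `shift + scale` -> +1."""
    return 2.0 * (img - shift) / scale - 1.0


def denormalize(img_norm, shift, scale):
    """Inverse of Eq. (1), e.g. to convert a network output back to HU."""
    return (img_norm + 1.0) / 2.0 * scale + shift


def normalize_mr(mr):
    """Per-subject MR normalization: minimum maps to -1, maximum to +1."""
    lo, hi = float(mr.min()), float(mr.max())
    return normalize(mr, lo, hi - lo)


ct = np.array([-1000.0, 0.0, 1000.0, 3000.0])
ct_n = normalize(ct, CT_SHIFT, CT_SCALE)
print(ct_n.tolist())                                        # [-1.0, -0.5, 0.0, 1.0]
print(np.allclose(denormalize(ct_n, CT_SHIFT, CT_SCALE), ct))  # True
```

Fixed CT constants keep a given HU value at the same normalized value for every subject, whereas the MR mapping adapts to each subject's intensity range.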

5.2 Masking
To include only relevant anatomy and to perform a tissue-specific evaluation, binary masks were automatically derived from the MR in-phase images and CT scans by thresholding followed by mask filling. For both modalities, the average intensity of the image was used as the threshold value. The intersection of the MR and CT body masks isolated the volume of interest from the background. CT voxels within the volume of interest whose intensity was greater than 200 HU were defined as bone.
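The masking rules can be sketched in a few lines of NumPy. The mask-filling step is left out to keep the sketch dependency-free (a function such as `scipy.ndimage.binary_fill_holes` could perform it); the function names and the toy volume are hypothetical.

```python
import numpy as np

BONE_HU = 200.0  # bone threshold from the text


def body_mask(img):
    """Threshold at the image's mean intensity (hole filling omitted in this sketch)."""
    return img > img.mean()


def tissue_masks(mr_in_phase, ct_hu):
    """Return (volume_of_interest, bone) boolean masks for a registered MR/CT pair."""
    voi = body_mask(mr_in_phase) & body_mask(ct_hu)  # intersection of MR and CT body masks
    bone = (ct_hu > BONE_HU) & voi                    # CT voxels > 200 HU inside the VOI
    return voi, bone


ct = np.full((10, 10, 10), -1000.0)  # air
ct[2:8, 2:8, 2:8] = 40.0             # soft-tissue "body"
ct[4:6, 4:6, 4:6] = 700.0            # bone inside the body
mr = np.where(ct > -500, 1.0, 0.0)   # toy in-phase image: bright body, dark background
voi, bone = tissue_masks(mr, ct)
print(voi.sum(), bone.sum())  # 216 8
```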

6 Evaluation
Model evaluation aims to estimate the generalization accuracy of a model on future or unseen data. In this study, the similarity between the CT and the sCT was evaluated [2].
6.1 Metrics
The similarity between the CT and sCT was assessed using 5 metrics:
• Mean Error (ME)
• Mean Absolute Error (MAE)
• Dice Similarity Coefficient (DSC)
• Surface Distance
• Peak signal-to-noise ratio (PSNR)
6.1.1 Mean Error (ME)
The ME [5] is a voxel-wise difference metric commonly reported in sCT generation research. It reflects the fidelity of the sCT radiodensity reconstruction, which is particularly important for radiotherapy planning and PET/MR reconstruction. It was computed over the entire body contour (MEbody).
6.1.2 Mean Absolute Error (MAE)
The MAE is also a voxel-wise difference metric commonly reported in sCT generation research that reflects the fidelity of the radiodensity reconstruction. Unlike the ME, in which positive and negative errors can cancel, it measures the average magnitude of the error. It was computed over the entire body contour (MAEbody) and exclusively over the bone (MAEbone) using the aforementioned masks.
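A minimal NumPy sketch of the two voxel-wise metrics over a mask. The sign convention of the ME (sCT minus CT) is an assumption, as the report does not state it; the toy arrays are illustrative.

```python
import numpy as np


def mean_error(ct, sct, mask):
    """ME in HU over the mask; the sign convention (sCT - CT) is an assumption."""
    return float(np.mean(sct[mask] - ct[mask]))


def mean_absolute_error(ct, sct, mask):
    """MAE in HU over the mask."""
    return float(np.mean(np.abs(sct[mask] - ct[mask])))


ct = np.array([0.0, 100.0, 1000.0, -50.0])
sct = np.array([10.0, 90.0, 1100.0, -50.0])
body = np.array([True, True, True, True])
bone = ct > 200
print(mean_error(ct, sct, body))           # 25.0 (the +10 and -10 errors cancel)
print(mean_absolute_error(ct, sct, body))  # 30.0
print(mean_absolute_error(ct, sct, bone))  # 100.0
```

The example shows why both are reported: the ME exposes a systematic bias, while the MAE measures the typical size of the error.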
6.1.3 Dice Similarity Coefficient (DSC)
To estimate the degree of misclassification and the geometric integrity of the bone, the DSC was computed between the bone segmentations of the CT and the sCT.
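The DSC between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch, assuming the bone masks are obtained by thresholding at 200 HU as elsewhere in this report; returning 1.0 for two empty masks is a convention chosen here, not taken from the study.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two boolean masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


ct = np.array([-20.0, 250.0, 900.0, 150.0])
sct = np.array([-10.0, 180.0, 850.0, 260.0])
print(dice(ct > 200, sct > 200))  # 0.5
```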

6.1.4 Surface Distance
Surface distance maps were computed to evaluate local surface dissimilarities by measuring the bilateral distance between 2 surfaces. The bone segmentations used to compute the surface distance maps were obtained by thresholding the CT and sCT at 200 HU and automatically excluding nonosseous, unconnected components.
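A brute-force NumPy sketch of bilateral surface distances between two 3D bone masks: surface voxels are mask voxels with at least one 6-connected neighbour outside the mask, and every surface voxel of each mask gets its distance (in mm, given the voxel spacing) to the nearest surface voxel of the other mask. The exclusion of unconnected components is omitted, and practical implementations use distance transforms instead of pairwise distances; function names are hypothetical.

```python
import numpy as np


def surface_voxels(mask):
    """Boundary voxels of a 3D boolean mask (6-connectivity)."""
    padded = np.pad(mask, 1)  # pad with False so borders count as outside
    interior = padded[1:-1, 1:-1, 1:-1].copy()
    for axis in range(3):
        for shift in (-1, 1):
            # A voxel is interior only if all 6 face neighbours are inside the mask.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return mask & ~interior


def surface_distances(mask_a, mask_b, spacing=(1.0, 1.0, 1.0)):
    """Distances from each surface voxel of A to the nearest surface voxel of B, and vice versa."""
    sp = np.asarray(spacing, dtype=float)
    pa = np.argwhere(surface_voxels(mask_a)) * sp
    pb = np.argwhere(surface_voxels(mask_b)) * sp
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1), d.min(axis=0)


def mean_surface_distance(mask_a, mask_b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric (bilateral) mean surface distance in mm."""
    a_to_b, b_to_a = surface_distances(mask_a, mask_b, spacing)
    return float(np.concatenate([a_to_b, b_to_a]).mean())


ct_bone = np.zeros((12, 12, 12), dtype=bool)
ct_bone[3:8, 3:8, 3:8] = True
sct_bone = np.roll(ct_bone, 1, axis=2)  # the same cube, shifted by one voxel
print(mean_surface_distance(ct_bone, ct_bone))       # 0.0
print(mean_surface_distance(ct_bone, sct_bone) > 0)  # True
```

Mapping the per-voxel distances back onto the surface gives a distance map that shows where the synthetic bone deviates, which is the local information that a single DSC value cannot provide.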
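6.1.5 Peak Signal-to-Noise Ratio (PSNR)
The PSNR approximates human perception of reconstruction quality and was defined as

$$ \mathrm{PSNR} = 10 \log_{10}\left( \frac{4095^2}{\frac{1}{N}\sum_{i=1}^{N} \left(I_{CT}(i) - I_{sCT}(i)\right)^2} \right) \qquad (2) $$

where N is the number of voxels, and I_CT(i) and I_sCT(i) are the intensities of voxel i in the CT and the sCT.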
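A minimal NumPy sketch of Eq. (2) with the peak value 4095. Which voxels make up the N voxels (for example, only those inside the body mask) is not stated in this report, so the optional `mask` argument is an assumption.

```python
import numpy as np


def psnr(ct, sct, peak=4095.0, mask=None):
    """PSNR in dB as in Eq. (2): 10 * log10(peak**2 / MSE)."""
    ct, sct = np.asarray(ct, dtype=float), np.asarray(sct, dtype=float)
    if mask is not None:
        ct, sct = ct[mask], sct[mask]
    mse = np.mean((ct - sct) ** 2)  # mean squared error over the N voxels
    return float(10.0 * np.log10(peak ** 2 / mse))


ct = np.zeros(1000)
sct = ct + 100.0        # a uniform 100 HU error
print(psnr(ct, sct))    # 20*log10(4095/100), about 32.2 dB
```

Because the error enters squared, a few large errors (typically in bone) lower the PSNR more than many small soft-tissue errors.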
6.2 Repeatability
The repeatability per input configuration was assessed by calculating, for each subject s, the standard deviation of MAEbody(r, s) across the replicates r, and averaging it across subjects.
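A minimal NumPy sketch of this repeatability measure for one input configuration. The use of the sample standard deviation (ddof=1) is an assumption, and the MAE values below are hypothetical numbers, not results from the study.

```python
import numpy as np


def repeatability(mae_body, ddof=1):
    """Repeatability of one input configuration.

    mae_body: array of shape (R, S) holding MAE_body(r, s) in HU for replicate r
              and subject s. Returns the across-replicate standard deviation,
              averaged over subjects (in HU).
    """
    per_subject_sd = np.std(mae_body, axis=0, ddof=ddof)  # spread over replicates
    return float(per_subject_sd.mean())


# Hypothetical values: 5 replicates (as for the human data) x 3 subjects.
mae = np.array([
    [33.0, 30.0, 41.0],
    [34.0, 30.5, 40.0],
    [33.5, 29.5, 41.5],
    [34.5, 30.0, 40.5],
    [33.0, 30.0, 41.0],
])
print(round(repeatability(mae), 2))
```

Averaging the spread per subject, rather than pooling all values, keeps genuine differences between subjects from being mistaken for run-to-run variability.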
6.3 Statistical analysis

The significance of the observed differences between sCTs was tested using a repeated-measures analysis of variance. Statistical tests were performed on the combined results of both data sets using, for each subject, the MAEbody averaged across replicates. For all statistical tests, P < .05 was considered statistically significant.

7 Results
In the human data set, 24 of 27 subjects met the inclusion criteria of the study. All dogs were eligible except for 1 (17 of 18), whose MR acquisition protocol did not follow the study design. The training set in each cross-validation fold contained 16 subjects for the human models and 11 for the canine models. The remainder of this section presents the results of all 3 folds of the cross-validation for both data sets.
Results were inferred in two different ways:
• Per subject results
• Per Model results
7.1 Per subject results

Figure 5 presents the average MAEbody across replicates obtained for each model. From this figure, it can be observed that:
• Within single-channel models, aIP-based models always outperformed aOP-based models.
• All multichannel models except WF were equivalent to or outperformed single-channel models.
• Differences were small among multichannel models.
These behaviors were observed in all subjects from both data sets.

Fig. 5: Variation of mean absolute error (MAE) per subject averaged across
the repeated experiments for each input configuration
[4]
7.2 Per Model results

Jointly performed on the canine and human data sets, the repeated-measures analysis of variance demonstrated a significant difference between models, which was categorized into 3 levels using post hoc t-tests.
• First, within single-channel models, MAE_body^RS(aIP), the MAEbody averaged across replicates and subjects for the aIP input configuration, was lower than MAE_body^RS(aOP) by 5.5 HU, which makes the aIP input configuration preferable to aOP.
• Second, aIP-based models, which have the lowest MAE_body^RS of the single-channel models, were always outperformed by multichannel models, with differences of up to 15 percent for the canine population and almost 6 percent for the human population.
• Third, within multichannel input configurations, IPOP, WF, and Dual models obtained similar results but were outperformed by Dixon models by up to 1.3 HU.
• Apart from aOP, all input configurations achieved a repeatability of at most 1 HU for the humans and under 2 HU for the canines, as given in Table 3. This high repeatability corroborates the statistical significance of the aforementioned differences.
Population Input MAEbody (HU) MEbody (HU) MAEbone (HU)
Human aIP 34.1 ± 7.9 1.2 ± 4.6 123 ± 54
Human aOP 40.4 ± 6.7 0.0 ± 9.2 158 ± 42
Human Dual 32.7 ± 8.3 2.0 ± 4.3 117 ± 56
Human IPOP 32.7 ± 8.1 1.4 ± 4.6 116 ± 56
Human WF 33.6 ± 9.2 1.2 ± 5.1 120 ± 58
Human Dixon 32.1 ± 8.3 1.5 ± 4.9 114 ± 56
Canine aIP 42.2 ± 8.1 0.9 ± 14.1 144 ± 42
Canine aOP 47.2 ± 9.1 3.8 ± 16.3 169 ± 37
Canine Dual 37.1 ± 4.3 0.4 ± 7.3 132 ± 36
Canine IPOP 37.2 ± 4.1 0.1 ± 6.8 134 ± 36
Canine WF 36.9 ± 4.1 0.6 ± 6.6 141 ± 34
Canine Dixon 35.8 ± 4.2 1.4 ± 6.8 131 ± 34
Table 1: Performance obtained for each model per data set: MAEbody (HU), MEbody (HU), MAEbone (HU)
[4]
Population Input DSCbone Surface Distance (mm) PSNR (dB)
Human aIP 0.81 ± 0.11 0.45 ± 0.10 36.1 ± 2.3
Human aOP 0.78 ± 0.10 0.49 ± 0.10 34.7 ± 1.7
Human Dual 0.83 ± 0.11 0.42 ± 0.05 36.4 ± 2.4
Human IPOP 0.83 ± 0.11 0.42 ± 0.05 36.4 ± 2.4
Human WF 0.82 ± 0.11 0.42 ± 0.16 35.9 ± 2.5
Human Dixon 0.83 ± 0.11 0.40 ± 0.03 36.5 ± 2.5
Canine aIP 0.91 ± 0.03 0.38 ± 0.14 35.1 ± 1.6
Canine aOP 0.89 ± 0.04 0.57 ± 0.28 34.1 ± 1.5
Canine Dual 0.92 ± 0.03 0.38 ± 0.14 35.9 ± 1.5
Canine IPOP 0.92 ± 0.03 0.37 ± 0.14 35.8 ± 1.5
Canine WF 0.92 ± 0.03 0.37 ± 0.14 35.4 ± 1.7
Canine Dixon 0.92 ± 0.04 0.38 ± 0.14 36.1 ± 1.7
Table 2: Performance obtained for each model per data set: DSCbone, Surface Distance (mm), PSNR (dB)
[4]

Population Input Repeatability (HU)
Human aIP 0.6 ± 0.3
Human aOP 1.6 ± 1.3
Human Dual 0.7 ± 0.3
Human IPOP 1.0 ± 0.6
Human WF 0.9 ± 0.9
Human Dixon 0.7 ± 0.3
Canine aIP 1.9 ± 2.0
Canine aOP 2.1 ± 1.5
Canine Dual 1.1 ± 0.8
Canine IPOP 1.2 ± 1.3
Canine WF 0.8 ± 0.5
Canine Dixon 0.7 ± 0.5
Table 3: Repeatability obtained for each model per data set, averaged across replicates and subjects
[4]
7.3 Bone reconstruction

Regardless of the data set and input configuration, MAEbone was almost 4 times higher than MAEbody. Figure 6 shows the sCTs obtained by each model for 3 subjects. Although the differences between sCTs were reflected by global metrics, they were primarily local and especially visible in osseous structures such as the baculum or the vertebrae.
Fig. 6: Computed tomography and synthesized CTs (sCTs) generated per input configuration for 3 subjects.
[4]

8 Discussions
This study showed that the choice of MR images within an input influenced sCT generation models in 3 ways.
• First, the aIP input configuration outperformed the aOP configuration. This is most likely related to the lower degree of correlation between MR and CT intensities in OP images. OP images present destructive interference between water and fat protons, resulting in signal voids in regions at the interface between fat and muscle. This interference is not present in IP images, making IP images preferable for sCT generation. Another factor that may have contributed to the performance difference between aIP and aOP images is their different amount of T2 weighting.
• Second, presenting multiple channels as inputs to the model improved sCT generation compared with single-channel models. In particular, the WF configuration outperformed the in-phase channel despite the close linear relation between them (in Dixon imaging, the in-phase signal is the sum of the water and fat signals). This result suggests that an explicit water–fat decomposition is beneficial to the model.
• Third, within multichannel models, the Dixon input configuration outperformed the others despite containing interdependent MR images. Presumably, explicitly providing this additional information simplified the learning task, improving processes such as soft-tissue discrimination.

9 Applications and Future Scope
9.1 Applications
This study of the influence of different MR input configurations on MR-to-CT translation provides information that is relevant to several applications:
• In the medical or clinical field
CT imaging exposes patients to potentially harmful ionizing radiation, contrast dyes can cause side effects, and acquiring both MR and CT adds cost and time. Generating synthetic CTs with the method described above avoids these drawbacks, and this is the main application and objective of the study. The method could also be extended to other modalities, which makes it relevant to medical imaging and radiology more broadly.
• Further research
New architectures and methodologies are constantly being developed for medical image translation, and the findings of this study could be used to improve the models built with them.
9.2 Future Scope

The systematic and statistically significant improvement demonstrated in this study motivates research into the optimal MR contrast, or combination of MR contrasts, for a given task.

10 Conclusion
This study investigated the influence of gradient echo–based contrasts as inputs to deep learning–based sCT generation models in canine and human populations. Two parameters were found to influence the performance and repeatability of sCT generation.
• First, the TE-related water–fat interference of single MR images affected the performance of a model. Overall, in-phase images outperformed opposed-phase images because of their higher bone specificity.
• Second, combining multiple related MR images as input channels improved the performance and robustness of a model. In particular, the Dixon input configuration showed the best results in terms of performance and repeatability, even though it contained interdependent MR images.

MRI and CT are two highly relevant modalities in medical imaging. MR-to-CT translation is receiving increasing interest from researchers as emerging technologies advance and find wider application in medical imaging. The improvement demonstrated in this study motivates further research into the optimal MR contrast, or combination of MR contrasts, for related future tasks.

References
[1] Bisht SS, et al. Image Registration Concept and Techniques: A Review. Journal of Engineering Research and Applications, 2014.
[2] Delso G, Wiesinger F, Sacolick LI. Clinical evaluation of zero-echo-time MR imaging for the segmentation of the skull. J Nucl Med, 2015.
[3] Dirix P, Haustermans K, Vandecaveye V. The value of magnetic resonance imaging for radiotherapy planning. Semin Radiat Oncol, 2014.
[4] Florkow MC, Zijlstra F, Willemsen K, Maspero M. Deep learning-based MR-to-CT synthesis: The influence of varying gradient echo-based MR images as input channels. Med Phys, Wiley Periodicals, 2019.
[5] Mathieu M, Couprie C, LeCun Y. Deep multi-scale video prediction beyond mean square error. arXiv, 2017.
[6] Nie D, Cao X, Gao Y. Estimating CT Image from MRI data using 3D fully convolutional networks. Springer International Publishing, 2016.
[7] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv, 2015.
[8] Staring M, Klein S, Pluim J. A rigidity penalty term for nonrigid registration. Med Phys, 2007.
[9] Klein S, Staring M. elastix: A Toolbox for Intensity-Based Medical Image Registration. IEEE Trans Med Imaging, 2010.
[10] Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: the missing ingredient for fast stylization. arXiv, 2016.
[11] Dixon WT. Simple proton spectroscopic imaging. Radiology, 1984.
[12] Han X. MR-based synthetic CT generation using a deep convolutional neural network method. Med Phys, 2017.

APPENDIX
Deep Learning: Deep learning is a subset of machine learning where artifi-
cial neural networks, algorithms inspired by the human brain, learn from large
amounts of data.
CNN: A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and differentiate one from the other.
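The following NumPy function is a minimal sketch of what "learnable weights and biases" mean for a multichannel input such as stacked MR contrasts. It implements one convolutional layer as a "valid" 2D cross-correlation, which is what deep learning frameworks compute under the name convolution. Each kernel spans all input channels, so the first layer already mixes information from every contrast. The function name and the channels-first array layout are illustrative assumptions. Real networks use optimized framework layers, padding and nonlinear activations.

```python
import numpy as np

def conv2d_multichannel(x, kernels, bias):
    """One convolutional layer as a 'valid' 2D cross-correlation.

    x:       (C_in, H, W) input, e.g. stacked MR contrasts (channels first)
    kernels: (C_out, C_in, k, k) learnable weights
    bias:    (C_out,) learnable biases
    returns: (C_out, H - k + 1, W - k + 1) feature maps
    """
    c_out, c_in, k, _ = kernels.shape
    if x.shape[0] != c_in:
        raise ValueError("kernel depth must equal the number of input channels")
    h_out = x.shape[1] - k + 1
    w_out = x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                # Each patch spans ALL input channels, so contrasts are mixed here.
                patch = x[:, i:i + k, j:j + k]
                out[o, i, j] = np.sum(patch * kernels[o]) + bias[o]
    return out

# Two hypothetical 4x4 input channels and three 3x3 kernels
x = np.ones((2, 4, 4))
kernels = np.ones((3, 2, 3, 3))
bias = np.zeros(3)
features = conv2d_multichannel(x, kernels, bias)  # shape (3, 2, 2), every value 18.0
```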
Elastix: elastix is an open-source software toolbox for intensity-based medical image registration [9].
Hounsfield units (HU): Hounsfield units are a dimensionless unit universally used in computed tomography (CT) scanning to express CT numbers in a standardized and convenient form. Hounsfield units are obtained from a linear transformation of the measured attenuation coefficients.
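The linear transformation mentioned above is conventionally HU = 1000 × (μ − μ_water) / (μ_water − μ_air), so that water maps to 0 HU and air to −1000 HU. A minimal sketch follows. The function name is hypothetical, and the attenuation values in the example are arbitrary placeholders rather than measured coefficients.

```python
import numpy as np

def to_hounsfield(mu, mu_water, mu_air):
    """Linear mapping from linear attenuation coefficients to Hounsfield units.

    Water maps to 0 HU and air to -1000 HU by construction.
    """
    mu = np.asarray(mu, dtype=float)
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)

# Placeholder coefficients (arbitrary units), chosen only for illustration
MU_WATER, MU_AIR = 0.2, 0.0
hu = to_hounsfield([0.0, 0.2, 0.4], MU_WATER, MU_AIR)  # -> [-1000., 0., 1000.]
```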
GANs: A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks (a Generator and a Discriminator) compete with each other to become more accurate in their predictions.
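The competition described above is commonly written as two binary cross-entropy objectives. The sketch below computes them from discriminator outputs, which are probabilities that an image is a real CT. It uses the widely used non-saturating form of the generator loss. This is a generic formulation for illustration only, not the training objective of the study discussed in this report, and the function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: D's probabilities for real CT images (target label 1)
    d_fake: D's probabilities for generated (synthetic) CT images (target label 0)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """Non-saturating generator loss: small when D believes synthetic images are real."""
    d_fake = np.asarray(d_fake, dtype=float)
    return -np.mean(np.log(d_fake + EPS))

# A confident, correct discriminator versus one that cannot tell real from synthetic
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])  # equals 2 * ln 2
```

During training the two losses are minimized alternately. The discriminator improves at separating real from synthetic images, and the generator improves at producing images the discriminator accepts as real.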
Coregistration: Coregistration refers to the spatial alignment of a series of images, from either intra-subject or inter-subject image volumes, and is used in several preprocessing steps.
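The sketch below illustrates the idea behind intensity-based registration tools such as elastix. It searches integer translations for the one that minimizes the sum of squared differences (SSD) between a fixed and a moving image. It is a deliberately simplified, hypothetical example with several limitations:
• np.roll wraps pixels around the border.
• Only translations are considered.
• SSD is meaningful only for images of the same modality.
Aligning MR to CT would require a multimodal similarity measure such as mutual information, plus the interpolation and continuous optimization that real toolboxes provide.

```python
import numpy as np

def register_translation(fixed, moving, max_shift=5):
    """Exhaustive integer-translation search minimising the SSD.

    Returns the (dy, dx) shift that, applied to `moving` with np.roll,
    best matches `fixed`, together with the resulting SSD cost.
    """
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            cost = np.sum((fixed - candidate) ** 2)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift, best_cost

# Simulated misalignment of a random test image
rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
moving = np.roll(fixed, shift=(3, -2), axis=(0, 1))
shift, cost = register_translation(fixed, moving)  # shift == (-3, 2), cost == 0.0
```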

CO–PO and CO–PSO Mapping
Course Outcome
CS451.1: Analyse a current topic of professional interest and present it before an audience. Bloom's Taxonomy levels: Knowledge (Level 1), Analyse (Level 4).
CS451.2: Identify an engineering problem, analyse it and propose a work plan to solve it. Bloom's Taxonomy levels: Evaluate (Level 5), Understand (Level 2).
CO–PO Mapping
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CS451.1 - 3 - 2 - - 1 - 3 3 2 3
CS451.2 3 3 3 2 1 - - 3 3 3 - 1
CO–PSO Mapping
PSO1 PSO2 PSO3
CS451.1 3 - 1
CS451.2 3 3 2
Justifications for CO–PO/PSO Mapping
Mapping Low/Medium/High Justification
CS451.1–PO2 H Problem analysis: I was able to identify, formulate, review research literature, and analyze problems with the traditional learning environment, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
CS451.1–PO4 M Conduct investigations of complex problems: I used research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of the information, to provide valid conclusions.
CS451.1–PO7 L Environment and sustainability: I understood the impact of professional engineering solutions in societal and environmental contexts, and demonstrated knowledge of, and the need for, sustainable development.
CS451.1–PO9 H Individual: I was able to function effectively as an individual in multidisciplinary settings.
CS451.1–PO10 H Communication: I was able to communicate effectively on complex Engineering activities with the Engineering Community and with society at large. This included comprehending and writing effective reports and design documentation, making effective presentations, and giving and receiving clear instructions.
CS451.1–PO11 M Project Management and finance: Demonstrated knowledge and understanding of Engineering and management principles and applied these to one's own work, to manage projects and in multidisciplinary environments.
CS451.1–PO12 H Life-long learning: Recognized the need for, and had the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
CS451.1–PSO1 H Computer Science Specific Skills: Was able to identify, analyze and design solutions for complex engineering problems in multidisciplinary areas by understanding the core principles and concepts of computer science.
CS451.1–PSO3 L Professional Skills: Was able to apply the fundamentals of computer science to formulate competitive research proposals and to develop innovative products that meet societal needs, thereby evolving as an eminent researcher and entrepreneur.
CS451.2–PO1 H Engineering Knowledge: Applied the knowledge of Mathematics, Science, Engineering fundamentals, and an Engineering discipline to the solution of complex engineering problems.
CS451.2–PO2 H Problem analysis: I was able to identify, formulate, review research literature, and analyze complex Engineering problems, reaching substantiated conclusions using first principles of mathematics, natural sciences, and Engineering sciences.
CS451.2–PO3 H Design/Development of solutions: Designed solutions for complex Engineering problems. Also designed system components or processes that meet the specified needs with appropriate consideration for public health and safety and for cultural, societal, and environmental factors.
CS451.2–PO4 M Conduct investigations of complex problems: Used research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of the information, to provide valid conclusions.
CS451.2–PO5 L Modern Tool usage: Created, selected, and applied appropriate techniques, resources, and modern engineering and IT tools, including prediction and modeling, to complex Engineering activities with an understanding of their limitations.
CS451.2–PO8 H Ethics: Applied ethical principles and committed to the professional ethics, responsibilities and norms of Engineering practice.
CS451.2–PO9 H Individual: I was able to function effectively as an individual and in multidisciplinary settings.
CS451.2–PO10 H Communication: Communicated effectively on complex Engineering activities with the Engineering Community and with society at large. This included comprehending and writing effective reports and design documentation, making effective presentations, and giving and receiving clear instructions.
CS451.2–PO12 L Life-long learning: Recognized the need for, and had the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
CS451.2–PSO1 H Computer Science Specific Skills: I was able to identify, analyze and design solutions for complex engineering problems in multidisciplinary areas by understanding the core principles and concepts of computer science.
CS451.2–PSO2 H Programming and Software Development Skills: Acquired programming efficiency by designing algorithms and applying standard practices in software project development to deliver quality software products.
CS451.2–PSO3 M Professional Skills: Applied the fundamentals of computer science to formulate competitive research proposals and to develop innovative products that meet societal needs, thereby evolving as an eminent researcher and entrepreneur.
