
A novel tridimensional multimodal brain segmentation approach

GBM6700E - 3D Reconstruction from Medical Images

Zerhouni, Maude

2409344
Sunday, December 15, 2024

Contents

Contents 2

List of Figures 2

1 Introduction 3
1.1 Medical context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Research context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Objectives and methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Basics of deep learning principles and implementation 5


2.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 U-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3 V-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.4 Transformers and Attention mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Tutorial Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 The 3DUV-NetR+ architecture 9


3.1 Implementation specifics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.2 Loss function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.3 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Obtained results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.1 Ablation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.2 Comparison with other SOTA methods . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Limitations of the article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Re-implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4 Literature review 14

5 Conclusion 14

References 15

List of Figures
1 Deep learning architectures for medical image segmentation. . . . . . . . . . . . . . . . . . . . 4
2 Convolutional Neural Network Schematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 U-NET architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 V-Net architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5 Illustration of transformers and attention mechanisms’ effect . . . . . . . . . . . . . . . . . . 8
6 Proposed architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Ablation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
8 Comparison with SOTA methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1 Introduction
1.1 Medical context
Brain tumors are a complex and diverse group of highly lethal diseases. In 2024, more than 3,500 patients
in Canada were diagnosed with brain cancer, and alarmingly, 79% of them passed away within the same
year. This high mortality rate is often attributed to late diagnoses and improper treatment plans. Indeed, the large variety of brain tumors, encompassing more than 120 types, makes distinguishing between tumor types particularly challenging, complicating diagnosis and treatment decisions.

The diagnostic process typically begins with a preliminary clinical examination. In some cases, this is
followed by a biopsy, a procedure where a small sample of brain tissue is extracted and analyzed to confirm
the presence and type of a tumor. However, imaging techniques play a crucial role in brain tumor diagnosis.
These techniques include several modalities such as Magnetic Resonance Imaging (MRI), Magnetic Resonance
Angiography (MRA), Magnetic Resonance Spectroscopy (MRS), and Computed Tomography (CT). Among
these, MRI is particularly versatile, offering dozens of specialized modalities tailored to highlight different
features of brain anatomy and pathology. For example, diffusion MRI focuses on water molecule movement,
while T1- and T2-weighted MRI provide structural details.

Despite their potential, these imaging techniques require manual interpretation by experts, which is
time-consuming and demands high levels of expertise, particularly in handling and interpreting the various
modalities. This reliance on manual analysis not only delays diagnosis but also risks missing crucial details.
Automating this process could significantly improve diagnostic efficiency and accuracy. Additionally, relying
on a single imaging modality limits the understanding of the tumor, as each modality offers unique insights
into different aspects of brain structure and function. For instance, MRA emphasizes vascularization, while
MRI provides superior spatial resolution. Combining these two modalities has the potential to offer a more
comprehensive and functional understanding of the tumor, enabling more precise and personalized treatment, hence improving patient outcomes. For this reason, multimodality has gained interest in the field
of medical imaging, particularly for complex conditions such as brain tumors.

1.2 Research context


Medical Image Segmentation (MIS) is a computer vision task that involves dividing a medical image into
multiple segments, where each segment represents a different object or structure of interest in the image. The
goal is to provide a precise and accurate representation of the objects of interest within the image, typically
for the purpose of diagnosis, treatment planning, and quantitative analysis. Machine learning, particularly
deep learning, excels in recognizing patterns in large datasets and can adapt to the intricate and diverse
nature of medical images. Convolutional Neural Networks (CNNs) have emerged as the cornerstone for such
tasks, enabling robust and automated segmentation with high accuracy.

CNNs are specialized machine learning models designed for visual data processing, primarily used for
classification tasks, where the model assigns a single label to an entire image. Although CNNs had been known for decades, their potential was fully demonstrated in 2012 when Krizhevsky et al. trained one of the
largest CNNs of the time on ImageNet, a dataset with over a million labeled images. This network achieved
groundbreaking results in image classification, ushering in a new era for deep learning. Their work leveraged
improved GPU capabilities and techniques to address challenges like overfitting, making CNNs practical
for large-scale tasks. However, in medical imaging, localization is critical—each pixel in an image must
be assigned a label to identify structures like tumors, vessels, or lesions. This requirement, referred to as
semantic segmentation, presented unique challenges, such as the computational cost and the much smaller volume of available biomedical datasets compared to ImageNet.

To address this, the U-Net architecture, introduced in 2015 by Ronneberger et al., became a milestone in
medical image segmentation. U-Net builds upon the concept of fully convolutional networks by incorporating
a symmetric structure: a contracting path (to capture context) and an expanding path (to achieve precise
localization). High-resolution features from the contracting path are combined with upsampled outputs
to produce detailed segmentation maps, even with limited training data. In 2016, the V-Net architecture
introduced by Milletari et al. extended this idea to 3D medical imaging, enabling volumetric segmentation.
This was particularly important for tasks such as analyzing CT or MRI scans in three dimensions, where
spatial relationships across slices are crucial. U-Net and V-Net set the foundation for modern segmentation
tasks, inspiring countless adaptations and improvements. Today, these architectures remain at the core of
biomedical image processing, enabling efficient and accurate segmentation with limited data. By automating
segmentation, they reduce the time and expertise required for manual analysis, paving the way for more
personalized and precise medical treatments. A non-exhaustive list of architectures designed for biomedical
applications is presented in Figure 1.

Figure 1: Deep learning architectures for medical image segmentation.

Deep learning architectures in segmentation incorporate multimodality by combining data from various
sources to enhance analysis. Separate encoders extract specific features from each modality, which are then
merged at different levels within the architecture, either early in the process (early fusion) or after individual
extraction (late fusion). This approach captures complementary details, improves precision, and enhances
the robustness of segmentation by leveraging the strengths of each modality. In the medical field, this method
allows for more comprehensive and reliable analyses.

1.3 Objectives and methodology


Overall, brain tumors are a leading cause of cancer-related deaths, highlighting the need for accurate seg-
mentation methods to aid diagnosis and treatment planning. In this context, combining imaging techniques
such as T1-weighted MRI, T2-weighted MRI, and FLAIR MRI holds great promise for delivering rich and
integrated information. As a result, 3D multi-modal image reconstruction is emerging as a technique that
offers a 3D visualization of the tumor, allowing us to enhance the diagnostic quality and facilitate more
precise assessments of tumor type and progression. However, automatic segmentation of brain tumors from
volumetric medical images poses significant challenges due to variations in tumor shapes, sizes, and intensi-
ties, as well as the non-standard characteristics of multimodal MRI images. Although convolutional neural
networks (CNNs) have shown promise in addressing these challenges, there remains a critical need for effective
integration of multimodal data to enhance segmentation accuracy.

The goal of this project is to evaluate 3DUV-NetR+, a deep learning framework for multimodal brain image analysis proposed by Aboussaleh et al. (February 2024) combining the U-Net, V-Net, and Transformer architectures. To achieve this, the project was structured around three secondary objectives:

1. Developing a solid understanding of deep learning concepts.


2. Analyzing the methodology and results of the article.
3. Contextualizing the article within the broader landscape of 3D multimodal brain segmentation archi-
tectures.

To build foundational knowledge in deep learning, open-access courses on Coursera were utilized, notably
IBM’s Introduction to Deep Learning and Neural Networks with Keras and DeepLearning.AI’s AI for Med-
ical Diagnosis. These courses facilitated both the understanding of underlying concepts and the high-level implementation of code. The latter was more effectively achieved using MONAI tutorials.

With these concepts mastered, the article was meticulously analyzed to assess the promise of its approach,
independently of its results. The implementation was then compared to MONAI tutorials, allowing for a
preliminary evaluation of the method’s relevance and depth.

Finally, a literature review was conducted to position the article within the context of recent advancements
in multimodal brain segmentation. This included comparing methods, results, and complexity across studies
from the past five years.

2 Basics of deep learning principles and implementation


2.1 Concepts
2.1.1 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) refer to a long-established deep learning architecture that works
by transforming an input image into a feature map through a series of convolutional, pooling, and activation
layers. These layers extract and refine features progressively to produce a predicted output.

At a high level, a typical CNN architecture comprises three components, as described in Figure 2: the
input layer, the hidden layers, and the output layer. The input layer receives the input image and passes it
to the hidden layers that extract relevant features or patterns. The output layer provides the predicted class
label probability scores for each potential class.

The hidden layers are critical for the CNN’s performance, and the number of hidden layers and the
number of filters in each layer can be adjusted to optimize it. In Figure 2, we can see four convolutional
layers. At each layer, one or several filters are applied, each one producing a feature map. These filters aim to recognize a certain pattern such as an edge or a corner. Mathematically, this operation corresponds to a convolution, hence the name. Then, pooling is applied to reduce the spatial dimensions of the feature maps produced by the convolutional layer, thereby reducing computational complexity. An example is provided in Figure 2. Finally, a non-linear activation function is applied in order to introduce
non-linearity into the model, allowing the network to model complex patterns. The subsequent feature maps
are passed through successive layers for more refined feature extraction.

Figure 2: Convolutional Neural Network Schematics

Finally, the output from the hidden layers is flattened and passed through traditional fully connected
layers. These layers combine extracted features to make predictions. For instance, in Figure 2, the fully
connected layer outputs probabilities, such as a 0.7 probability that the image represents a zebra. This layer
connects every neuron in one layer to every neuron in the next, synthesizing the learned features into the
final classification result.

Each filter is typically a 3x3 or 5x5 matrix with initially randomly distributed coefficients called weights.
In order to adjust these weights, the model needs to be trained. During training, these weights are adjusted
to minimize a loss function through its gradient. The loss function quantifies the difference between the
predicted and actual outputs, guiding the optimization process. The gradient of the loss function with
respect to each weight is computed, which indicates the direction and magnitude of the change needed for
each weight. If the gradient is negative, the weight is decreased, and conversely, if the gradient is positive,
the weight is increased. This adjustment is done iteratively to reduce the loss and improve the model’s
performance. To achieve optimization, the algorithm uses techniques such as gradient descent, where the
aim is to find the minimum of the loss function. The weights are updated in the direction that reduces the
loss, and this process is repeated until the model converges to an optimal set of weights.
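
As an illustration of these ideas, the following minimal sketch builds and compiles a small CNN classifier with TensorFlow/Keras, the library used later in this project; the input size, layer widths, and number of classes are arbitrary assumptions chosen for demonstration, not values taken from any specific model in this report.

import tensorflow as tf

# Minimal CNN sketch: convolution + pooling blocks followed by fully
# connected layers. Input shape and class count are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),   # filters detect local patterns
    tf.keras.layers.MaxPooling2D(pool_size=2),                      # pooling reduces spatial size
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),                                       # flatten before dense layers
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),                 # class probability scores
])

# The loss quantifies the gap between predictions and labels; its gradient
# drives the iterative weight updates described above.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)  # training loop (data not shown)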

2.1.2 U-Net
U-Net is a convolutional neural network (CNN) architecture developed specifically for biomedical 2D
image segmentation, as introduced in 1.2. Since its introduction, U-Net has become one of the most popular and widely used architectures in the field due to its ability to learn from relatively few training samples and produce precise segmentations, addressing both the scarcity of medical data and the limitation of single-label classification, whose output is often difficult to interpret. The U-Net architecture is characterized by its distinctive
U-shaped structure as shown in Figure 3, composed of a contracting path known as the encoder and an
expansive path called the decoder, connected by skip connections.

The encoder follows the typical architecture of a convolutional network. It consists of the repeated
application of two 3x3 convolutions followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation
for downsampling, as described in the previous section. The decoder aims at recreating an image with labelled pixels. Every
step of it consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that
halves the number of feature channels, a concatenation with the correspondingly cropped feature map from
the contracting path, and two 3x3 convolutions, each followed by a ReLU. The decoder aims to reconstruct the
segmented image from the encoded features. It performs upsampling, which is achieved through transposed convolutions, or deconvolutions.

Figure 3: U-NET architecture

Finally, skip connections bridge the corresponding layers of the encoder
and decoder. They help to combine the low-level spatial features with the high-level abstract features, which
enhances the network’s ability to produce accurate segmentation masks. By skipping some layers and directly
connecting the output of one layer to a layer in the decoder, the network can leverage both the contextual
and detailed information for better segmentation.
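
For concreteness, the sketch below shows how one U-Net level (an encoder block, a decoder block, and the skip connection between them) could be written with tf.keras; the input size and channel counts are illustrative assumptions rather than the values of the original U-Net paper.

import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Two 3x3 convolutions with ReLU, then 2x2 max pooling for downsampling.
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    p = layers.MaxPooling2D(2)(f)
    return f, p  # f feeds the skip connection, p goes deeper into the network

def decoder_block(x, skip, filters):
    # Up-convolution, concatenation with the corresponding encoder features
    # (skip connection), then two 3x3 convolutions with ReLU.
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = layers.Input(shape=(128, 128, 1))
s1, p1 = encoder_block(inputs, 32)                       # contracting path (one level shown)
bottleneck = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
d1 = decoder_block(bottleneck, s1, 32)                   # expanding path with skip connection
outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)  # per-pixel prediction
unet_level = tf.keras.Model(inputs, outputs)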

As with other CNNs, the model is trained to optimize the weights and minimize the loss function. By replacing the 2D operations with 3D operations in the U-Net architecture, the 3DU-Net architecture is obtained, allowing high-resolution 3D MIS. However, doing so greatly increases the computational cost. Also, the method is highly dependent on the loss function used, which on the one hand gives flexibility to the model but on the other hand makes it less reliable.

2.1.3 V-Net
V-Net is an extension of the U-Net architecture, specifically designed for volumetric MIS. It adapts the
U-shaped structure to handle 3D data more effectively by incorporating 3D convolutional layers, 3D pooling
operations, and convolutional residual units to capture features across multiple spatial scales. Overall, it
is composed of a compression path and a decompression path connected by residual functions as shown in
Figure 4.

The compression path, or encoder, follows the architecture of a typical convolutional network. It consists
of multiple stages where convolutions are performed with volumetric kernels (5×5×5 voxels) and stride 2
for downsampling. This path progressively reduces the spatial resolution of the input while increasing the
depth of the feature maps, which allows the network to capture high-level features. Each stage uses residual
learning, where the input is added back to the output of the convolutional layers within the same stage.
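
As a hedged sketch of one such compression stage (two volumetric 5×5×5 convolutions with a residual addition, followed by a stride-2 convolution for downsampling), the tf.keras code below illustrates the idea; the channel counts and input size are assumptions made for illustration.

import tensorflow as tf
from tensorflow.keras import layers

def vnet_compression_stage(x, filters):
    # Two 5x5x5 volumetric convolutions whose output is added back to the
    # stage input (residual learning), then a stride-2 convolution that
    # halves the spatial resolution instead of max pooling.
    h = layers.Conv3D(filters, 5, padding="same", activation="relu")(x)
    h = layers.Conv3D(filters, 5, padding="same", activation="relu")(h)
    res = layers.Add()([x, h])                                    # residual connection
    down = layers.Conv3D(filters * 2, 2, strides=2, padding="same",
                         activation="relu")(res)                 # downsampling
    return res, down

volume = layers.Input(shape=(64, 64, 64, 16))
stage_features, downsampled = vnet_compression_stage(volume, 16)
stage = tf.keras.Model(volume, [stage_features, downsampled])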

The decompression path, or decoder, aims to recreate the original resolution of the input image. This is
achieved through upsampling using transposed convolutions. Each stage of the decompression path reduces
the number of feature channels while increasing the spatial resolution.

Figure 4: V-Net architecture

The use of residual functions and skip connections, which transfer detailed features from the encoder stages to the corresponding decoder stages, helps in preserving spatial information and improving segmentation accuracy.

Like in other CNN architectures, the model is trained to optimize its weights and minimize the loss
function, ensuring that the segmentation masks produced are as accurate as possible. Unlike its predecessors,
V-Net does not concatenate the entire encoder counterpart with skip connection but uses residual connections
instead, enhancing its efficiency and gradient management. Despite its smaller resolution and rigidity, V-Net
optimizes the segmentation process, making it more suitable for 3D data and operations compared to the 3D
U-Net.

2.1.4 Transformers and Attention mechanisms


Initially used for language tasks such as translation (and popularized by models like ChatGPT), transformers are deep learning architectures that leverage the attention mechanism, allowing the model to understand the entire context before generating the next part, providing a form of "long-term memory." This idea is represented in Figure 5.

Figure 5: Illustration of transformers and attention mechanisms’ effect

The transformer architecture is an attention-based encoder-decoder model that maps input sequences
into continuous representations with the encoder, and the decoder generates outputs step-by-step, guided
by previously generated outputs. Their utilization in MIS is recent and seems promising in improving it
by capturing contextual information over long distances, enhancing global understanding, and optimizing
gradient flow for precise reconstruction. This is especially useful for complex structures like tumors.

The encoder’s role is to meticulously extract features from the input sequence. This is achieved through a
series of layers, each comprising a multi-head attention mechanism followed by a feed-forward neural network.
These layers are further enhanced with normalization and residual connections to ensure stability during
training. The decoder, on the other hand, is tasked with generating the output sequence. It mirrors the
encoder’s structure but includes an additional layer of cross-attention that allows it to focus on relevant parts
of the input sequence as it produces the output. It is important to note that the Transformer architecture
is not set in stone; it can manifest as Encoder-only, Decoder-only, or the classic Encoder-Decoder model.
Each architectural variation is tailored to specific learning objectives and tasks.
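
To make this structure concrete, the following hedged sketch implements a single pre-norm transformer block (Layer Normalization, Multi-Head Self-Attention, and an MLP, each wrapped in a residual connection) with tf.keras; the embedding dimension, number of heads, and MLP width are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def transformer_block(x, embed_dim=256, num_heads=8, mlp_dim=512, dropout=0.1):
    # Self-attention sub-layer: layer normalization, multi-head attention,
    # and a residual connection back to the block input.
    h = layers.LayerNormalization()(x)                                      # LN
    h = layers.MultiHeadAttention(num_heads=num_heads,
                                  key_dim=embed_dim // num_heads)(h, h)     # MSA
    x = layers.Add()([x, h])                                                # residual

    # Feed-forward sub-layer: layer normalization, MLP, residual connection.
    h = layers.LayerNormalization()(x)                                      # LN
    h = layers.Dense(mlp_dim, activation="gelu")(h)                         # MLP
    h = layers.Dropout(dropout)(h)
    h = layers.Dense(embed_dim)(h)
    return layers.Add()([x, h])                                             # residual

# Example: a sequence of 64 tokens with a 256-dimensional embedding.
tokens = layers.Input(shape=(64, 256))
block_output = transformer_block(tokens)
model = tf.keras.Model(tokens, block_output)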

2.2 Tutorial Implementation


The second part of understanding the basics of deep learning involved coding. The open-access courses
provided by IBM and DeepLearning.AI included practical exercises that helped in grasping the coding of
basic functions, which significantly contributed to the overall comprehension of the concepts. However, the
MONAI tutorials proved more relevant for understanding the output of a training algorithm, as they are
specifically tailored to medical imaging tasks. This project focused on implementing these tutorials, which
provided a hands-on approach to learning how deep learning can be applied to segmentation tasks.

Three tutorial models were followed, with the initial plan being to apply them using the BraTS2020 dataset
in order to compare the results. Unfortunately, this part had to be aborted due to the lack of resources at
the author’s disposal, including the absence of a graphical interface, which hindered both the training and
its validation. Despite this setback, this phase still offered great insights into the various packages that can
be used to process and read medical image files, such as nibabel, and deep learning libraries like MONAI.
MONAI, a specialized library for medical imaging, provided a comprehensive set of tools, including pre-built
networks like U-Net and V-Net, which are essential for segmenting 3D medical images.
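
As a hedged illustration of this tooling, the snippet below loads a NIfTI volume with nibabel and instantiates a pre-built 3D U-Net from MONAI; the file name is a placeholder, and the channel and stride configuration are assumptions chosen for demonstration rather than settings taken from the tutorials.

import nibabel as nib
from monai.networks.nets import UNet

# Load a 3D MRI volume stored in NIfTI format (placeholder file name).
volume = nib.load("BraTS20_Training_001_flair.nii.gz").get_fdata()
print(volume.shape)  # e.g. (240, 240, 155) for BraTS scans

# Pre-built 3D U-Net from MONAI; channels and strides are illustrative.
model = UNet(
    spatial_dims=3,                     # volumetric segmentation
    in_channels=3,                      # e.g. three MRI modalities stacked as channels
    out_channels=4,                     # background plus three tumor sub-regions
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
)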

3 The 3DUV-NetR+ architecture


To gain a more concrete understanding of how a deep learning architecture for 3D modality brain tumor
segmentation operates, a detailed review of a research article was undertaken. This article evaluates the
performance of a model derived from the classical architectures briefly mentioned in 1.2. The selection criteria
for the article included its recent publication date, its relevance, and the model’s depth and applicability in
light of the concepts introduced in 2.1. The chosen article, published by Aboussaleh et al. in 2024, is titled
”3DUV-NetR+: A 3D Hybrid Semantic Architecture Using Transformers for Brain Tumor Segmentation
with Multi-Modal MR Images.”

3.1 Implementation specifics


3.1.1 Architecture
As its title suggests, the study explores the combination of the 3D U-Net, V-Net, and Transformer architectures, hypothesizing that doing so would combine their benefits and mitigate their limitations. The architecture is
provided in Figure 6.

(a) 3DUV-NetR+ Architecture (b) Transformer block

Figure 6: Proposed architecture

By integrating three 3D MRI modalities, this study aims to explore the potential for improving brain tumor diagnosis and paving the way for more accurate and targeted treatments. Indeed,
each modality provides specific information; here, T1-weighted MRI, T2-weighted MRI, and FLAIR (Fluid-Attenuated Inversion Recovery) MRI are used. T1-weighted MRI highlights anatomical structures and is particularly effective at delineating the boundaries between gray and white matter; T2-weighted MRI is highly sensitive to fluid content, making it useful for detecting edema and other fluid accumulations. Finally, FLAIR MRI
suppresses signals from free fluids, such as cerebrospinal fluid, to better highlight lesions adjacent to fluid
spaces.

One can recognize the (3D) U-Net and V-Net architectures in the top left and top right of the Figure,
respectively, where the outputs of the encoders are utilized and combined. The images are initially input into
the encoders of both architectures in parallel, and the results of each output are preserved, indicated by the
pink vertical lines. These features are then upsampled through convolution operations, dropout (dropout is
a regularization technique that prevents overfitting by randomly setting a fraction of input units to zero at
each update during training), and transformers. It is worth noting that in this figure, the decoders of
the U-Net and V-Net architectures are only used for comparison with the classical architectures presented in
2.1.2 and 2.1.3.
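
Since the article leaves the fusion step partly implicit, the sketch below shows one plausible interpretation: the two encoder outputs are concatenated along the channel axis, regularized with dropout, and upsampled with a transposed 3D convolution. All shapes and layer parameters are assumptions made for illustration, not the authors' exact implementation.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical bottleneck feature maps from the two parallel encoders
# (depth, height, width, channels); shapes are illustrative only.
unet_features = layers.Input(shape=(16, 16, 16, 128))
vnet_features = layers.Input(shape=(16, 16, 16, 128))

# One plausible fusion step: concatenate both encoder outputs, apply dropout
# for regularization, then upsample with a transposed 3D convolution.
fused = layers.Concatenate()([unet_features, vnet_features])
fused = layers.Dropout(0.2)(fused)
up = layers.Conv3DTranspose(64, kernel_size=2, strides=2,
                            padding="same", activation="relu")(fused)

fusion_head = tf.keras.Model([unet_features, vnet_features], up)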

The transformers consist only of a decoder, as shown in Figure 6b, where LN, MSA, and MLP stand for Layer Normalization, Multi-Head Self-Attention, and Multi-Layer Perceptron, respectively, as introduced in 2.1.4.

This architecture is implemented using TensorFlow, which, like MONAI, is a comprehensive library with
numerous packages for deep learning including the operations utilized here.

3.1.2 Loss function


The model uses a combined loss function to achieve faster convergence and improved performance, lever-
aging the Adam optimizer with a weight decay of 1 × 10⁻⁴. The combination involves two primary loss
functions: categorical focal loss and Dice loss. The combined function is described in the following equation:

L = -\sum_{i=1}^{N} g_i (1 - p_i)^{\gamma} \log(p_i) + 1 - \frac{2 \sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2}    (1)

where p_i is the predicted probability for pixel i, g_i is the ground truth label for pixel i, and γ is the focusing parameter.

Categorical focal loss addresses class imbalances in segmentation tasks where certain classes, such as
tumors, are much less prevalent than others. As we go deeper in the network, these classes tend to disappear,
which results in segmentations that no longer make sense. By increasing the weight on these harder-
to-predict classes, the model focuses more on accurately identifying these critical areas, thus balancing the
learning process. Dice loss is particularly suited for both binary and multiclass image segmentation. It aims
to maximize the similarity between the model’s predictions and the ground truth by emphasizing the quality
of the predictions. This loss function is essential for ensuring precise segmentation.
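
A hedged sketch of such a combined focal + Dice loss in TensorFlow is given below; the focusing parameter γ = 2, the smoothing constant, and the AdamW usage in the comment are common defaults assumed here, not values confirmed by the article.

import tensorflow as tf

def combined_focal_dice_loss(y_true, y_pred, gamma=2.0, smooth=1e-6):
    # Sketch of a categorical focal loss plus Dice loss (assumed defaults).
    y_true = tf.cast(y_true, y_pred.dtype)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)              # avoid log(0)

    # Categorical focal loss: down-weights easy pixels via (1 - p)^gamma.
    focal = -tf.reduce_sum(y_true * tf.pow(1.0 - y_pred, gamma) * tf.math.log(y_pred))

    # Dice loss with squared terms in the denominator.
    intersection = tf.reduce_sum(y_true * y_pred)
    denom = tf.reduce_sum(tf.square(y_pred)) + tf.reduce_sum(tf.square(y_true))
    dice = 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

    return focal + dice

# Usage sketch:
# model.compile(optimizer=tf.keras.optimizers.AdamW(weight_decay=1e-4),
#               loss=combined_focal_dice_loss)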

3.1.3 Metrics
The quality of the segmentation has been evaluated in terms of four different metrics which are the accu-
racy, the Hausdorff95, the intersection over union and the dice similarity coefficient, respectively expressed
in equations (2.1)–(2.4) below.

Acc = \frac{|G \cap S| + |G^c \cap S^c|}{|G \cup S|}    (2.1)

d_H(G, S) = \max\left( \sup_{g \in G} \inf_{s \in S} d(g, s), \; \sup_{s \in S} \inf_{g \in G} d(g, s) \right)    (2.2)

IoU = \frac{|G \cap S|}{|G \cup S|}    (2.3)

DSC = \frac{2 \times |G \cap S|}{|G| + |S|}    (2.4)

Accuracy measures the proportion of correctly classified pixels by comparing each predicted pixel to its
ground truth counterpart. Hausdorff95 evaluates the boundary alignment between the predicted and ground
truth regions, representing the maximum distance between them. The Dice Similarity Coefficient (DSC) and
Intersection over Union (IoU) quantify the overlap between the predicted and expected regions. The main
difference is that the DSC is less sensitive to small errors compared to IoU.
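
As a hedged illustration, the two overlap-based metrics can be computed from binary masks as follows; the toy masks and the NumPy implementation are assumptions made for demonstration only.

import numpy as np

def overlap_metrics(g, s):
    # Compute IoU and DSC from two binary masks (illustrative sketch).
    g = g.astype(bool)
    s = s.astype(bool)
    intersection = np.logical_and(g, s).sum()
    union = np.logical_or(g, s).sum()
    iou = intersection / union                      # |G intersect S| / |G union S|
    dsc = 2 * intersection / (g.sum() + s.sum())    # 2|G intersect S| / (|G| + |S|)
    return iou, dsc

# Toy example: two slightly different masks.
ground_truth = np.array([1, 1, 1, 0, 0, 0])
prediction = np.array([1, 1, 0, 1, 0, 0])
print(overlap_metrics(ground_truth, prediction))    # (0.5, 0.666...)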

3.2 Obtained results


The architecture was trained using the BraTS2020 dataset, which includes 497 MRI scans—369 for
training and 128 for validation. These scans were collected from numerous institutions across the United
States and underwent manual segmentation by expert radiologists. Four MRI modalities were available, but
the focus was placed on Post-Contrast T1, T2, and T2-FLAIR, as these are the most relevant for brain
tumors. The images were preprocessed and resized from 240 × 240 × 155 to 128 × 128 × 128, concentrating
on the critical voxels around the tumor in each image.
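
A hedged sketch of this preprocessing step is shown below as a simple center crop from 240 × 240 × 155 to 128 × 128 × 128; the actual cropping strategy around the tumor is not detailed in the article, so the centered crop is an assumption made for illustration.

import numpy as np

def center_crop(volume, target=(128, 128, 128)):
    # Center-crop a 3D volume to the target shape (illustrative sketch).
    starts = [(dim - tgt) // 2 for dim, tgt in zip(volume.shape, target)]
    slices = tuple(slice(s, s + tgt) for s, tgt in zip(starts, target))
    return volume[slices]

# BraTS-sized dummy volume: 240 x 240 x 155 voxels.
scan = np.zeros((240, 240, 155), dtype=np.float32)
cropped = center_crop(scan)
print(cropped.shape)  # (128, 128, 128)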

The authors compared their results with other architectures in two distinct ways. First, they conducted
an ablation study, re-running segmentations using only parts of their code—specifically, using just 3DU-Net,
just V-Net, and UV-Net without transformers, as previously mentioned. Second, they compared their findings
with results from other state-of-the-art methods.

3.2.1 Ablation study

(a) DSC and HD95 comparison (b) Visualization of the segmentation

Figure 7: Ablation study

Figure 7a presents the comparison of the 3DUV-NetR+ architecture with the standalone U-Net
and V-Net in terms of DSC and HD95 for the whole tumor (WT), the enhanced tumor (ET) and the tumor
core (TC). WT refers to the entire tumor, including both the central core and surrounding areas affected
by the tumor. ET represents the tumor’s enhanced regions, highlighted after contrast agent administration,
which typically includes the most aggressive and vascular areas. TC, on the other hand, refers to the
central part of the tumor, often the densest and most homogeneous, excluding the peripheral regions. These
categories are critical for assessing the accuracy of tumor segmentation in medical imaging. Figure 7b gives
an overview of the code’s output.

The DSC for the 3DUV-NetR+ architecture is higher than that of all semi-complete architectures across
all tumor regions: whole tumor, tumor core, and enhanced tumor. The Hausdorff distance is also smaller for
the 3DUV-NetR+, indicating improved performance. The segmentation visualizations further support these
findings. The two leftmost images in the first row represent one of the ground truth multimodal scans and
its associated labels from manual segmentation. The remaining images show the labels obtained through
segmentation using only U-Net, only V-Net, both without transformers, and finally, the 3DUV-NetR+. The
3DUV-NetR+ results are the closest to the expected output, especially for the green class, which overextends
in the results from the incomplete architectures.

3.2.2 Comparison with other SOTA methods

Figure 8: Comparison with SOTA methods

The results were quantitatively compared with other proposed methods, referenced in Figure 8.

Once again, the architecture proposed by Aboussaleh et al. shows promising results. However, their
DSC is not always the highest, and their Hausdorff distance is not consistently the smallest. The nnFormer
architecture outperforms in all categories except for the WT DSC and the ET HD95. Wang et al.’s study also
presents competitive results, with better performance in TC DSC, WT HD95, and ET HD95. Additionally,
CU-Net, Lachinov et al., and Pereira et al. achieved better DSC values for both the WT and ET.

3.3 Limitations of the article


The authors claim to combine the favorable features of three deep learning architectures—U-Net, V-Net,
and Transformers—but it seems this goal has not been fully achieved. Specifically, the method of feature
extraction from the encoders, represented by the pink vertical lines in Figure 6, is unclear, making it difficult
for the reader to determine whether skip connections or residual connections are employed. Notably, one
of V-Net’s strengths lies in its use of residual connections instead of concatenations (skip connections) for
upsampling and regenerating each pixel of the input image, resulting in reduced redundancy and improved
efficiency at the expense of resolution. In contrast, U-Net’s simpler skip connections provide higher accuracy
but come at a significant computational cost. Another advantage of U-Net is its straightforward implementa-
tion, but this is not leveraged here, as the authors also incorporate the more complex V-Net architecture and
Transformers. Overall, the article does not clearly demonstrate how the positive features of each architecture
are integrated. One might infer that the higher resolution of U-Net, the robustness of V-Net with respect
to the loss function used, contextual information provided by Transformers, and the optimized operations of
V-Net are combined to achieve superior results compared to using each structure independently.

Furthermore, the article lacks justification for its approach to multimodality. At each training or validation
step, the 3D image modalities appear to be concatenated, but the article would benefit from providing
additional explanations for this choice. One might infer that the way multimodality is managed
could also improve the efficiency of the algorithm.

The article does not explicitly explain why four metrics were selected for evaluation. However, it can be
inferred that these metrics were chosen to facilitate comparison of segmentation results with other architec-
tures that may have employed one or more of them. Nevertheless, only the results for HD95 and DSC were
reported, suggesting that the segmentation performance based on the other two metrics may not have met
the authors’ expectations.

Finally, the computation time remains excessively high. This is unsurprising, given that the architecture
requires twice as many operations in the encoder stage, as the U-Net and V-Net architectures function in
parallel without any simplification compared to their standalone implementations.

3.4 Re-implementation
The author of this project attempted to implement the described architecture using TensorFlow in order
to stay as faithful as possible to the original design. At that time, the author believed that access to the
necessary resources would be granted, but due to the lack of a suitable environment, this could not be tested.
Nonetheless, this experience allowed for a deeper understanding of the deep learning packages offered by
TensorFlow, such as tf.keras for building and training U-Net and V-Net architectures, which are widely
used for medical image segmentation tasks. Despite not being able to run the code, this phase provided
valuable insights into the design and deployment of deep learning models.

4 Literature review
To contextualize Aboussaleh et al.’s study within current research, it is useful to explore other brain
segmentation methods from the literature that have been implemented using the same dataset, BraTS2020.
As previously noted, an initial comparison has already been made, and the comparative list appears to be
relatively comprehensive. No other relevant studies have been identified. The goal of this literature review is
hence to compare architectures that have outperformed some of the results from Aboussaleh et al.’s study,
in order to understand why they performed better and to identify potential research directions in this field.

The architecture proposed by Pereira et al. leverages 3 × 3 kernels to design deeper networks with fewer
weights, enhancing resistance to overfitting. Intensity normalization is applied to standardize data from multi-
site acquisitions, and data augmentation via rotations improves segmentation of gliomas in MRI images.
The 3D transformer model, nnFormer, combines convolutional and self-attention operations to effectively
capture local and global dependencies. It employs Local and Global Volume-based Multi-head Self-attention
mechanisms to model representations robustly and uses skip attention in place of traditional skip connections
for precise segmentation. The cascaded CNN framework with uncertainty estimation, proposed by Wang
et al., uses three 2.5D CNNs to hierarchically segment brain tumor regions. This architecture balances
memory consumption, model complexity, and receptive field through anisotropic convolutions. Test-time
augmentation is employed for uncertainty estimation, aiding in identifying potentially mis-segmented areas.
The CU-Net framework features a successive segmentation approach for brain tumor structures. Between-
network connections are designed to transmit high-resolution features from shallow layers to deeper layers,
enhancing segmentation precision. A loss-weighted sampling scheme addresses class imbalance, improving
model performance in challenging datasets.

The reviewed studies highlight overlapping features such as hierarchical approaches, data augmentation,
and methods to balance complexity and precision, including the use of transformers. These commonali-
ties underscore their effectiveness and point to promising directions for further exploration in brain tumor
segmentation methodologies. Additionally, they emphasize the significance of Aboussaleh et al.’s approach,
which integrates all these common features.

5 Conclusion
The majority of the objectives for this project were achieved, except for the implementation, which was
hindered by limited resources at the time of completion. The understanding of deep learning concepts was
successfully attained, allowing for a comprehensive grasp of the implementation details of a promising method
for 3D multimodal brain segmentation. This also provided the tools to critically evaluate the results.

A brief literature review of studies with better results in the metrics used in the project’s focal article
further justified the relevance of Aboussaleh et al.’s approach. These reviews underscored the benefits of
using transformers and a hierarchical structure with encoder-decoder architectures for brain segmentation.
However, each method remains computationally expensive and would benefit from optimization at this scale.

References
[1] Ilyasse Aboussaleh, Jamal Riffi, Khalid El Fazazy, Adnane Mohamed Mahraz, and Hamid Tairi. 3DUV-NetR+: A 3D hybrid semantic architecture using transformers for brain tumor segmentation with multi-modal MR images. Results in Engineering, 21:101892, 2024.

[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.

[3] Dmitry Lachinov, Evgeny Vasiliev, and Vadim Turlapov. Glioma segmentation with cascaded UNet. In International MICCAI Brainlesion Workshop, pages 189–198. Springer, 2018.

[4] Hongying Liu, Xiongjie Shen, Fanhua Shang, Feihang Ge, and Fei Wang. CU-Net: Cascaded U-Net with loss weighted sampling for brain tumor segmentation. In Multimodal Brain Image Analysis and Mathematical Foundations of Computational Anatomy: 4th International Workshop, MBIA 2019, and 7th International Workshop, MFCA 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Proceedings, pages 102–111. Springer, 2019.

[5] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.

[6] Project MONAI. 3D segmentation - BraTS tutorial, 2023. Accessed: 2024-12-12.

[7] Project MONAI. 3D segmentation - Swin UNETR BraTS21, 2023. Accessed: 2024-12-12.

[8] Project MONAI. 3D segmentation - UNet with Ignite, 2023. Accessed: 2024-12-12.

[9] Sérgio Pereira, Adriano Pinto, Victor Alves, and Carlos A. Silva. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Transactions on Medical Imaging, 35(5):1240–1251, 2016.

[10] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III, pages 234–241. Springer, 2015.

[11] Guotai Wang, Wenqi Li, Sébastien Ourselin, and Tom Vercauteren. Automatic brain tumor segmentation based on cascaded convolutional neural networks with uncertainty estimation. Frontiers in Computational Neuroscience, 13:56, 2019.

[12] Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Xiaoguang Han, Lequan Yu, Liansheng Wang, and Yizhou Yu. nnFormer: Volumetric medical image segmentation via a 3D transformer. IEEE Transactions on Image Processing, 2023.
