MRI Brain Tumor Segmentation and Uncertainty Estimation using 3D-UNet architectures

Laura Mora Ballestar and Veronica Vilaplana
arXiv:2012.15294v1 [eess.IV] 30 Dec 2020

Signal Theory and Communications Department, Universitat Politècnica de
Catalunya. BarcelonaTech, Spain

Abstract. Automating brain tumor segmentation in 3D magnetic resonance images
(MRIs) is key to assessing the diagnosis and treatment of the disease. In recent
years, convolutional neural networks (CNNs) have shown improved results in this
task. However, high memory consumption is still a problem in 3D-CNNs. Moreover,
most methods do not include uncertainty information, which is especially
critical in medical diagnosis. This work studies 3D encoder-decoder
architectures trained with patch-based techniques to reduce memory consumption
and decrease the effect of unbalanced data. The different trained models are
then used to create an ensemble that leverages the properties of each model,
thus increasing the performance. We also introduce voxel-wise uncertainty
information, both epistemic and aleatoric, using test-time dropout (TTD) and
test-time data augmentation (TTA), respectively. In addition, a hybrid approach
is proposed that helps increase the accuracy of the segmentation. The model and
uncertainty estimation measurements proposed in this work have been used in the
BraTS'20 Challenge for tasks 1 and 3, regarding tumor segmentation and
uncertainty estimation.

Keywords: brain tumor segmentation · deep learning · uncertainty · 3D
convolutional neural networks

1 Introduction

Brain tumors are categorized into primary tumors, which originate in the brain,
and secondary tumors, which have spread from elsewhere and are known as brain
metastases. Among malignant primary tumors, gliomas are the most common in
adults, representing 81% of malignant brain tumors [7]. The World Health
Organization (WHO) categorizes gliomas into grades I-IV, which can be simplified
into two types: (1) "low grade gliomas" (LGG), grades I-II, which are less
common and are characterized by low blood concentration and slow growth, and
(2) "high grade gliomas" (HGG), grades III-IV, which grow faster and are more
aggressive.

This work has been partially supported by the project MALEGRA TEC2016-75976-R
financed by the Spanish Ministerio de Economía y Competitividad.

The extent of the disease comprises four heterogeneous histological
sub-regions: the peritumoral edematous/invaded tissue, the necrotic
(fluid-filled) core, and the enhancing and non-enhancing (solid) tumor core.
Each region is described by varying intensity profiles across MRI modalities
(T1-weighted, post-contrast T1-weighted, T2-weighted, and Fluid-Attenuated
Inversion Recovery, FLAIR), which reflect the diverse biological properties of
the tumor and are commonly used to assess the diagnosis, treatment and
evaluation of the disease. These MRI modalities facilitate tumor analysis, but
at the expense of manually delineating the tumor regions, which is a
challenging and time-consuming process. For this reason, automatic mechanisms
for tumor region segmentation have appeared in the last decade thanks to the
advancement of deep learning models in computer vision tasks. Despite these
recent advances, the segmentation of brain tumors in multimodal MRI scans is
still a challenging task in medical image analysis due to the highly
heterogeneous appearance and shape of the tumors.
The Brain Tumor Segmentation (BraTS) challenge [1–5] started in 2012 with a
focus on evaluating state-of-the-art methods for glioma segmentation in
multimodal MRI scans. The BraTS 2020 training dataset includes 369 cases (293
HGG and 76 LGG), each with four 3D MRI modalities rigidly aligned, resampled to
1 mm³ isotropic resolution and skull-stripped, with size 240x240x155. Each case
includes a manual segmentation approved by experienced neuro-radiologists.
Training annotations comprise the enhancing tumor (ET, label 4), the
peritumoral edema (ED, label 2), and the necrotic and non-enhancing tumor core
(NCR/NET, label 1). The nested sub-regions considered for evaluation are: whole
tumor WT (labels 1, 2, 4), tumor core TC (labels 1, 4) and enhancing tumor ET
(label 4). The validation set includes 125 cases, for which neither the grade
nor the ground-truth annotation is provided. The test set is composed of 166
cases.
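The mapping from training labels to the nested evaluation regions can be sketched as follows. This is a minimal NumPy illustration, not code from the authors' repository; the names `REGION_LABELS` and `labels_to_regions` are hypothetical.

```python
import numpy as np

# BraTS training labels as listed in the text: 1 = NCR/NET, 2 = ED, 4 = ET.
REGION_LABELS = {
    "WT": (1, 2, 4),  # whole tumor
    "TC": (1, 4),     # tumor core
    "ET": (4,),       # enhancing tumor
}

def labels_to_regions(label_map):
    """Return one boolean mask per nested evaluation region."""
    return {name: np.isin(label_map, labels) for name, labels in REGION_LABELS.items()}

# Toy label map with one voxel of each tumor label plus background.
toy = np.array([0, 1, 2, 4, 4, 0])
regions = labels_to_regions(toy)
```

Because the regions are nested, every ET voxel also belongs to TC and WT, which is why the three masks are evaluated as separate binary problems.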
The goal of this work is to develop a 3D convolutional neural network (CNN)
for brain tumor segmentation from 3D MRIs and to provide an uncertainty measure
to assess the confidence in the model predictions. The proposed methods are
used to participate in the BraTS'20 Challenge for tasks 1 and 3, respectively.
In task 1, we explore the use of two well-known 3D-CNNs for medical imaging,
V-Net [6] and 3D-UNet [27], and apply some modifications to their baselines.
With both networks, sampling techniques are necessary due to memory
limitations, as is data augmentation to prevent over-fitting. For task 3, the
work provides voxel-wise uncertainty measures computed at test time, with
global and per sub-region information. Both epistemic and aleatoric [8]
uncertainties are estimated, using test-time dropout (TTD) [9] and data
augmentation, respectively.

2 Related Work
2.1 Semantic Segmentation
Brain tumor segmentation methods include generative and discriminative
approaches. Generative methods try to incorporate prior knowledge and model
probabilistic distributions, whereas discriminative methods extract features
from image representations. This latter approach has thrived in recent years
thanks to the advancement of CNNs, as demonstrated by the winners of previous
BraTS editions. The biggest breakthrough in this area was introduced by
DeepMedic [10], a 3D CNN that exploits multi-scale features using parallel
pathways and incorporates a fully connected conditional random field (CRF) to
remove false positives. [11] compares the performance of three 3D CNN
architectures, showing the importance of multi-resolution connections to obtain
fine details in the segmentation of tumor sub-regions. More recently, EMMA [12]
creates an ensemble at inference time, which reduces overfitting but at a high
computational cost, and [13] proposes a cascade of two CNNs, where the first
network produces raw tumor masks and the second network is trained on the
vicinity of the tumor to predict tumor regions. The BraTS 2018 winner [14]
proposed an asymmetric encoder-decoder architecture with a variational
autoencoder that reconstructs the image during training and is used as a
regularizer. Isensee et al. [15] use a regular 3D-U-Net optimized on the
evaluation metrics and co-trained with external data. The BraTS 2019 winners
[16] use a two-stage cascaded U-Net trained end-to-end. Finally, [17] applies
several tricks in three categories (data processing, model devising and
optimization process) to boost the model performance.

2.2 Uncertainty

Uncertainty information about segmentation results is important, especially in
medical imaging, to guide clinical decisions and help understand the
reliability of the provided segmentation, making it possible to identify more
challenging cases that may require expert review. Segmentation models for brain
tumor MRIs tend to label voxels with less confidence in the tissues surrounding
the segmentation targets [19], thus indicating regions that may have been
missegmented. Last year's BraTS challenge already started introducing
uncertainty measurements. [18] computes epistemic uncertainty using TTD: a
posterior distribution is generated by running several epochs for each image at
test time, and its mean and variance are used to evaluate the model
uncertainty. A different approach is proposed by Wang et al. [19], who use TTD
and data augmentation to estimate the voxel-wise uncertainty by computing the
entropy instead of the variance. Finally, [20] proposes to incorporate
uncertainty measures during training by defining a loss function that models
label noise and uncertainty.

3 Method

3.1 Dataset Statistics

The main difficulty in brain tumor segmentation comes from class imbalance. The
tumor regions account for 5-15% of the brain tissue and each tumor region is an
even smaller portion. Fig. 1 provides a graphical representation of the
distribution of each tumor class (ET, NCR, ED), excluding healthy tissue. It
can be seen that ED is more probable than ET and NCR and that there is high
variability between subjects in the NCR label. Another difficulty is the
difference between glioma grades, as LGG patients are characterized by low
blood concentration, which translates into few ET voxels and a higher number of
voxels in the NCR and NET regions.

Fig. 1: Distribution of each class ED, ET, NCR. From left to right, (1) number
of voxels in all cases, (2) number of voxels for the HGG and (3) number of voxels
for the LGG

3.2 Data Pre-processing and Augmentation


MRI intensity values are not standardized, as the data is obtained from
different institutions, scanners and protocols. Therefore, we normalize each
modality of each patient independently to have zero mean and unit standard
deviation, based on non-zero voxels only, which represent the brain region.
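The per-modality normalization can be sketched as below. This is an illustrative NumPy version under two assumptions that the text does not state: background voxels are left at zero after normalization, and a small constant guards against a zero standard deviation. The function name `normalize_modality` is hypothetical.

```python
import numpy as np

def normalize_modality(volume):
    """Z-score one modality using statistics from non-zero (brain) voxels only."""
    volume = volume.astype(np.float32)
    brain = volume != 0
    mean = volume[brain].mean()
    std = volume[brain].std()
    out = np.zeros_like(volume)  # assumption: background stays at zero
    out[brain] = (volume[brain] - mean) / (std + 1e-8)
    return out

rng = np.random.default_rng(0)
scan = rng.uniform(100, 200, size=(8, 8, 8))
scan[:2] = 0  # pretend the first slices are skull-stripped background
normalized = normalize_modality(scan)
```

Each of the four modalities of a patient would be passed through this function separately, so that differences between scanners and protocols do not leak into the intensity scale.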
We also apply data augmentation techniques to prevent over-fitting while
disturbing the data as little as possible. For this, we apply Random Flip (for
all 3 axes) with a 50% probability, Random 90° Rotation on two axes with a 50%
probability, Random Intensity Shift in the range [−0.1, 0.1] of the data
standard deviation and Random Intensity Scale on all input channels in the
range [0.9, 1.1].
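A rough sketch of this augmentation pipeline on a multi-channel patch follows. It is not the authors' implementation: the choice of rotation plane, the per-channel shift and scale, and the order in which shift and scale are applied are assumptions, and the function name `augment` is hypothetical. The rotation assumes cubic patches so that the shape is preserved.

```python
import numpy as np

def augment(image, rng):
    """image: (C, D, H, W) array. Returns a randomly augmented copy."""
    out = image.copy()
    # Random flip along each of the three spatial axes, p = 0.5 each.
    for axis in (1, 2, 3):
        if rng.random() < 0.5:
            out = np.flip(out, axis=axis)
    # Random 90-degree rotation in one spatial plane (assumed: the last two axes), p = 0.5.
    if rng.random() < 0.5:
        out = np.rot90(out, k=1, axes=(2, 3))
    # Intensity shift in [-0.1, 0.1] of the per-channel std and scale in [0.9, 1.1].
    std = out.std(axis=(1, 2, 3), keepdims=True)
    shift = rng.uniform(-0.1, 0.1, size=std.shape) * std
    scale = rng.uniform(0.9, 1.1, size=std.shape)
    return (out + shift) * scale

rng = np.random.default_rng(0)
patch = rng.normal(size=(4, 8, 8, 8))
augmented = augment(patch, rng)
```

In training, the flips and rotation would also have to be applied to the label patch with the same random decisions, while the intensity changes apply to the image only.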

3.3 Sampling Strategy


3D-CNNs are computationally expensive and, in many cases, the input data cannot
be fed directly to the network. Patch-wise training helps to free memory
resources so that more images can be fed in one batch. However, there is a
trade-off between patch size and batch size. Bigger batches give a more
accurate representation of the data but require smaller patches (due to memory
constraints), which provide local information but lack contextual knowledge.
Another key aspect to consider when selecting the patching strategy is to
maintain the class distribution. Losing this distribution can produce a biased
model, e.g. if the model only sees small patches with tumor, it will likely
misclassify healthy tissue.
In this work, we have used two approaches depending on the patch size.

– Binary Distribution: Small patches, of size 64³ or smaller, are randomly
selected with a 50% probability of being centred on healthy tissue and a 50%
probability of being centred on tumor [10].
– Random Tumor Distribution: Bigger patches, of size 112³ or 128³, are selected
randomly but always centred on the tumor region, as these patches will
already contain more healthy tissue and background information.
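The two sampling strategies above can be sketched as follows. This is a simplified illustration, not the authors' sampler: the helper names `sample_patch_center` and `crop` are hypothetical, the brain mask is assumed to be available, and the case is assumed to contain at least one tumor voxel.

```python
import numpy as np

def sample_patch_center(label_map, brain_mask, patch_size, rng):
    """Pick a patch centre following the two strategies described above."""
    tumor = label_map > 0
    if patch_size <= 64:
        # Binary distribution: 50% healthy tissue, 50% tumor.
        target = tumor if rng.random() < 0.5 else (brain_mask & ~tumor)
    else:
        # Random tumor distribution: always centred on tumor.
        target = tumor
    candidates = np.argwhere(target)
    return tuple(candidates[rng.integers(len(candidates))])

def crop(volume, center, patch_size):
    """Crop a cubic patch, shifting it inwards when it would leave the volume."""
    slices = []
    for c, dim in zip(center, volume.shape):
        start = int(np.clip(c - patch_size // 2, 0, max(dim - patch_size, 0)))
        slices.append(slice(start, start + patch_size))
    return volume[tuple(slices)]

rng = np.random.default_rng(0)
labels = np.zeros((32, 32, 32), dtype=int)
labels[10:14, 10:14, 10:14] = 2
brain = np.ones(labels.shape, dtype=bool)
center = sample_patch_center(labels, brain, 112, rng)
corner_patch = crop(labels, (0, 0, 0), 16)
```

Shifting the crop window inwards, rather than padding, is only one possible way to handle centres near the border of the volume.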

3.4 Loss

The Dice score coefficient (DSC) is a measure of overlap widely used to assess
segmentation performance when ground truth is available. Proposed in Milletari
et al. [6] as a loss function for binary classification, it can be written as:

$$ L_{dice} = 1 - \frac{2 \sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2 + \epsilon} \tag{1} $$

where $N$ is the number of voxels, $p_i$ and $g_i$ correspond to the predicted
and ground truth labels per voxel respectively, and $\epsilon$ is added to
avoid division by zero.
Many variations of the dice loss have been proposed in the literature. For
instance, the Generalized Dice Loss (GDL) [26] which is based on the generalized
dice score (GDS) [28] for multiple class evaluation. Its goal is to correct the
correlation between region size and dice score, by weighting the contribution of
each label with the inverse of its volume. It is described as:

$$ L_{diceGDL} = 1 - 2 \frac{\sum_{l=1}^{L} w_l \sum_{i=1}^{N} p_{li} g_{li}}{\sum_{l=1}^{L} w_l \sum_{i=1}^{N} (p_{li} + g_{li}) + \epsilon} \tag{2} $$

where $L$ represents the number of classes and $w_l$ the weight given to each
class. We use the GDL variant as it is more suited for unbalanced segmentation
problems.
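Equation (2) can be written out numerically as in the sketch below. It is a NumPy illustration rather than the training loss used in the repository. The weights follow the text (inverse of the class volume); weighting by the inverse of the squared volume is also common in the literature, so the exponent should be treated as a design choice. The function name `generalized_dice_loss` is hypothetical.

```python
import numpy as np

def generalized_dice_loss(probs, onehot, eps=1e-6):
    """probs, onehot: (L, N) arrays of predicted probabilities and one-hot labels."""
    volumes = onehot.sum(axis=1)
    weights = 1.0 / (volumes + eps)  # inverse class volume, as described in the text
    intersect = (weights * (probs * onehot).sum(axis=1)).sum()
    denom = (weights * (probs + onehot).sum(axis=1)).sum()
    return 1.0 - 2.0 * intersect / (denom + eps)

# Two classes, four voxels: class 0 covers three voxels, class 1 only one.
truth = np.array([[1.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
loss_perfect = generalized_dice_loss(truth, truth)
loss_wrong = generalized_dice_loss(1.0 - truth, truth)
```

With these weights the single-voxel class contributes as much to the loss as the three-voxel class, which is the intended correction for class imbalance.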

3.5 Network Architecture

This work proposes three networks, variations of the V-Net [6] and 3D-UNet [27]
architectures, for brain tumor segmentation and creates an ensemble to mitigate
the bias in each independent model.
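One simple way to build such an ensemble is to average the per-class probabilities of the individual models and take the most likely class per voxel. The sketch below assumes this mean-of-softmax rule, which the text does not spell out; `ensemble_mean` is a hypothetical name.

```python
import numpy as np

def ensemble_mean(prob_maps):
    """prob_maps: list of (C, D, H, W) softmax outputs, one per model."""
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    # argmax returns channel indices, which still need mapping to BraTS labels.
    return mean_probs, mean_probs.argmax(axis=0)

# Two toy models, two classes, a single voxel.
model_a = np.array([0.6, 0.4]).reshape(2, 1, 1, 1)
model_b = np.array([0.1, 0.9]).reshape(2, 1, 1, 1)
mean_probs, channel_map = ensemble_mean([model_a, model_b])
```

In this toy case the first model alone would pick channel 0, but the averaged probabilities (0.35 and 0.65) favour channel 1.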
The different models are trained using the Adam optimizer, with an initial
learning rate of 1e-4, decreased by a factor of 5 whenever the validation loss
has not improved in the past 30 epochs, and regularized with an L2 weight decay
of 1e-5. They all use the GDL loss.
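In PyTorch, this training configuration could look roughly like the following. The single convolution is only a stand-in for the real network and the constant validation loss is a placeholder; whether the authors used `ReduceLROnPlateau` specifically is an assumption.

```python
import torch

model = torch.nn.Conv3d(4, 4, kernel_size=3, padding=1)  # stand-in for the real network

# weight_decay adds the L2 penalty described in the text.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
# factor=0.2 divides the learning rate by 5; patience=30 waits 30 epochs without improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.2, patience=30
)

for epoch in range(40):
    val_loss = 1.0  # placeholder: a validation loss that never improves
    scheduler.step(val_loss)

final_lr = optimizer.param_groups[0]["lr"]
```

With a loss that never improves, the scheduler reduces the learning rate once within these 40 epochs, from 1e-4 to 2e-5.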
V-Net The V-Net implementation has been adapted to use four output channels
(Non-Tumor, ED, NCR/NET, ET) and uses Instance Normalization [21] instead of
Batch Normalization; Instance Normalization normalizes across each channel for
each training example rather than across the whole batch. Also, as proposed in
[15], we have increased the number of feature maps at the highest resolution to
32, instead of the 16 used in the original implementation. Figure 2 shows the
network architecture with an input patch size of 64x64x64.
The network has been trained using a patch size of 96³ and the random tumor
distribution strategy (see Section 3.3). The maximum batch size due to memory
constraints is 2.

Fig. 2: V-Net [6] architecture with instance normalization, PReLU
non-linearities and 32 feature channels at the highest resolution. Feature
dimensionality is denoted at each block. The network outputs the segmentation
and the softmax prediction.

3D-UNet We use the original implementation with some minor modifications.
Batch Normalization is replaced by Group Normalization and, as in V-Net, we
use 32 feature maps at the highest resolution.
The network architecture is divided into symmetric Encoder and Decoder
parts. The Encoder is composed of two convolutional blocks with a 3DConv +
ReLU + GroupNorm structure. Downsampling is performed with 2³ max-pooling and
the corresponding upsampling is performed with interpolation. All convolutional
layers have kernel size 3³, except for the last one, which has a 1x1x1 kernel
and 4 feature maps as output. In this case, we use the ReLU non-linearity and
the skip-connections are joined with a concatenation step. The network outputs
a four-channel segmentation map with the training labels as well as a softmax.
The detailed architecture can be seen in Figure 3.
The Basic 3D-UNet is trained with a patch size of 112³ and a batch size of 2.

Fig. 3: 3D-UNet [27] architecture with Group Normalization, MaxPooling,
Interpolation Upsampling and ReLU non-linearity

Residual 3D-UNet This network extends the previous one with residual
connections, which allow a deeper network with less risk of vanishing
gradients. In addition to the residual blocks, the network introduces some
modifications w.r.t. the basic 3D-UNet: (1) it uses element-wise sum to join
the skip-connections, (2) it replaces interpolation upsampling with transposed
convolutions and (3) it adds more depth to the network thanks to the ResNet
connections.

Fig. 4: 3D-UNet [27] architecture with ResNet blocks at each level, MaxPooling,
Transposed Convolutions and ReLU non-linearity

This network is trained following two different strategies. The first one,
3D-UNet-residual, uses a patch size of 112³ and a batch size of 2 for the whole
training, whereas 3D-UNet-residual-multiscale varies the sampling strategy so
that the network sees both local and global information. For that, the first
half of the training uses a patch size of 128³ with a batch size of 1. Then,
the patch size is reduced to 112³ and the batch size is increased to 2.

3.6 Post-Processing
In order to correct the appearance of false positives in the form of small and
separated connected components, this work uses a post-processing step that
keeps the two biggest connected components if their proportion is bigger than
some threshold (obtained by analysing the training set). With this process,
small connected components that may be false positives are removed, but big
enough components are kept, as some of the subjects may have several tumors.
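A possible reading of this post-processing step is sketched below using `scipy.ndimage.label`. The exact rule is not given in the text, so the interpretation here is an assumption: the largest component is always kept and the second largest is kept only if its size, relative to the largest, reaches a threshold. Both the function name and the default `min_ratio` are hypothetical.

```python
import numpy as np
from scipy import ndimage

def keep_largest_components(tumor_mask, min_ratio=0.1):
    """Keep the largest component, plus the second largest if it is big enough.

    min_ratio is a hypothetical threshold (size relative to the largest component).
    """
    labeled, num = ndimage.label(tumor_mask)
    if num <= 1:
        return tumor_mask.astype(bool)
    sizes = np.bincount(labeled.ravel())[1:]   # skip background label 0
    order = np.argsort(sizes)[::-1] + 1        # component ids, largest first
    keep = [order[0]]
    if sizes[order[1] - 1] >= min_ratio * sizes[order[0] - 1]:
        keep.append(order[1])
    return np.isin(labeled, keep)

mask = np.zeros((10, 10, 10), dtype=int)
mask[0:4, 0:4, 0:4] = 1   # main tumor, 64 voxels
mask[6:9, 6:9, 6:9] = 1   # second tumor, 27 voxels
mask[0, 9, 9] = 1         # isolated voxel, likely a false positive
cleaned = keep_largest_components(mask)
```

In the toy volume the isolated voxel is removed while both larger components survive, which mirrors the intent of keeping multiple tumors but discarding spurious detections.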
Moreover, one of the biggest difficulties of this challenge is to provide an
accurate segmentation of the smallest sub-region, ET, which is particularly
difficult to segment in LGG patients, as almost 40% of them have no enhancing
tumor in the training set. In the evaluation step, BraTS awards a Dice score of
1 if a label is absent in both the ground truth and the prediction. Conversely,
a single false positive voxel in a patient where no enhancing tumor is present
in the ground truth results in a Dice score of 0. Therefore, some previous
works [15, 16] propose to replace enhancing tumor voxels with necrosis if the
total number of enhancing voxels is smaller than some threshold, which is found
for each experiment independently. However, we were not able to find a
threshold that improved the performance, as it helped for some subjects but
made other results worse.
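The threshold-based replacement proposed in [15, 16] can be illustrated as follows. This sketch only shows the mechanism; the function name and the threshold values in the example are hypothetical, and, as stated above, no threshold was found to help in this work.

```python
import numpy as np

def suppress_small_enhancing_tumor(label_map, min_voxels):
    """Relabel ET (4) as NCR/NET (1) when fewer than min_voxels ET voxels exist."""
    out = label_map.copy()
    if 0 < np.count_nonzero(out == 4) < min_voxels:
        out[out == 4] = 1
    return out

prediction = np.array([0, 4, 4, 2, 1])
relabelled = suppress_small_enhancing_tumor(prediction, min_voxels=5)
unchanged = suppress_small_enhancing_tumor(prediction, min_voxels=2)
```

The input array is left untouched, so the original prediction can still be compared against the post-processed one.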

3.7 Uncertainty
This year’s BraTS includes a third task to evaluate the model uncertainty and
reward methods with predictions that are: (a) confident when correct and (b)
uncertain when incorrect. In this work, we model the voxel-wise uncertainty of
our method at test time, using test time dropout (TTD) and test-time data
augmentation (TTA) for epistemic and aleatoric uncertainty respectively.
We compute epistemic uncertainty as proposed by Gal et al. [23], who use
dropout as a Bayesian approximation in order to simplify the task. The idea is
to use dropout at both training and testing time. The paper suggests repeating
the prediction a few hundred times with random dropout. Then, the final
prediction is the average of all estimations and the uncertainty is modelled by
computing the variance of the predictions. In this work, we perform B = 20
iterations and use dropout with a 50% probability of zeroing out a channel. The
uncertainty map is estimated with the variance for each sub-region
independently. Let $Y^i = \{y_1^i, y_2^i, \ldots, y_B^i\}$ be the vector that
represents the i-th voxel's predicted labels. The voxel-wise uncertainty map,
for each evaluation region, is obtained as the variance:

$$ \mathrm{var} = \frac{1}{B} \sum_{b=1}^{B} \left( y_b^i - y_{mean}^i \right)^2 \tag{3} $$

Uncertainty can also be estimated with the entropy, as shown in [19]. However,
the entropy provides a global measure instead of a map for each sub-region. In
this case, the voxel-wise uncertainty is calculated as:

$$ H(Y^i | X) \approx - \sum_{m=1}^{M} \hat{p}_m^i \ln(\hat{p}_m^i) \tag{4} $$

where $\hat{p}_m^i$ is the frequency of the m-th unique value in $Y^i$ and $X$
represents the input image.
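Both uncertainty measures can be computed from a stack of B repeated predictions, as in the following NumPy sketch. It follows Eqs. (3) and (4) directly; the function names are hypothetical and the stack is assumed to have the repetition index as its first axis.

```python
import numpy as np

def variance_uncertainty(region_preds):
    """region_preds: (B, ...) binary predictions for one region (WT, TC or ET)."""
    return region_preds.astype(np.float64).var(axis=0)  # population variance, Eq. (3)

def entropy_uncertainty(label_preds):
    """label_preds: (B, ...) integer label maps; entropy of label frequencies, Eq. (4)."""
    entropy = np.zeros(label_preds.shape[1:], dtype=np.float64)
    for value in np.unique(label_preds):
        freq = (label_preds == value).mean(axis=0)
        nonzero = freq > 0
        entropy[nonzero] -= freq[nonzero] * np.log(freq[nonzero])
    return entropy

# Four repeated predictions for two voxels: the first is stable, the second flips.
preds = np.array([[1, 0], [1, 0], [1, 4], [1, 4]])
entropy_map = entropy_uncertainty(preds)
wt_variance = variance_uncertainty(preds > 0)
```

The stable voxel gets zero uncertainty under both measures, while the voxel that alternates between two labels gets an entropy of ln 2 and a whole-tumor variance of 0.25.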
To model aleatoric uncertainty we apply the same augmentation techniques
from the training step plus random Gaussian noise, in order to add modifications
not previously seen by the network. The final prediction and uncertainty maps
are computed following the same strategies as in the epistemic uncertainty.
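A reduced sketch of test-time augmentation is given below. It only uses random flips and additive Gaussian noise, leaving out the rotation and intensity transforms for brevity. The `predict` callable, the number of iterations and the noise level are assumptions made for illustration, not values taken from the text.

```python
import numpy as np

def tta_predictions(image, predict, n_iter=20, noise_std=0.1, seed=0):
    """Run `predict` on randomly flipped, noisy copies of `image` and undo the flips.

    image: (C, D, H, W) array; predict: hypothetical callable returning a (D, H, W) label map.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_iter):
        axes = tuple(ax for ax in (1, 2, 3) if rng.random() < 0.5)
        augmented = np.flip(image, axis=axes) if axes else image
        augmented = augmented + rng.normal(0.0, noise_std, size=image.shape)
        labels = predict(augmented)
        # The label map has no channel axis, so spatial axes shift down by one.
        if axes:
            labels = np.flip(labels, axis=tuple(ax - 1 for ax in axes))
        outputs.append(labels)
    return np.stack(outputs, axis=0)

# Toy "model": thresholds the first channel.
def toy_predict(x):
    return (x[0] > 0.5).astype(int)

image = np.zeros((1, 4, 4, 4))
image[0, 0, 0, 0] = 10.0
stack = tta_predictions(image, toy_predict, n_iter=5, noise_std=0.01)
```

Undoing the spatial transform before stacking is what makes the per-voxel mean, variance and entropy meaningful; the resulting stack can be passed to the same variance or entropy computation used for TTD.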
All that being said, we evaluate the model's behaviour w.r.t. input and model
variability by defining the following experiments:
– Aleatoric Uncertainty: model aleatoric uncertainty with (1) TTA-variance,
providing three uncertainty maps (ET, TC, WT) and (2) TTA-entropy, with
one global map.
– Epistemic Uncertainty: model epistemic uncertainty with (1) TTD-variance,
providing three uncertainty maps (ET, TC, WT) and (2) TTD-entropy, with
one global map.
– Hybrid (Aleatoric + Epistemic) Uncertainty: model both aleatoric and epis-
temic uncertainty together with (1) TTD+TTA-variance, providing three
uncertainty maps (ET, TC, WT) and (2) TTD+TTA-entropy, with one
global map.

4 Results
The code¹ has been implemented in PyTorch [24] and trained on the GPI² servers,
based on 2 Intel(R) Xeon(R) @ 2.40GHz CPUs using 16GB RAM and a 12GB NVIDIA
GPU, using the BraTS 2020 training dataset. We report results on the training,
validation and test datasets. All results, prediction and uncertainty maps, are
uploaded to CBICA's Image Processing Portal (IPP) for evaluation of Dice score,
Hausdorff distance (95th percentile), sensitivity and specificity for each
class. The specific uncertainty evaluation metrics are the ratio of filtered TN
(FTN) and the ratio of filtered TP (FTP).

¹ GitHub repository: https://github.com/imatge-upc/mri-braintumor-segmentation
² The Image and Video Processing Group (GPI) is a research group of the Signal
Theory and Communications Department, Universitat Politècnica de Catalunya.

4.1 Segmentation
The principal metrics to evaluate the segmentation performance are the Dice
score, which is an overlap measure for pairwise comparison of the segmentation
mask X and the ground truth Y:

$$ DSC = 2 \frac{|X \cap Y|}{|X| + |Y|} \tag{5} $$

and the Hausdorff distance, which is the maximum distance of a set to the
nearest point in the other set, defined as:

$$ D_H(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y),\ \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\} \tag{6} $$

where sup represents the supremum and inf the infimum. In order to have
more robust results and to avoid issues with noisy segmentation, the evaluation
scheme uses the 95th percentile.
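The two metrics can be illustrated on small binary masks as below. This is a brute-force sketch, not the evaluation code of the challenge portal: it measures distances between all foreground voxels rather than surface points, reports them in voxel units, and needs memory proportional to the product of the two mask sizes. The function names are hypothetical.

```python
import numpy as np

def dice_score(pred, truth):
    """Eq. (5) on boolean masks; returns 1.0 when both masks are empty."""
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom

def hausdorff95(pred, truth):
    """95th-percentile symmetric Hausdorff distance between two non-empty masks (in voxels)."""
    a, b = np.argwhere(pred), np.argwhere(truth)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairwise distances
    return max(np.percentile(dists.min(axis=1), 95), np.percentile(dists.min(axis=0), 95))

pred = np.zeros((5, 5), dtype=bool)
truth = np.zeros((5, 5), dtype=bool)
pred[1:3, 1:3] = True    # 4 voxels
truth[1:3, 1:4] = True   # 6 voxels, one extra column
dice = dice_score(pred, truth)
hd95 = hausdorff95(pred, truth)
```

Here the masks share 4 voxels out of 4 and 6, giving a Dice score of 0.8, and the extra column of the ground truth lies one voxel away from the prediction.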
Tables 1 and 2 show Dice and Hausdorff distance (95th percentile) scores for
the training and validation sets, respectively.

Table 1: Segmentation Results on Training Dataset (369 cases).

                                            Dice             Hausdorff (mm)
Method                                  WT    TC    ET     WT     TC     ET
V-Net                                 0.87  0.83  0.74  10.19  12.89  35.96
Basic 3D-UNet                         0.85  0.84  0.76   6.97  10.13  28.23
Residual 3D-UNet                      0.82  0.82  0.76   8.56  12.11  28.93
Residual 3D-UNet-multiscale           0.84  0.84  0.76   7.43  12.37  27.09
Ensemble - mean                       0.85  0.85  0.77  10.46   6.90  29.03

Table 2: Segmentation Results on Validation Dataset (125 cases).

                                            Dice             Hausdorff (mm)
Method                                  WT    TC    ET     WT     TC     ET
V-Net + post                          0.86  0.78  0.69  14.50  16.15  43.52
Basic 3D-UNet + post                  0.81  0.78  0.67  13.10  14.01  43.89
Residual 3D-UNet + post               0.81  0.78  0.71  11.85  18.82  34.97
Residual 3D-UNet-multiscale + post    0.83  0.77  0.72  12.34  13.11  37.42
Ensemble mean + post                  0.84  0.79  0.72  10.93  12.24  37.97

The model used with the test set is the Residual 3D-UNet-multiscale with
post-processing. Table 3 shows the results on the training, validation and test
sets for comparison.

Table 3: Segmentation Results for model Residual 3D-UNet-multiscale + post on
the three datasets.

                  Dice             Hausdorff (mm)
Dataset       WT    TC    ET     WT     TC     ET
Train       0.84  0.84  0.76   7.43  12.37  27.09
Valid       0.83  0.77  0.72  12.34  13.11  37.42
Test        0.81  0.82  0.77  12.59  19.73  21.96

All the proposed models are greatly penalized when no ET is present in the
ground truth. In addition, the V-Net suffers more from false positives and the
3D-UNet based models from false negatives. The excess of false positives may be
caused by the use of small patches instead of the whole volume, which changes
the proportion of healthy tissue to tumor regions. On the other hand, 3D-UNet
models use bigger patch sizes and pooling layers instead of strided
convolutions, which may be the cause of the larger number of false negatives.
Increasing the patch size helps reduce false positives but misses local
information, which is reflected in label misclassification on the regions'
boundaries. Figure 5 shows a visual comparison of the models that illustrates
these behaviours.

Fig. 5: Training results on patients: 280, 010, 331 and 178 (top-bottom). Image
order: (1) Flair (2) GT (3) Residual 3D-UNet-multiscale (4) Residual 3D-UNet
(5) Basic 3D-UNet (6) V-Net (7) Ensemble mean
4.2 Uncertainty

BraTS requires uploading three uncertainty maps, one for each sub-region (WT,
TC, ET), together with the prediction map. Values must be normalized between 0
and 100 such that "0" represents the most certain prediction and "100" the most
uncertain. The metrics used are the FTP ratio, defined as
$FTP = (TP_{100} - TP_T) / TP_{100}$, where $T$ represents the threshold used
to filter the more uncertain values, and the ratio of filtered true negatives
(FTN), which is calculated in a similar manner. The integrated score is
calculated as follows:

$$ score = \frac{AUC_1 + (1 - AUC_2) + (1 - AUC_3)}{3} \tag{7} $$

where $AUC_1$, $AUC_2$ and $AUC_3$ are the areas under the Dice, FTP ratio and
FTN ratio curves over the uncertainty thresholds, respectively. The division by
3 averages the three terms and is consistent with the integrated scores
reported below.
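As a worked example, the integrated score can be recomputed from the validation AUC values reported in Table 5. The sketch assumes that the three terms are averaged (divided by 3), which is what reproduces the reported validation scores of 0.93, 0.91 and 0.89; the function name is hypothetical.

```python
def integrated_score(dice_auc, ftp_auc, ftn_auc):
    """Mean of AUC_1, (1 - AUC_2) and (1 - AUC_3)."""
    return (dice_auc + (1.0 - ftp_auc) + (1.0 - ftn_auc)) / 3.0

# Validation values (Dice AUC, FTP ratio AUC, FTN ratio AUC) for WT, TC and ET from Table 5.
validation = {
    "WT": (0.8316, 0.0449, 0.0009),
    "TC": (0.7715, 0.0538, 0.0002),
    "ET": (0.7088, 0.0380, 0.0001),
}
scores = {region: round(integrated_score(*aucs), 2) for region, aucs in validation.items()}
```

Because the FTP and FTN ratios are small, the score is dominated by the Dice AUC, while filtering many correct voxels would lower it through the two penalty terms.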

From this point forward, all experiments are performed on the Residual
3D-UNet-multiscale model, as it is the one with the most balanced results
across the different regions. Table 4 shows the results for the epistemic,
aleatoric and hybrid uncertainties when computed with entropy or variance. As a
general overview, we can see that the AUC-Dice, which is computed by averaging
the segmentation results for several thresholds that filter uncertain
predictions, improves 2 to 3 points w.r.t. the results obtained in the
segmentation task (WT: 0.8172, TC: 0.7664, ET: 0.7071). Although the metrics
are not the same, this indicates that the model is more certain on the TP and
less certain on the FP and FN. Moreover, the AUC-Dice is higher when using
entropy as the uncertainty measure.
Our results show that the model is more uncertain in LGG patients,
particularly in terms of epistemic uncertainty, meaning that the model requires
more data to achieve a more confident prediction. If we compare the behaviour
of the uncertainty types, we see that (1) aleatoric uncertainty focuses on the
region boundaries, with small variations, (2) epistemic uncertainty improves
results on the ET region but filters more TP and TN and (3) the hybrid approach
achieves the best Dice-AUC results when using entropy as the uncertainty
measurement.

Table 4: Validation results on the Residual 3D-UNet-multiscale for the
approaches followed to estimate uncertainty.

                        Dice Score         Ratio FTP               Ratio FTN
Measure   Method      WT    TC    ET    WT    TC    ET       WT       TC       ET
Variance  TTA       0.83  0.77  0.71  0.05  0.05  0.04   9.0e-4   2.0e-4   1.0e-4
Variance  TTD       0.83  0.76  0.73  0.17  0.16  0.09   2.4e-3   1.5e-3   4.0e-4
Variance  Hybrid    0.83  0.76  0.73  0.18  0.16  0.10   3.6e-3   2.0e-3   5.0e-4

Entropy   TTA       0.83  0.78  0.71  0.06  0.05  0.04   1.1e-3   4.7e-3   6.3e-3
Entropy   TTD       0.82  0.78  0.74  0.15  0.13  0.07   2.1e-3   8.2e-3  1.22e-2
Entropy   Hybrid    0.83  0.79  0.77  0.15  0.12  0.07   3.0e-3  1.01e-3  1.39e-2

We participate in the challenge using test-time augmentation (TTA) with the
uncertainty values computed using variance, as it achieves the highest
integrated scores per sub-region on the validation set. Table 5 shows the
results obtained on both the validation and test sets. The integrated scores
achieved are 0.93, 0.91 and 0.89 for validation and 0.93, 0.93 and 0.91 for
test, for WT, TC and ET respectively. We see a two-point improvement on the ET
and TC sub-regions for the test set.

Table 5: Uncertainty Results for the Residual 3D-UNet-multiscale model computed
using TTA and variance for each sub-region independently. We show results for
the validation and test sets for comparison.

                  Dice AUC             FTP ratio AUC           FTN ratio AUC
Dataset       WT      TC      ET      WT      TC      ET      WT      TC      ET
Valid     0.8316  0.7715  0.7088  0.0449  0.0538  0.0380  0.0009  0.0002  0.0001
Test      0.8299  0.8124  0.7654  0.0332  0.0537  0.0395  0.0020  0.0005  0.0003

5 Discussion and Conclusions

This work proposes a set of models based on two 3D-CNNs specialized in medical
imaging, V-Net and 3D-UNet. As each of the trained models performs better in
a particular tumor region, we define an ensemble of those models in order to
increase the performance. Moreover, we analyze the implication of uncertainty
estimation on the predicted segmentation in order to understand the reliability
of the provided segmentation and identify challenging cases, but also as a means
of improving the model accuracy by filtering uncertain voxels that should refer
to wrong predictions. We use the Residual 3D-UNet-multiscale as our model to
participate at the BraTS’20 challenge.
The best results on the validation set are obtained when creating an ensemble
of the proposed models, as this mitigates the biases of each model, but they
are still far from the current state of the art. These results may be caused by
a poor training strategy in which the sampling technique does not reflect the
correct label distribution, thus producing more false detections. This is
reflected more in the ET region, as all models predict more tumor voxels of
this label, which is greatly penalized when the ground truth does not contain
it. In order to improve results, future work should try to provide a better
representation of the labels, not just by increasing the patch size, but
perhaps by letting the network see both local and more global information.
Another potential problem is the model's simplicity. Although previous works
achieve good results using a 3D-UNet, e.g. [15], adding more complexity to the
network may help boost the performance. Therefore, a possible line of work
would be to extend the proposed models into a cascaded network, where each
nested evaluation region (WT, TC and ET) is learnt as a binary problem. Also,
LGG subjects usually achieve lower prediction accuracy. In order to improve the
results, we could research other post-processing techniques and design them
specifically to target each of the glioma grades, as the grades may be
differentiated by their sub-region distribution.
For uncertainty estimation, the work evaluates the usage of aleatoric, epis-
temic and a hybrid approach using the entropy as a global measure and variance
to evaluate uncertainty on each evaluation region. In the provided results, it
has been seen that using uncertainty information actually helps improve the
accuracy of the network, achieving the best Dice Score (AUC, estimated from
filtering uncertain voxels) when using the hybrid approach and entropy as the
uncertainty measure. Our method achieves a score of 0.93, 0.93, 0.91 for WT,
TC and ET respectively on the test set.

References
1. B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, et
al.: ”The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)”,
IEEE Transactions on Medical Imaging 34(10), 1993-2024 (2015) https://doi.org/
10.1109/TMI.2014.2377694
2. S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J.S. Kirby, et al.: ”Ad-
vancing The Cancer Genome Atlas glioma MRI collections with expert segmen-
tation labels and radiomic features”, Nature Scientific Data, 4:170117 (2017)
https://doi.org/10.1038/sdata.2017.117
3. S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, et al.: ”Identifying
the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progres-
sion Assessment, and Overall Survival Prediction in the BRATS Challenge”, arXiv
preprint arXiv:1811.02629 (2018)
4. S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, et al.,
”Segmentation Labels and Radiomic Features for the Pre-operative Scans
of the TCGA-GBM collection”, The Cancer Imaging Archive, 2017. DOI:
10.7937/K9/TCIA.2017.KLXWJJ1Q
5. S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, et al.,
”Segmentation Labels and Radiomic Features for the Pre-operative Scans
of the TCGA-LGG collection”, The Cancer Imaging Archive, 2017. DOI:
10.7937/K9/TCIA.2017.GJQ7R0EF
6. Milletari, Fausto, Nassir Navab, and Seyed-Ahmad Ahmadi. ”V-net: Fully convo-
lutional neural networks for volumetric medical image segmentation.” 3D Vision
(3DV), 2016 Fourth International Conference on. IEEE, 2016.
7. Morgan, L Lloyd: The epidemiology of glioma in adults: A ”state of the science”
review. Neuro-oncology vol.17 01-2015 https://doi.org/10.1093/neuonc/nou358
8. Armen Der Kiureghian and Ove Ditlevsen: Aleatory or epistemic? does it matter?
Structural safety, 31(2):105–112, 2009.
9. Yarin Gal and Zoubin Ghahramani: Dropout as a bayesian approximation: Repre-
senting model uncertainty in deep learning. arXiv preprint arXiv:1506.02142, 2015
10. Konstantinos Kamnitsas, Christian Ledig, Virginia F.J. Newcombe, Joanna P.
Simpson, Andrew D. Kane, David K. Menon, Daniel Rueckert, Ben Glocker:
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain
lesion segmentation, Medical Image Analysis, Volume 36, 2017, pages 61-78,
https://doi.org/10.1016/j.media.2016.10.004
11. Casamitjana, A., Puch, S., Aduriz, A., Vilaplana, V., ”3D Convolutional Neural
Networks for Brain Tumor Segmentation: a comparison of multi-resolution archi-
tectures”. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain
Injuries. BrainLes 2016. Lecture Notes in Computer Science, vol 10154. Springer,
2017.
12. Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N.,
et al.: Ensembles of Multiple Models and Architectures for Robust Brain Tumour
Segmentation. In: International MICCAI Brainlesion Workshop (Quebec, QC),
pp. 450–462. arXiv preprint arXiv:1711.01468, 2017
13. Casamitjana, A., Catà, M., Sánchez, I., Combalia, M., Vilaplana, V., ”Cascaded
V-Net Using ROI Masks for Brain Tumor Segmentation”. In: Brainlesion: Glioma,
Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2017. Lecture
Notes in Computer Science, vol 10670. Springer, 2018.
14. Andriy Myronenko: 3D MRI brain tumor segmentation using autoencoder regu-
larization. arXiv preprint arXiv:1810.11654, 2018
15. Isensee, F., et al.: No new-net. International MICCAI Brainlesion Workshop, pp.
234–244. Springer (2018)
16. Jiang Z., Ding C., Liu M., Tao D. (2020) Two-Stage Cascaded U-Net: 1st Place
Solution to BraTS Challenge 2019 Segmentation Task. In: Crimi A., Bakas S. (eds)
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Brain-
Les 2019. Lecture Notes in Computer Science, vol 11992. Springer, Cham
17. Zhao YX., Zhang YM., Liu CL. (2020) Bag of Tricks for 3D MRI Brain Tumor
Segmentation. In: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis,
Stroke and Traumatic Brain Injuries. BrainLes 2019. Lecture Notes in Computer
Science, vol 11992. Springer, Cham
18. Natekar P., Kori A., Krishnamurthi G.: Demystifying Brain Tumor Segmentation
Networks: Interpretability and Uncertainty Analysis. Frontiers in Computational
Neuroscience vol.14 page 6 https://doi.org/10.3389/fncom.2020.00006, 2020
19. Wang G., Li W., Ourselin S. and Vercauteren T. Automatic Brain Tumor
Segmentation Based on Cascaded Convolutional Neural Networks With Un-
certainty Estimation. Frontiers in Computational Neuroscience vol.13 pages 56
https://doi.org/10.3389/fncom.2019.00056, 2019
20. McKinley R., Meier R., Wiest R. (2019) Ensembles of Densely-Connected CNNs
with Label-Uncertainty for Brain Tumor Segmentation. In: Crimi A., Bakas S.,
Kuijf H., Keyvan F., Reyes M., van Walsum T. (eds) Brainlesion: Glioma, Multiple
Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2018. Lecture Notes in
Computer Science, vol 11384. Springer
21. Dmitry Ulyanov and Andrea Vedaldi and Victor Lempitsky. Instance Normaliza-
tion: The Missing Ingredient for Fast Stylization. arXiv preprint arXiv:1607.08022,
2016
22. Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon,
D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected
CRF for accurate brain lesion segmentation. Med. Image Anal. 36 (2017) 61–78
23. Yarin Gal and Zoubin Ghahramani: Dropout as a Bayesian Approximation:
Representing Model Uncertainty in Deep Learning. arXiv preprint arXiv:1506.02142,
2015
24. Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and
Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and
Antiga, Luca and Lerer, Adam. Automatic differentiation in PyTorch, NIPS-W 2017
25. Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. ”U-net: Convolutional
networks for biomedical image segmentation.” MICCAI. Springer, 2015.
https://doi.org/10.1007/978-3-319-24574-4_28
26. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised
Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations.
Lecture Notes in Computer Science 240-248, Springer International Publishing
2017. https://doi.org/10.1007/978-3-319-67558-9_28
27. Özgün Çiçek and Ahmed Abdulkadir and Soeren S. Lienkamp and Thomas Brox
and Olaf Ronneberger, ”3D U-Net: Learning Dense Volumetric Segmentation from
Sparse Annotation”, arXiv preprint arXiv:1606.06650, 2016.
28. W. R. Crum and O. Camara and D. L. G. Hill, ”Generalized Overlap Measures
for Evaluation and Validation in Medical Image Analysis”, IEEE Transactions on
Medical Imaging vol. 25, no. 11, pp. 1451–1461, 2006
