
Comparison of Tissue Segmentation Performance between 2D U-Net and 3D U-Net on Brain MR Images

Boyeong Woo
School of Information Technology & Electrical Engineering
The University of Queensland
Brisbane, Australia
[email protected]

Myungeun Lee*
Advanced Institutes of Convergence Technology, Seoul National University
Suwon-si, Gyeonggi-do, Korea
[email protected]

2021 International Conference on Electronics, Information, and Communication (ICEIC) | 978-1-7281-9161-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICEIC51217.2021.9369797

Abstract— In this paper, we compare the tissue segmentation performance of the 2D U-Net and 3D U-Net on brain MR images including Alzheimer's disease. The Open Access Series of Imaging Studies dataset used in this experiment consists of a cross-sectional collection of 416 subjects aged 18 to 96. The final segmentations were classified into 4 classes: background, cerebrospinal fluid, gray matter, and white matter. In the experiment, the 3D U-Net showed highly stable segmentation results. In particular, the evaluation of the 2D/3D U-Net segmentations showed that the 3D U-Net provided higher Dice similarity coefficients (94.4±4.5%~95.6±3.0%) and lower Hausdorff distances (7.5±3.2mm~12.3±3.9mm) than the 2D U-Net (89.6±4.9%~96.2±3.1% and 9.1±5.5mm~14.4±7.4mm). Moreover, the 3D U-Net provided better performance despite having fewer training examples to learn from. The simple image processing pipeline was an added advantage. However, the number of parameters and the amount of memory required were major limitations of using the 3D U-Net.

Keywords—deep learning; U-Net; 2D/3D; brain; MRI; segmentation; Alzheimer's disease

I. INTRODUCTION

Many deep learning techniques have demonstrated remarkable performance in various applications such as segmentation and registration for medical image analysis. However, selecting the most appropriate technique for given data is still challenging. Considerable interest has been given to deep neural networks (DNNs), particularly convolutional neural networks (CNNs), to resolve the problems associated with medical image segmentation. Until recently, two-dimensional CNN techniques have mostly been applied. Recently, different CNN architectures [1-3] have been proposed that feed through entire images. In particular, U-Net [4] is known for producing excellent performance for image segmentation. Therefore, in this paper, we extend it to a 3D convolutional network for the segmentation of MR images. Then, we compare the tissue segmentation performance of the 2D U-Net and 3D U-Net on brain MR images including Alzheimer's disease.

* corresponding author

II. MATERIALS AND METHODS

A. Dataset

The dataset was obtained from the Open Access Series of Imaging Studies (OASIS) (http://oasis-brains.org). The OASIS-1 dataset [5] used in this experiment consists of a cross-sectional collection of 416 subjects aged 18 to 96. One hundred of the subjects have been clinically diagnosed with mild to moderate Alzheimer's disease (AD). T1-weighted MRI scans were included for each subject. The MRIs were accompanied by segmentation masks produced through FreeSurfer processing (http://surfer.nmr.mgh.harvard.edu). The images were segmented to classify brain tissue as cerebrospinal fluid (CSF), gray matter, or white matter. The dimension of each image volume was 176 x 208 x 176.

Fig. 1. The architecture of 3D U-Net.

B. Methods

U-Net is a fully convolutional neural network for biomedical image segmentation, first proposed by Ronneberger et al. [4] in 2015. The network architecture is very similar to convolutional autoencoders, consisting of a contracting path ("encoder") and an expansive path ("decoder"). The main feature of U-Net which distinguishes it from a standard autoencoder is the skip connections between the encoder and the decoder.

The skip connections recover spatial information lost during down-sampling, which is critical for segmentation tasks.

The original U-Net was a 2D convolutional network. Here, we extend it to a 3D convolutional network for the segmentation of MR images. The network architecture is shown in Fig. 1. Each convolution block consists of two 3x3x3 convolutional layers with rectified linear unit (ReLU) activation. In the contracting path, max-pooling operations were used to reduce the resolution of the feature maps and allow for more features. In the expansive path, spatial resolution was recovered by up-sampling and concatenating with high-resolution features from the contracting path. The final output layer was a 1x1x1 convolutional layer with an appropriate number of output channels and softmax activation. The number of output channels was 4 in this case since there were 4 classes including the background. The classes were: background (0), CSF (1), gray matter (2), and white matter (3).
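To make the architecture concrete, the following is a minimal Keras sketch of such a 3D U-Net. The depth and filter counts are illustrative assumptions chosen for brevity, not the exact configuration of Fig. 1; the 3x3x3 convolutions with ReLU, max-pooling, up-sampling with skip concatenation, and the 1x1x1 softmax output over 4 channels follow the description above.

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3x3 convolutions with ReLU activation, as described above.
    x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_3d_unet(input_shape=(176, 208, 176, 1), n_classes=4, base_filters=16):
    # base_filters and the two-level depth are assumptions for brevity.
    inputs = tf.keras.Input(shape=input_shape)

    # Contracting path: convolution blocks followed by max-pooling.
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling3D(2)(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling3D(2)(c2)

    # Bottleneck.
    b = conv_block(p2, base_filters * 4)

    # Expansive path: up-sample and concatenate with the skip connections.
    u2 = layers.UpSampling3D(2)(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), base_filters * 2)
    u1 = layers.UpSampling3D(2)(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), base_filters)

    # 1x1x1 convolution with softmax over the 4 classes
    # (background, CSF, gray matter, white matter).
    outputs = layers.Conv3D(n_classes, 1, activation="softmax")(c4)
    return tf.keras.Model(inputs, outputs)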
The dataset was split into a training set and a test set by setting aside 20% of the whole dataset as the test set and using the other 80% as the training set. The network was trained with the brain MRIs as the input images and the accompanying volumetric segmentation files (produced using FreeSurfer software) as the ground-truth segmentation masks. The input images were Z-normalized for network input. A multiclass Dice loss, similar to the one used in [6], was used as the loss function. Training was done using the Adam optimizer [7] with a learning rate of 0.0005, a batch size of 1, for 25 epochs.
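A sketch of the Z-normalization and a multiclass Dice loss of this general form is shown below; the precise formulation in [6] may differ (e.g. in its smoothing term), so this is an illustration under stated assumptions rather than the paper's own code.

import tensorflow as tf

def z_normalize(volume):
    # Z-normalization of an input image: zero mean, unit variance.
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def multiclass_dice_loss(y_true, y_pred, smooth=1e-5):
    # y_true: one-hot masks, y_pred: softmax output, both shaped
    # (batch, depth, height, width, n_classes).
    axes = (1, 2, 3)  # sum over the spatial dimensions
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    denominator = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - tf.reduce_mean(dice)  # average over classes and batch

# Training configuration as described above: Adam, lr 0.0005, batch size 1, 25 epochs.
model = build_3d_unet()  # sketch from above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
              loss=multiclass_dice_loss)
# `train_images` / `train_masks` are hypothetical arrays of Z-normalized
# volumes and one-hot ground-truth masks:
# model.fit(train_images, train_masks, batch_size=1, epochs=25)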
For comparison, a 2D U-Net was also trained on slices extracted from the image volumes. The network architecture was the same as the 3D U-Net except that the convolutional layers were 2-dimensional. The 2D U-Net was trained with a learning rate of 0.0005, a batch size of 128, for 25 epochs. The networks were implemented using TensorFlow [8] version 2.2 with the Keras API (http://tensorflow.org/guide/keras) and were trained on a high-performance computer with an NVIDIA Titan Xp 12GB.
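For the 2D pipeline, slices must first be extracted from the volumes. A minimal sketch, assuming slicing along the first (axial) axis; `train_volumes` and `train_mask_volumes` are hypothetical lists of 3D arrays as described above.

import numpy as np

# Concatenating (176, 208, 176) volumes along axis 0 yields a stack of
# N*176 individual 208x176 slices for 2D training.
slice_images = np.concatenate(train_volumes, axis=0)
slice_masks = np.concatenate(train_mask_volumes, axis=0)

# Same training configuration, but batch size 128, as described above;
# `model_2d` is a hypothetical 2D counterpart of the sketch above,
# and the masks would be one-hot encoded for the Dice loss:
# model_2d.fit(slice_images[..., None], slice_masks_onehot, batch_size=128, epochs=25)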

III. RESULTS
The trained 3D U-Net was evaluated by making predictions on the held-out test set. The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were calculated for each predicted segmentation mask, and the results are summarized in Table I.
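The two metrics can be computed per class along the following lines; using SciPy's directed Hausdorff distance on the voxel coordinate sets is one possible implementation, not necessarily the one used here.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred, truth, label):
    # DSC for one class: 2|A ∩ B| / (|A| + |B|).
    a = (pred == label)
    b = (truth == label)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff_distance(pred, truth, label):
    # Symmetric Hausdorff distance between the voxel coordinate sets of
    # one class (in voxel units; scale by the voxel size to obtain mm).
    a = np.argwhere(pred == label)
    b = np.argwhere(truth == label)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])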

TABLE I. PERFORMANCE OF 2D U-NET AND 3D U-NET

Method     Metric    CSF          Gray Matter  White Matter
2D U-Net   DSC (%)   89.6 ± 4.9   94.3 ± 3.9   96.2 ± 3.1
           HD (mm)   11.6 ± 6.7   9.1 ± 5.5    14.4 ± 7.4
3D U-Net   DSC (%)   94.4 ± 4.5   94.7 ± 4.0   95.6 ± 3.0
           HD (mm)   9.9 ± 3.6    7.5 ± 3.2    12.3 ± 3.9

For the 3D U-Net, the mean DSCs for CSF, gray matter, and white matter were 94.4%, 94.7%, and 95.6%, respectively. For the 2D U-Net, predictions were made on a slice-by-slice basis, and the final volumetric segmentation mask was produced by stacking all the slices together. The mean DSCs from the 2D U-Net were very similar to those from the 3D U-Net for gray matter and white matter, but the mean DSC for CSF was noticeably lower (89.6% vs 94.4%). Also, the mean HDs from the 2D U-Net were about 2 mm higher than those from the 3D U-Net for all classes.

Fig. 2 shows the graphs of loss versus epoch for the 2D U-Net and 3D U-Net. While the 2D U-Net (Fig. 2a) seemed to show more stable convergence behavior, overfitting was more severe with the 2D U-Net than with the 3D U-Net (Fig. 2b).

Fig. 2. Loss vs. epoch graph for a) 2D U-Net and b) 3D U-Net.

Experimental results are shown in Fig. 3. Fig. 3a shows an example where there was not much difference between the outputs from the 2D U-Net and 3D U-Net. Fig. 3b is an example where there was a notable difference between the two outputs. In this example, it was observed that there was some erroneous segmentation of CSF with the 2D U-Net.

Fig. 3. Segmentation results. a) Example where the outputs from 2D U-Net and 3D U-Net were similar. b) Example where there was a notable difference between 2D U-Net and 3D U-Net, with some erroneous segmentation of CSF by the 2D U-Net. (Top left) input image; (top right) ground-truth segmentation; (bottom left) output from 2D U-Net; (bottom right) output from 3D U-Net.

IV. DISCUSSION
There are some benefits and drawbacks to using a fully
volumetric neural network for processing 3D images. The
results from this experiment suggest that networks with 3D
convolutions produce robust segmentation outputs, even for
imbalanced classes such as CSF. While slice-by-slice
segmentation using 2D U-Net also produced decent outputs, it
seems to be more vulnerable to class imbalance. In addition,
3D U-Net outperformed 2D U-Net in terms of HD for all
classes. This is probably due to the ability of the 3D U-Net to
leverage context from adjacent slices.
Another advantage of 3D U-Net is the simplicity of the
segmentation pipeline. Most tomographic images are available
as a single 3D image file rather than as a collection of multiple
2D images. The 3D U-Net can process a volumetric image with
minimal pre-processing while the 2D U-Net requires us to
extract slices from the image before feeding them into the
network. Also, during inference, the 3D U-Net will produce a
volumetric segmentation output directly, but the 2D U-Net will
produce slices of segmentation masks which need to be
combined into a 3D volume in order to overlay the mask on the
original image volume.
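The contrast between the two inference pipelines can be sketched as follows; `model_3d`, `model_2d`, and the slicing axis are hypothetical names carried over from the earlier sketches.

import numpy as np

# 3D U-Net: one forward pass yields the volumetric mask directly.
pred_3d = model_3d.predict(volume[None, ..., None])[0]
mask_3d = np.argmax(pred_3d, axis=-1)

# 2D U-Net: predict slice by slice, then stack the per-slice masks back
# into a volume before overlaying them on the original image volume.
pred_slices = model_2d.predict(volume[..., None], batch_size=128)
mask_2d = np.stack([np.argmax(s, axis=-1) for s in pred_slices], axis=0)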

However, the benefits of the 3D U-Net come at a computational cost due to the increased number of parameters. Also, the processing of 3D volumes requires a large amount of memory, which is a major burden in training a 3D convolutional neural network and often restricts the batch size to be very small. Another disadvantage of using 3D volumes is that the number of training examples is usually limited. There are often more than 100 slices in medical tomographic images, providing plenty of training examples for the 2D U-Net to learn from even if the number of volumetric images is small. This may explain the slightly higher DSC achieved by the 2D U-Net for the segmentation of white matter (Table I). For the 3D U-Net, data augmentation might be necessary to reduce overfitting due to the relatively small number of training examples.
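The parameter growth is straightforward to verify in Keras, and a simple augmentation such as random flipping illustrates one way the overfitting might be mitigated (an assumption for illustration, not the method used in this work).

import numpy as np

# A 3x3x3 kernel holds 27 weights per input/output channel pair versus 9
# for a 3x3 kernel, so the 3D network is roughly 3x heavier per layer,
# before accounting for the memory of the 3D feature maps themselves.
# `model_2d` and `model_3d` are the hypothetical models from above.
print("2D U-Net parameters:", model_2d.count_params())
print("3D U-Net parameters:", model_3d.count_params())

def random_flip(volume, mask):
    # Random left-right flip of a volume and its mask: one simple form of
    # data augmentation for the 3D network.
    if np.random.rand() < 0.5:
        volume, mask = volume[:, :, ::-1], mask[:, :, ::-1]
    return volume, mask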

V. CONCLUSION
The comparison of the 2D U-Net and 3D U-Net for the segmentation of brain MRI demonstrated that the 3D U-Net provides better performance despite having fewer training examples to learn from. The simple image processing pipeline is an added advantage. However, the number of parameters and the amount of memory required are major limitations of using the 3D U-Net. Although hardware is expected to keep improving in the coming years, the cost of advanced hardware can still be a barrier to clinical use. It could be beneficial to devise a method which can perform 3D image recognition tasks without excessive computational cost and memory usage.

ACKNOWLEDGMENT
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1008115).

※MSIT: Ministry of Science and ICT

Authorized licensed use limited to: AISSMS's Institute of Info Technology - PUNE. Downloaded on October 18,2024 at 06:06:31 UTC from IEEE Xplore. Restrictions apply.
REFERENCES

[1] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, pp. 640-651, 2017.
[2] K. Kang and X. Wang, "Fully convolutional neural networks for crowd segmentation," arXiv preprint, arXiv:1411.4464, November 2014.
[3] T. Brosch, Y. Yoo, L.Y.W. Tang, D.K.B. Li, A. Traboulsee, and R. Tam, "Deep convolutional encoder networks for multiple sclerosis lesion segmentation," Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, New York, Springer, pp. 3-11, 2015.
[4] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," MICCAI, vol. 9351, pp. 234-241, November 2015.
[5] D.S. Marcus, T.H. Wang, J. Parker, J.G. Csernansky, J.C. Morris, and R.L. Buckner, "Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults," J. Cogn. Neurosci., vol. 19, pp. 1498-1507, September 2007.
[6] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K.H. Maier-Hein, "Brain tumor segmentation and radiomics survival prediction: Contribution to the BRATS 2017 challenge," International MICCAI Brainlesion Workshop, vol. 10670, pp. 287-297, February 2018.
[7] D.P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint, arXiv:1412.6980, December 2014.
[8] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint, arXiv:1603.04467, March 2016.
