Comparison of Tissue Segmentation Performance Between 2D U-Net and 3D U-Net On Brain MR Images
I. INTRODUCTION
Many deep learning techniques have demonstrated remarkable performance in various applications for medical image analysis, such as segmentation and registration. However, selecting the most appropriate technique for a given dataset is still challenging. Considerable interest has been given to deep neural networks (DNNs), particularly convolutional neural networks (CNNs), to resolve the problems associated with medical image segmentation. Until recently, mostly two-dimensional CNN techniques have been applied. More recently, CNN architectures that feed entire images through the network [1-3] have been proposed. In particular, U-Net [4] is known for producing excellent performance for image segmentation. Therefore, in this paper, we extend U-Net to a 3D convolutional network for the segmentation of MR images. We then compare the tissue segmentation performance of the 2D U-Net and the 3D U-Net on brain MR images, including images from subjects with Alzheimer's disease.

* corresponding author

Fig. 1. The architecture of 3D U-Net.

B. Methods

U-Net is a fully convolutional neural network for biomedical image segmentation, first proposed by Ronneberger et al. [4] in 2015. The network architecture is very similar to a convolutional autoencoder, consisting of a contracting path ("encoder") and an expansive path ("decoder"). The main feature of U-Net which distinguishes it from a standard autoencoder is the skip connections between the encoder and the decoder.
The skip connections recover spatial information lost during down-sampling, which is critical for segmentation tasks.

The original U-Net was a 2D convolutional network. Here, we extend it to a 3D convolutional network for segmentation of MR images. The network architecture is shown in Fig. 1. Each convolution block consists of two 3x3x3 convolutional layers with rectified linear unit (ReLU) activation. In the contracting path, max-pooling operations were used to reduce the resolution of the feature maps and allow for more feature channels. In the expansive path, spatial resolution was recovered by up-sampling and concatenating with high-resolution features from the contracting path. The final output layer was a 1x1x1 convolutional layer with the appropriate number of output channels and softmax activation. The number of output channels was 4 in this case, since there were 4 classes including the background. The classes were: background (0), CSF (1), gray matter (2), and white matter (3).
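For illustration, a minimal tf.keras sketch of such a network is given below. The filter counts, network depth, and input patch size are assumptions of the sketch; the paper specifies only the 3x3x3 convolutions with ReLU, the pooling/up-sampling paths with skip connections, and the final 1x1x1 softmax layer with 4 output channels.

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_block(x, filters):
        # Two 3x3x3 convolutions with ReLU activation, as described above.
        x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
        return x

    def build_3d_unet(input_shape=(128, 128, 128, 1), n_classes=4):
        inputs = tf.keras.Input(shape=input_shape)

        # Contracting path: convolution blocks followed by max-pooling.
        c1 = conv_block(inputs, 16)
        p1 = layers.MaxPooling3D(2)(c1)
        c2 = conv_block(p1, 32)
        p2 = layers.MaxPooling3D(2)(c2)
        c3 = conv_block(p2, 64)

        # Expansive path: up-sampling plus skip connections.
        u2 = layers.UpSampling3D(2)(c3)
        u2 = layers.concatenate([u2, c2])
        c4 = conv_block(u2, 32)
        u1 = layers.UpSampling3D(2)(c4)
        u1 = layers.concatenate([u1, c1])
        c5 = conv_block(u1, 16)

        # Final 1x1x1 convolution with softmax over the 4 tissue classes.
        outputs = layers.Conv3D(n_classes, 1, activation="softmax")(c5)
        return tf.keras.Model(inputs, outputs)

With two pooling stages as sketched, each spatial dimension of the input must be divisible by 4 for the skip concatenations to line up.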
The dataset was split into a training set and a test set by
setting aside 20% of the whole dataset as the test set and using
the other 80% as the training set. The network was trained with
the brain MRIs as the input images and the accompanying
volumetric segmentation files (produced using FreeSurfer
software) as the ground truth segmentation masks. The input
images were Z-normalized before being fed to the network. A multiclass Dice loss, similar to the one used in [6], served as the loss function. Training used the Adam optimizer [7] with a learning rate of 0.0005 and a batch size of 1, for 25 epochs.
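A minimal sketch of a multiclass soft Dice loss in the spirit of [6] follows; the smoothing constant and the uniform averaging over classes are assumptions, as the paper does not give its exact formulation. build_3d_unet refers to the architecture sketch above.

    import tensorflow as tf

    def multiclass_dice_loss(y_true, y_pred, eps=1e-6):
        # y_true: one-hot masks, y_pred: softmax outputs,
        # both shaped (batch, D, H, W, n_classes).
        spatial_axes = (1, 2, 3)
        intersection = tf.reduce_sum(y_true * y_pred, axis=spatial_axes)
        denom = (tf.reduce_sum(y_true, axis=spatial_axes)
                 + tf.reduce_sum(y_pred, axis=spatial_axes))
        dice = (2.0 * intersection + eps) / (denom + eps)
        return 1.0 - tf.reduce_mean(dice)  # average over classes and batch

    # Training configuration as reported in the text.
    model = build_3d_unet()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
                  loss=multiclass_dice_loss)
    # model.fit(train_volumes, train_masks, batch_size=1, epochs=25)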
For comparison, a 2D U-Net was also trained on slices extracted from the image volumes. The network architecture was the same as that of the 3D U-Net except that the convolutional layers were 2-dimensional. The 2D U-Net was trained with a learning rate of 0.0005 and a batch size of 128, for 25 epochs.
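The slice extraction could look like the following sketch; slicing along the first axis and applying per-volume Z-normalization before slicing are assumptions, since the paper does not state the slicing direction.

    import numpy as np

    def extract_training_slices(volumes, masks):
        xs, ys = [], []
        for vol, msk in zip(volumes, masks):
            # Z-normalize each volume, as for the 3D network input.
            vol = (vol - vol.mean()) / (vol.std() + 1e-8)
            for d in range(vol.shape[0]):
                xs.append(vol[d, :, :, np.newaxis])  # (H, W, 1) image slice
                ys.append(msk[d])                    # (H, W) label slice
        return np.stack(xs), np.stack(ys)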
The networks were implemented using TensorFlow [8] version 2.2 with the Keras API (http://tensorflow.org/guide/keras) and were trained on a high-performance computer with an NVIDIA Titan Xp 12GB GPU.
III. RESULTS

The trained 3D U-Net was evaluated by making predictions on the held-out test set. The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were calculated for each of the predicted segmentation masks, and the results are summarized in Table I.

The mean DSCs for CSF, gray matter, and white matter were 94.4%, 94.7%, and 95.6%, respectively. For the 2D U-Net, predictions were made on a slice-by-slice basis, and the final volumetric segmentation mask was produced by stacking all the slices together. The mean DSCs from the 2D U-Net were very similar to those from the 3D U-Net for gray matter and white matter, but the mean DSC for CSF was noticeably lower (89.6% vs. 94.4%). Also, the mean HDs from the 2D U-Net were about 2 mm higher than those from the 3D U-Net for all classes.

Fig. 2 shows the loss-versus-epoch curves for the 2D U-Net and the 3D U-Net. While the 2D U-Net (Fig. 2a) seemed to show more stable convergence behavior, overfitting was also more severe with the 2D U-Net than with the 3D U-Net (Fig. 2b).

Fig. 2. Loss versus epoch for a) the 2D U-Net and b) the 3D U-Net.
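A sketch of how the per-class DSC and HD could be computed is given below; the paper does not provide its implementation, so the use of SciPy's directed_hausdorff is an assumption, and the distances come out in voxel units unless the coordinates are scaled by the voxel spacing.

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    def dice_coefficient(pred, truth, label):
        # Binary masks for one tissue class.
        p, t = (pred == label), (truth == label)
        denom = p.sum() + t.sum()
        return 2.0 * np.logical_and(p, t).sum() / denom if denom else 1.0

    def hausdorff_distance(pred, truth, label):
        # Symmetric HD between the voxel coordinates of the two masks.
        p = np.argwhere(pred == label)
        t = np.argwhere(truth == label)
        return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])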
Fig. 3. Segmentation results. a) Example where the outputs from 2D U-Net and 3D U-Net were similar. b) Example where there was a notable difference between 2D U-Net and 3D U-Net; there was some erroneous segmentation of CSF with the 2D U-Net. (top left) Input image; (top right) ground truth segmentation; (bottom left) output from 2D U-Net; (bottom right) output from 3D U-Net.
IV. DISCUSSION
There are some benefits and drawbacks to using a fully
volumetric neural network for processing 3D images. The
results from this experiment suggest that networks with 3D
convolutions produce robust segmentation outputs, even for
imbalanced classes such as CSF. While slice-by-slice
segmentation using 2D U-Net also produced decent outputs, it
seems to be more vulnerable to class imbalance. In addition,
3D U-Net outperformed 2D U-Net in terms of HD for all
classes. This is probably due to the ability of the 3D U-Net to
leverage context from adjacent slices.
Another advantage of 3D U-Net is the simplicity of the
segmentation pipeline. Most tomographic images are available
as a single 3D image file rather than as a collection of multiple
2D images. The 3D U-Net can process a volumetric image with
minimal pre-processing while the 2D U-Net requires us to
extract slices from the image before feeding them into the
network. Also, during inference, the 3D U-Net will produce a
volumetric segmentation output directly, but the 2D U-Net will
produce slices of segmentation masks which need to be
combined into a 3D volume in order to overlay the mask on the
original image volume.
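The difference between the two inference pipelines can be summarized in a short sketch; model_3d and model_2d are hypothetical handles to the trained networks, and the input volume is assumed to be Z-normalized already.

    import numpy as np

    def predict_volume_3d(model_3d, volume):
        # A single forward pass yields the volumetric mask directly.
        probs = model_3d.predict(volume[np.newaxis, ..., np.newaxis])
        return np.argmax(probs[0], axis=-1)            # (D, H, W) labels

    def predict_volume_2d(model_2d, volume):
        # Slice, predict slice-by-slice, then re-stack into a volume.
        slices = volume[..., np.newaxis]               # batch of D slices
        probs = model_2d.predict(slices, batch_size=128)
        return np.argmax(probs, axis=-1)               # (D, H, W) labels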
V. CONCLUSION
The comparison of 2D U-Net and 3D U-Net for segmentation of brain MRI demonstrated that the 3D U-Net provides better performance despite having fewer training examples to learn from. The simpler image processing pipeline is an added advantage. However, the number of parameters and the amount of memory required are a major limitation of the 3D U-Net. Although hardware is expected to keep improving in the coming years, the cost of advanced hardware can still be a barrier to clinical use. It could be beneficial to devise a method that can perform 3D image recognition tasks without excessive computational cost and memory usage.
ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1008115).

※MSIT: Ministry of Science and ICT
REFERENCES

[1] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, pp. 640-651, 2017.
[2] K. Kang and X. Wang, "Fully convolutional neural networks for crowd segmentation," arXiv.org, http://arxiv.org/pdf/1411.4464.pdf, accessed April 1, 2017.
[3] T. Brosch, Y. Yoo, L.Y.W. Tang, D.K.B. Li, A. Traboulsee, and R. Tam, "Deep convolutional encoder networks for multiple sclerosis lesion segmentation," Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, New York, Springer, pp. 3-11, 2015.
[4] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," MICCAI, vol. 9351, pp. 234-241, November 2015.
[5] D.S. Marcus, T.H. Wang, J. Parker, J.G. Csernansky, J.C. Morris, and R.L. Buckner, "Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults," J. Cogn. Neurosci., vol. 19, pp. 1498-1507, September 2007.
[6] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K.H. Maier-Hein, "Brain tumor segmentation and radiomics survival prediction: Contribution to the BRATS 2017 challenge," International MICCAI Brainlesion Workshop, vol. 10670, pp. 287-297, February 2018.
[7] D.P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint, arXiv:1412.6980, December 2014.
[8] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint, arXiv:1603.04467, March 2016.