2017 Article 9983-Read
DOI 10.1007/s10278-017-9983-4
outlining structures slice-by-slice, and is not only expensive allowed training of deep learning algorithms with millions of
and tedious, but also inaccurate due to human error. Therefore, images and provided robustness to variations in images.
there is a need for automated segmentation methods to provide There are several types of deep learning approaches that
accuracy close to that of expert raters, with high consistency. have been developed for different purposes, such as object
As 3D and 4D imaging are becoming routine, and with detection and segmentation in images, speech recognition,
physiological and functional imaging increasing, medical im- and genotype/phenotype detection and classification of dis-
aging data is increasing in size and complexity. Therefore, it is eases. Some of the known deep learning algorithms are
essential to develop tools that can assist in extracting informa- stacked auto-encoders, deep Boltzmann machines, deep neu-
tion from these large datasets. Machine learning is a set of ral networks, and convolutional neural networks (CNNs).
algorithmic techniques that allow computer systems to make CNNs are the most commonly applied to image segmentation
data-driven predictions from large data. These techniques and classification.
have a variety of applications that can be tailored to the med- CNNs were first introduced in 1989 [6], but gained great
ical field. interest after deep CNNs achieved spectacular results in the
There has been a significant effort in developing classical ImageNet [7, 8] competition in 2012 [9]. Applied on a dataset
machine learning algorithms for segmentation of normal (e.g., of about a million images that included 1000 different classes,
white matter and gray matter) and abnormal brain tissues (e.g., CNNs nearly halved the error rates of the previously best
brain tumors) in MRI. However, creation of the imaging fea- computing approaches [9].
tures that enable such segmentation requires careful engineer- CNN architectures are increasingly complex, with some sys-
ing and specific expertise. Furthermore, traditional machine tems having more than 100 layers, which means millions of
learning algorithms do not generalize well. Despite a signifi- weights and billions of connections between neurons. A typical
cant effort from the medical imaging research community, CNN architecture contains subsequent layers of convolution,
automated segmentation of the brain structures and detection pooling, activation, and classification (fully connected).
of the abnormalities remain an unsolved problem due to normal A convolutional layer produces feature maps by convolving a kernel
anatomical variations in brain morphology, variations in across the input image. A pooling layer is used to downsample
acquisition settings and MRI scanners, image acquisition im- the output of preceding convolutional layers by using the maxi-
perfections, and variations in the appearance of pathology. mum or average of the defined neighborhood as the value passed
An emerging machine learning technique referred to as to the next layer. Rectified Linear Unit (ReLU) and its modifi-
deep learning [1], can help avoid limitations of classical ma- cations such as Leaky ReLU are among the most commonly
chine learning algorithms, and its self-learning of features may used activation functions. ReLU nonlinearly transforms data
enable identification of new useful imaging features for quan- by clipping any negative input values to zero while positive
titative analysis of brain MRI. Deep learning techniques are input values are passed as output [10]. To perform a prediction
gaining popularity in many areas of medical image analysis of input data, the output scores of the final CNN layer are
[2], such as computer-aided detection of breast lesions [3], connected to a loss function (e.g., cross-entropy loss that
computer-aided diagnosis of breast lesions and pulmonary normalizes scores into a multinomial distribution over labels). Finally,
nodules [4], and in histopathological diagnosis [5]. In this parameters of the network are found by minimizing a loss
survey, we provide an overview of state-of-the-art deep learn- function between prediction and ground truth labels with reg-
ing techniques in the field of brain MR segmentation and ularization constraints, and the network weights are updated at
discuss remaining gaps that have the potential to be filled each iteration (e.g., using stochastic gradient descent, SGD)
by the use of deep learning techniques. using backpropagation until convergence (see Fig. 1).
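The layer types described above (convolution, pooling, ReLU and Leaky ReLU activation, and a cross-entropy loss over the final scores) can be illustrated with a minimal NumPy sketch. The function names, the 2 × 2 pooling window, and the Leaky ReLU slope of 0.01 are illustrative assumptions, not values taken from the surveyed papers, and the convolution is written as the cross-correlation that deep learning frameworks typically compute.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding) to produce a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Clip negative inputs to zero; pass positive inputs unchanged."""
    return np.maximum(x, 0.0)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but scale negative inputs by a small slope (alpha is assumed)."""
    return np.where(x > 0, x, alpha * x)


def max_pool2d(x, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size block."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    x = x[:h2 * size, :w2 * size]  # drop rows/columns that do not fill a block
    return x.reshape(h2, size, w2, size).max(axis=(1, 3))


def softmax_cross_entropy(scores, label):
    """Normalize scores into a distribution over labels and return -log p(label)."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]
```

In a real network these operations are stacked, and the kernel weights are learned by backpropagating the gradient of the loss; the sketch only shows the forward computation of each building block.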
Deep learning refers to neural networks with many layers We performed a thorough analysis of the literature using the
(usually more than five) that extract a hierarchy of features Google Scholar and NLM Pubmed search engines. We includ-
from raw input images. It is a new and popular type of ma- ed all found peer reviewed journal publications and confer-
chine learning techniques that extract a complex hierarchy of ence proceedings that describe applying deep learning to brain
features from images due to their self-learning ability as op- MRI segmentation. Since a large fraction of deep learning
posed to the hand-crafted feature extraction in classical machine works are submitted to arXiv (http://arxiv.org) first, we also
learning algorithms. They achieve impressive results included relevant arXiv preprints. Conference proceedings
and generalizability by training on a large amount of data. The that had a follow-up journal publication were included only
rapid increase in GPU processing power has enabled the de- in their final publication form. We divided papers into two
velopment of state-of-the-art deep learning algorithms. This groups: works on normal structures and on brain lesions. In
J Digit Imaging (2017) 30:449–459 451
59 ± 0.31 (Dice similarity coefficient, DSC) and 37.88 ± 30.06 methods for multifocal brain lesions often also include
(Hausdorff Distance, HD). lesion-wise metrics, such as lesion-wise true positive rate
(LTPR) and lesion-wise positive predictive value (LPPV).
mTOP This challenge calls for methods that focus on finding Measures such as accuracy and specificity tend to be avoided
differences between healthy subjects and Traumatic Brain in the lesion segmentation context since these measures do not
Injury (TBI) patients and sort the given data in distinct catego- discriminate between different segmentation outputs when the
ries in an unsupervised manner. Publicly available MRI data can object (lesion) is considerably smaller than the background
be downloaded from https://tbichallenge.wordpress.com/data. (normal-appearing brain tissue). In addition, measures of clinical
relevance are also commonly incorporated. These include
MSSEG The goals of this challenge are evaluating state-of- such measures as correlation analysis of total lesion load or
the-art and advanced segmentation methods from the partici- count as detected by automated and manual segmentation and
pants on MS data. For this, they evaluate both lesion detection volume or volume change correlation. Significance tests com-
(how many lesions are detected) and lesion segmentation monly accompany contributions that build on or compare to
(how precisely the lesions are delineated) on a multicenter other methods; nonparametric tests such as
database (38 patients from four different centers, imaged on Wilcoxon’s signed-rank or Wilcoxon’s rank-sum tests are most often
1.5 or 3T scanners, each patient being manually annotated by preferred.
seven experts). In addition to this classical evaluation, they
provide a common infrastructure to evaluate the algorithms Image Preprocessing
such as running time comparison and the degree of automa-
tion. The data can be obtained from Automated analysis of MR images is challenging due to intensity
https://portal.fli-iam.irisa.fr/msseg-challenge/data. inhomogeneity, variability of the intensity ranges and
contrast, and noise. Therefore, prior to automated analysis,
NeoBrainS12 The aim of the NeoBrainS12 challenge is to certain steps are required to make the images appear more
compare algorithms for segmentation of neonatal brain tissues similar, and these steps are commonly referred to as prepro-
and measurement of corresponding volumes using T1 and T2 cessing. Typical preprocessing steps for structural brain MRI
MRI scans of the brain. The comparison is performed for the include the following key steps.
following structures: cortical and central gray matter, non-
myelinated and myelinated white matter, brainstem and cere- Registration Registration is spatial alignment of the images to
bellum, and cerebrospinal fluid in the ventricles and in the a common anatomical space [14]. Interpatient image registra-
extracerebral space. The training set includes T1 and T2 MR images tion aids in standardizing the MR images onto a standard
of two infants at 30 and 40 weeks of age. The test set includes stereotaxic space, commonly MNI or ICBM. Intrapatient registration
T1 and T2 MRI of five infants. The data and evaluation results aims to align the images of different sequences, e.g.,
of algorithms that have been submitted to the challenge can be T1 and T2, to obtain a multi-channel representation for each
downloaded from http://neobrains12.isi.uu.nl/. location within the brain.
MRBrainS The aim of the MRBrainS evaluation framework Skull Stripping Skull stripping is the process of removing the
is to compare algorithms for segmentation of gray matter, skull from images to focus on intracranial tissues. The most
white matter, and cerebrospinal fluid on multi-sequence (T1- common methods used for this purpose have been BET [15],
weighted, T1-weighted-inversion recovery, and FLAIR) 3 Robex [16], and SPM [16, 17].
Tesla MRI scans of the brain. Five brain MRI scans with
manual segmentations are provided for training and 15 Bias Field Correction Bias field correction is the correction
MRI scans alone are provided for testing. The data can be of the image contrast variations due to magnetic field inhomogeneity
downloaded from http://mrbrains13.isi.uu.nl. The [18]. The most commonly adopted approach is N4
performance (DSC) of the current winning algorithm on this bias field correction.
dataset is 86.15% for gray matter, 89.46% for white matter,
and 84.25% for cerebrospinal fluid segmentation. Intensity Normalization Intensity Normalization is the pro-
The most common quantitative measures used for evaluating cess of mapping intensities of all images into a standard or
brain MRI segmentation methods are listed below and reference scale, e.g., between 0 and 4095. The algorithm by
shown in Table 1. Typically, the methods for normal structure Nyul et al. [19], which uses piecewise linear mapping of im-
or tumor segmentation include voxel-wise metrics, such as age intensities into a reference scale, is one of the most popular
DSC, true positive rate (TPR), positive predictive value normalization techniques. In the context of deep learning
(PPV), and lesion surface metrics, such as HD and average frameworks, computing z-scores, where one subtracts the
symmetric surface distance (ASSD). On the other hand, mean image intensity from all pixels in an image and divides
level and lesion level (FPL, TPL, and FNL, respectively).
Dice similarity coefficient (DSC): $\mathrm{DSC} = \dfrac{2\,TP}{2\,TP + FP + FN}$
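As an illustration of the voxel-wise measures named in the text, the sketch below computes DSC = 2TP / (2TP + FP + FN) together with the true positive rate, TPR = TP / (TP + FN), and the positive predictive value, PPV = TP / (TP + FP), from two binary masks. The function name `overlap_metrics` is hypothetical, and returning 1.0 when a denominator is zero (for example, when both masks are empty) is an assumed convention rather than one stated in the survey.

```python
import numpy as np


def overlap_metrics(pred, truth):
    """Voxel-wise DSC, TPR, and PPV for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.sum(pred & truth))    # predicted and truly positive
    fp = int(np.sum(pred & ~truth))   # predicted positive, truly negative
    fn = int(np.sum(~pred & truth))   # missed positives

    # Assumed convention: an undefined ratio (zero denominator) is reported as 1.0.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    tpr = tp / (tp + fn) if (tp + fn) else 1.0
    ppv = tp / (tp + fp) if (tp + fp) else 1.0
    return {"DSC": dsc, "TPR": tpr, "PPV": ppv}
```

Because true negatives do not appear in any of these ratios, the values are unaffected by how large the background is, which is why such measures are preferred over accuracy and specificity for small lesions.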
pixels by the standard deviation of intensities, is another popular semantic segmentation [30, 31]. Similar to autoencoders, they
normalization technique. include an encoder part that extracts features and a decoder part that
upsamples or deconvolves the higher level features from the
Noise Reduction Noise reduction is the reduction of the encoder part and combines lower level features from the encoder
locally-variant Rician noise observed in MR images [20]. part to classify pixels. The input image is mapped to the
With the advent of deep learning techniques, some of the preprocessing segmentation labels in a way that minimizes a loss function.
steps became less critical for the final segmentation
performance. For instance, bias correction and quantile-based Cascaded CNN Architecture This type of architecture com-
intensity normalization are often successfully replaced by the bines two CNN architectures [32]. The output of the first CNN
z-score computation alone [2, 21]; however, another work is used as an input to the second CNN to obtain classification
shows improvement when applying normalization prior to results. The first CNN is used to train the model with an initial
a deep learning based segmentation procedure [22]. At the same prediction of class labels, while the second CNN is used to further
time, new methods for these preprocessing routines are tune the results of the first CNN.
also emerging, including deep learning based registration [23],
skull stripping [24], and noise reduction [25]. Segmentation of Normal Brain Structure
Current CNN Architecture Styles Accurate automated segmentation of brain structures, e.g.,
white matter (WM), gray matter (GM), and cerebrospinal fluid
Patch-Wise CNN Architecture This is a simple approach to (CSF), in MRI is important for studying early brain development
train a CNN algorithm for segmentation. An N × N patch in infants and quantitative assessment of the brain tissue
around each pixel is extracted from a given image, and the and intracranial volume in large scale studies. Atlas-based ap-
model is trained on these patches and given class labels to proaches [33–36], which match intensity information between
correctly identify classes such as normal brain and tumor. an atlas and target images and pattern recognition approaches
The designed networks contain multiple convolutional, acti- [37–39], which classify tissues based on a set of local intensity
vation, pooling, and fully connected layers sequentially. Most features, are the classical approaches that have been used for
of the current popular architectures [21, 22, 26, 27] use this brain tissue segmentation. In recent years, CNNs have been
approach. To improve the performance of patch-wise architec- adopted for segmentation of brain tissues, which avoid the ex-
tures, multiscale CNNs [28, 29] use multiple pathways, where plicit definition of spatial and intensity features and provide
each uses a patch of a different size around the same pixel. The better performance than classical approaches, as we describe
outputs of these pathways are combined by a neural network next (see Table 2 for the list of studies).
and the model is trained to correctly identify the given class Zhang et al. [27] presented a 2D (input patch size 13 × 13
labels (Figs. 2, 3, and 4). pixels) patch-wise CNN approach to segment WM, GM, and
CSF from multimodal (i.e., T1, T2, and fractional anisotropy)
Semantic-Wise CNN Architecture This type of architecture MR images of infants. They showed that their CNN approach
makes predictions for each pixel of the whole input image like outperforms prior methods and classical machine learning
algorithms using support vector machine (SVM) and random volume, count, and progression, to quantify treatment response
forest (RF) classifiers (overall DSC performance of the associated diseases, such as brain cancer, MS, and stroke.
85.03% ± 2.27% (CNN) vs. 76.95% ± 3.55% (SVM), Reliable extraction of these biomarkers depends on prior accurate
83.15% ± 2.52% (RF)). Nie et al. [30] presented semantic-wise segmentation. Despite the significant effort in brain lesion
fully convolutional networks (FCNs) to segment infant segmentation and advanced imaging techniques, accurate segmentation
brain images from the same dataset that Zhang et al. [27] used of brain lesions remains a challenge. Many automat-
in their study. They obtained improved results compared to [27]. ed methods have been proposed for lesion segmentation prob-
Their overall DSC values were 85.5% (CSF), 87.3% (GM), and 88.7% lem, including unsupervised modeling methods that aim to automatically
(WM) vs. 83.5% (CSF), 85.2% (GM), and 86.4% (WM) by [27]. De adapt to new image data [43–45], supervised machine
Brebisson et al. [40] presented a 2D (I = 29²) and 3D (I = 13³) learning methods that, given a representative dataset,
patch-wise CNN approach to segment the human brain into learn the textural and appearance properties of lesions [46],
anatomical regions. They achieved competitive results and atlas-based methods that combine both supervised and unsupervised
(DSC = 72.5% ± 16.3%) in the MICCAI 2012 challenge on multi-atlas learning into a unified pipeline by registering labeled
labeling as the first CNN approach applied to the task. data or a known cohort data into a common anatomical
Moeskops et al. [28] presented a multi-scale (25², 51², 75² pixels) space [47–49]. Several review papers provide an overview of clas-
patch-wise CNN approach to segment brain images of infants sical methods for brain tumor segmentation [50], and MS lesion
and young adults. They obtained overall DSC = 73.53% vs. segmentation [51, 52]. For more information and detail on the
72.5% by [40] in MICCAI challenge on multi-atlas labeling. classical approaches, we refer the reader to those studies.
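The patch-wise and multi-scale inputs used by the studies above can be sketched as follows: for each pixel to be classified, square patches of several sizes are cut out around the same centre and fed to separate pathways. This is a minimal illustration, not any author's implementation; the function names are hypothetical, zero padding at the image border is an assumption, and the default sizes simply mirror the 25, 51, and 75 pixel scales quoted in the text.

```python
import numpy as np


def extract_patch(image, row, col, size):
    """Return the size x size patch of a 2D image centred on (row, col).

    The image is zero-padded (an assumed border policy) so that patches
    centred near the edge keep the requested shape.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the patch has a centre pixel")
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    # After padding, pixel (row, col) sits at (row + half, col + half),
    # so the patch spans rows row .. row + size - 1 of the padded image.
    return padded[row:row + size, col:col + size]


def multiscale_patches(image, row, col, sizes=(25, 51, 75)):
    """Cut patches of several sizes around the same pixel, one per pathway."""
    return [extract_patch(image, row, col, s) for s in sizes]
```

Each patch would then be passed to its own convolutional pathway, and the pathway outputs combined before the final classification of the centre pixel.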
Bao et al. [41] also presented a multi-scale patch-wise CNN Several deep learning studies have shown superior perfor-
together with dynamic random walker with decay region of in- mances to the classical state-of-art methods (see Table 4).
terest to obtain smooth segmentation of subcortical structures in Havaei et al. [26] presented a 2D (33 × 33 pixels) patch-
IBSR (developed by the Centre for Morphometric Analysis at wise architecture using local and global CNN pathways,
Massachusetts General Hospital; available for download at which exploits local and global contextual features around a
https://www.nitrc.org/projects/ibsr) and LPBA40 [42] datasets. They pixel to segment brain tumors. The local pathway includes
reported overall DSC of 82.2% and 85% for IBSR and LPBA40, two convolutional layers with kernel sizes of 7 × 7 and
respectively. CNN-based deep learning approaches have shown 5 × 5, respectively, while the global pathway includes one
the top performances on NeoBrainS12 and MRBrainS (see convolutional layer with kernel size of 11 × 11. To tackle the
Table 3) challenges. Their computation time in the testing phase difficulties raised by the imbalance of tumor vs. normal brain
was also much lower than that of classical machine learning algorithms. labels, where the fraction of the latter is above 90% of total samples,
they introduced two-phase training, which included train-
Segmentation of Brain Lesions ing first with data that had equal class probability and then
training only the output layer with the unbalanced data (i.e.,
Quantitative analysis of brain lesions includes measurement of keeping the weights of all the other layers unchanged). They
established imaging biomarkers such as the largest diameter, also explored cascaded architectures in their study. They
reported that their CNN approach outperformed the winner 2D patch-wise CNN approach that mapped input patches to
of the BRATS 2013 competition and was much faster in the n groups of structured local predictions that took into account
testing phase (3 vs. 100 min). the labels of the neighboring pixels. They reported results on
In another study, Havaei et al. [56] presented an overview Brats 2014 data that were comparable to those of state-of-art
of brain tumor segmentation with deep learning, which also approaches. Most of these studies have also been presented in
described the use of cascaded architecture. Pereira et al. [22] the last two MICCAI conferences as part of the BRATS challenge.
presented a 2D patch-wise architecture, but compared to We refer the reader to BRATS proceedings 2015–2016 [57]
Havaei et al., they used small 3 × 3 convolutional kernels for further details such as performance comparison and
which allowed deeper architectures, patch intensity normali- ranking.
zation, and data augmentation by rotation of patches. They CNN-based deep learning architectures have also been
also designed two separate models for each grade—high- used for segmentation of stroke and MS lesions, detection of
grade (HG) and low-grade (LG) tumors. The model for HG cerebral microbleeds, and prediction of therapy response.
tumors included six convolutional layers and three fully con- Brosch et al. [31] presented a 3D semantic-wise CNN to seg-
nected layers while the model for LG included four ment MS lesions from MRI. They evaluated their method on
convolutional layers and three fully connected layers. They two publicly available datasets, MICCAI 2008 and ISBI 2015
also used leaky ReLU for activation function, which allowed challenges, and compared their method to freely available and
gradient flow in contrast to rectified linear units that impose widely used segmentation methods. They reported perfor-
constant zero to negative values. Their method showed the mance comparable to the state of the art methods and superior
best performance on the BRATS 2013 data: DSC values of to the publicly available MS segmentation methods. Dou et al.
0.88, 0.83, and 0.77 for complete, core, and enhancing regions, [32] presented a cascaded framework that included a 3D
respectively. They also ranked second on the BRATS semantic-wise CNN and a 3D patch-wise CNN to detect cerebral
2015 data. Zhao and Jia [53] also used a patch-wise CNN microbleeds (CM) from MRI. They reported their meth-
architecture using triplanar (axial, sagittal, coronal) 2D slices od outperformed previous studies with low level descriptors
to segment brain tumors. They have obtained comparable re- and provided a high sensitivity of 93.2% for detecting CM.
sults to state-of-art machine learning algorithms on Brats 2013 Maier et al. [55] presented a comparison study that evaluated
data. Kamnitsas et al. [21] presented a 3D dense-inference and compared nine classification methods (e.g., naive Bayes,
patch-wise and multi-scale CNN architecture that uses 3D random forest, and CNN) for ischemic stroke lesion segmen-
(3 × 3 × 3 pixels) convolutional kernels and two-pathway tation. Their results showed that cascaded CNN and random
learning similar to [26]. They also used a 3D fully connected decision forest approaches outperform all other methods.
conditional random field to effectively remove false positives, Akkus et al. [29] presented prediction of 1p19q chromosomal
which is an important post-processing step that was not de- co-deletion, which is associated with positive response to
scribed in previous studies. They reported the top-ranking treatment in low-grade gliomas, from MRI using a 2D patch-wise
performance on BRATS 2015. Dvorak et al. [54] presented a and multi-scale CNN. The performance of their CNN
Authors | CNN style | Dimension | Performance | Dataset
Zhang et al. 2015 [27] | Patch-wise | 2D | DSC 83.5% (CSF), 85.2% (GM), 86.4% (WM) | Private data (10 healthy infants)
Nie et al. 2016 [30] | Semantic-wise | 2D | DSC 85.5% (CSF), 87.3% (GM), 88.7% (WM) | Private data (10 healthy infants)
de Brebisson et al. 2015 [40] | Patch-wise | 2D/3D | Overall DSC 72.5% ± 16.3% | MICCAI 2012 multi-atlas labeling
Moeskops et al. 2016 [28] | Patch-wise | 2D/3D | Overall DSC 73.53% | MICCAI 2012 multi-atlas labeling
Bao et al. 2016 [41] | Patch-wise | 2D | DSC 82.2%/85% | IBSR/LPBA40
Table 3 Top ten ranking of algorithms in MRBrainS challenge (Complete list is available at: http://mrbrains13.isi.uu.nl/results.php)
approach on an unseen test set was 93.3% (sensitivity) and techniques such as deep learning that would handle these
82.22% (specificity) for detection of 1p19q status from MRI. variabilities.
Despite a significant breakthrough, the potential of deep
learning is limited because the medical imaging datasets are
Discussion relatively small, and this limits the ability of the methods to
manifest their full power, compared to what they have dem-
The recent advances reported in literature indicate significant onstrated on large-scale datasets (e.g., millions of images)
potential for deep learning techniques in the field of quantita- such as ImageNet. While some authors report that their super-
tive brain MR image analysis. Even though deep learning vised frameworks require only one training sample [28], most
approaches have been applied to brain MRI only recently, they researchers report that their results were consistently improv-
tend to outperform previous state of the art classical machine ing with an increase in size of training datasets [58, 59]. There
learning algorithms and are becoming more mature. Brain is high demand for large-scale datasets for effective applica-
image analysis has been a great challenge to computer-aided tion of deep learning methods. Alternatively, the size of the
techniques due to complex brain anatomy and variability of its dataset can be effectively increased by applying random trans-
appearance, non-standardized MR scales due to variability in formations to the original data such as flipping, rotation, trans-
imaging protocols, image acquisition imperfection, and pres- lation, and deformation. This is commonly used in machine
ence of pathology. Therefore, there is a need for more generic learning and known as data augmentation. Data augmentation
Authors | Application | CNN style | Dimension | Performance | Dataset
Havaei et al. 2016 [26] | Tumor segmentation | Patch-wise | 2D | DSC 0.88 (complete), 0.79 (core), 0.73 (enhancing) | BRATS-2013
Pereira et al. 2016 [22] | Tumor segmentation | Patch-wise | 2D | DSC 0.88 (complete), 0.83 (core), 0.77 (enhancing) | BRATS-2013
Zhao and Jia 2015 [53] | Tumor segmentation | Patch-wise | 2D | Overall accuracy 0.81 | BRATS-2013
Kamnitsas et al. 2016 [21] | Tumor segmentation | Patch-wise | 3D | DSC 0.9 (complete), 0.75 (core), 0.73 (enhancing) | BRATS-2015
Dvorak et al. 2015 [54] | Tumor segmentation | Patch-wise | 2D | DSC 0.83 (complete), 0.75 (core), 0.77 (enhancing) | BRATS-2014
Brosch et al. 2016 [31] | MS segmentation | Semantic-wise | 3D | DSC 0.68 (ISBI); DSC 0.84 (MICCAI) | MICCAI 2008-ISBI 2015
Dou et al. 2016 [32] | Cerebral microbleed detection | Cascaded (semantic/patch-wise) | 3D | Sensitivity 98.29% | Private data (320 subjects)
Maier et al. 2015 [55] | Ischemic stroke detection | Patch-wise | 2D | DSC 0.67 ± 0.18; HD 29.64 ± 24.6 | Private data (37 subjects)
Akkus et al. 2016 [29] | Tumor genomic prediction | Patch-wise | 2D | 0.93 (sensitivity), 0.82 (specificity), and 0.88 (accuracy) | Private data (159 subjects)
helps increase the number of training examples and reduce architectures are also more susceptible to class imbalance, but
overfitting by introducing random variations to the original this can be solved by weighting the classes in the loss function
data. Multiple studies have reported the data augmentation [31]. Cascaded architectures such as a patch-wise architecture
to be very useful in their studies [9, 22, 29]. following a semantic architecture as used in [32] would re-
Several steps are crucial to improve the learning with deep solve the issues raised by each approach and refine the output
learning approaches, including data preprocessing, data post- results.
processing, network weight initialization, and strategies to Developing a generic deep learning approach that will
prevent overfitting. Image preprocessing plays a key role in work on datasets from different machines and institutions is
learning. Multiple preprocessing steps have been applied in challenging due to limited training and ground truth data, variations in
current studies to improve the learning process, as presented in image acquisition protocols, imperfections of each
Sections 2.5 and 2.6. For example, it is important to have MRI scanner, and variations in appearance of healthy and
intensities of input brain MR images in a reference scale and pathological brain tissue. So far, currently available methods
normalized for each modality. This avoids suppression of true were randomly initialized and trained on limited data. To
patterns of structures by any modality and intensity differ- improve the generalization of deep learning architectures,
ences in the output of the model. Post-processing of the output one can adapt a well performing deep learning network trained
of model is also an important step to refine the segmentation on a large dataset and fine-tune that network on a smaller
results. The goal of any learning method is to have a perfect dataset specific to the problem, which is called transfer learn-
classification, but there are always regions in images that over- ing. It has been shown that transferring the weights (network
lap between classes, known as partial volume effect, which parameters) from a pre-trained generic network to train on a
unavoidably leads to false positives or negatives. These re- specific dataset is better than random weight initialization of
gions require additional processing for accurate quantifica- the network [61]. The usefulness and success of transfer learn-
tion. Another important step is proper network parameter ini- ing depends on similarity between datasets. For instance,
tialization in the neural network to maintain the gradient flow using pre-trained models from ImageNet, which is trained
through network and to achieve convergence. Otherwise, the on a large RGB image database, might not perform well on
activations and gradient flow can vanish and result in no con- medical images without further training. Shin et al. [62] re-
vergence and learning. Random weight initialization has been ported that they obtained best performance with transfer learn-
used in most of the current studies. Lastly, preventing ing from pre-trained model on ImageNet dataset and fine-
overfitting is critical to learn the true information in images, tuning on lymph node and interstitial lung disease rather than
and avoiding overfitting to specific training examples provid- training from scratch. On the other hand, the nature of the
ed. Deep networks are particularly susceptible to overfitting ImageNet dataset is very different from medical image datasets,
because several thousands or millions of parameters are used and therefore transfer learning from ImageNet might not be the
in the networks and limited training data is available. Several best choice for medical images as shown in [63].
strategies have been used to prevent overfitting such as data
augmentation that introduces random variations to input data
[9, 22, 29], using dropout that randomly removes nodes from Summary
network during training [22, 32, 54], and L1/L2 regularization
that introduces weight penalties [26]. In current deep learning Despite the significant impact of deep learning techniques in
architectures, one or more of these strategies are used to pre- quantitative brain MRI, it is still challenging to have a generic
vent overfitting. method that will be robust to all variations in brain MR images
Semantic-wise architectures take inputs of any size and from different institutions and MRI scanners. The perfor-
produce a classification map while patch-wise CNN architec- mance of the deep learning methods depends highly on sev-
tures take fixed-sized inputs and produce non-spatial outputs. eral key steps such as preprocessing, initialization, and post-
Therefore, semantic-wise architectures produce results for processing. Also, training datasets are relatively small com-
each pixel/voxel of an image much faster than patch-wise pared to large-scale ImageNet dataset (e.g., millions of im-
architectures. As presented in [60], it takes 22 ms to produce a ages) to achieve generalization across datasets. Moreover, current
10 × 10 grid of outputs from a 500 × 500 input image for a deep learning architectures are based on supervised learning
semantic-wise FCN, while it takes 1.2 ms for patch-wise and require generation of manual ground truth labels,
AlexNet [9] to infer a single-value classification output of a which is tedious work on large-scale data. Therefore, deep
227 × 227 image. Producing the same 100 outputs patch by patch would take about 100 × 1.2 = 120 ms, so the FCN gives a more than fivefold improvement learning models that are highly robust to variations in brain
in computation speed (22 vs. 120 ms). On the other hand, MRI or have unsupervised learning capability with less re-
random sampling of patches over a dataset potentially results quirement on ground truth labels are needed. In addition, data
in faster convergence (LeCun et al. 1998) compared to full augmentation approaches that realistically mimic variations in
image training in semantic-wise architectures. Semantic-wise brain MRI data could alleviate the need of large amount of
data. Transfer learning could be used to share well-performing deep learning
models, which are trained on normal and pathological brain MRI data, among the
brain imaging research community and improve the generalization ability of these
models across datasets with less effort than learning from scratch.

Acknowledgements This work was supported by National Institutes of Health
1U01CA160045, U01CA142555, 1U01CA190214, and 1U01CA187947.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

Open Access This article is distributed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
2. Lin D, Vasilakos AV, Tang Y, Yao Y: Neural networks for computer-aided diagnosis in medicine: A review. Neurocomputing 216:700–708, 2016
3. Kooi T et al.: Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal. 35:303–312, 2017
4. Cheng J-Z et al.: Computer-aided diagnosis with deep learning architecture: Applications to breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep. 6:24454, 2016
5. Litjens G et al.: Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6:26286, 2016
6. Y. LeCun et al., “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
7. Deng J, et al.: “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
8. O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
9. Krizhevsky A, Sutskever I, Hinton GE: ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ Eds. Advances in neural information processing systems 25. USA: Curran Associates, Inc., 2012, pp. 1097–1105
10. He K, Zhang X, Ren S, Sun J: “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
11. G. P. Mazzara, R. P. Velthuizen, J. L. Pearlman, H. M. Greenberg, and H. Wagner, “Brain tumor target volume determination for radiation treatment planning through automated MRI segmentation,” Int. J. Radiat. Oncol. Biol. Phys., vol. 59, no. 1, pp. 300–312, 2004.
12. S. K. Warfield, K. H. Zou, and W. M. Wells, “Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation,” IEEE Trans. Med. Imaging, vol. 23, no. 7, pp. 903–921, 2004.
13. A. Akhondi-Asl, L. Hoyte, M. E. Lockhart, and S. K. Warfield, “A logarithmic opinion pool based STAPLE algorithm for the fusion of segmentations with associated reliability weights,” IEEE Trans. Med. Imaging, vol. 33, no. 10, pp. 1997–2009, 2014.
14. A. Klein et al., “Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration,” Neuroimage, vol. 46, no. 3, pp. 786–802, 2009.
15. S. M. Smith, “Fast robust automated brain extraction,” Hum. Brain Mapp., vol. 17, no. 3, pp. 143–155, 2002.
16. J. E. Iglesias, C.-Y. Liu, P. M. Thompson, and Z. Tu, “Robust brain extraction across datasets and comparison with publicly available methods,” IEEE Trans. Med. Imaging, vol. 30, no. 9, pp. 1617–1634, 2011.
17. J. Ashburner and K. J. Friston, “Unified segmentation,” Neuroimage, vol. 26, no. 3, pp. 839–851, 2005.
18. U. Vovk, F. Pernus, and B. Likar, “A review of methods for correction of intensity inhomogeneity in MRI,” IEEE Trans. Med. Imaging, vol. 26, no. 3, pp. 405–421, 2007.
19. L. G. Nyúl and J. K. Udupa, “On standardizing the MR image intensity scale,” Magn. Reson. Med., vol. 42, no. 6, pp. 1072–1081, 1999.
20. P. Coupe, P. Yger, S. Prima, P. Hellier, C. Kervrann, and C. Barillot, “An optimized blockwise nonlocal means denoising filter for 3-D magnetic resonance images,” IEEE Trans. Med. Imaging, vol. 27, no. 4, pp. 425–441, 2008.
21. Kamnitsas K et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36:61–78, 2016
22. Pereira S, Pinto A, Alves V, Silva CA: “Brain Tumor Segmentation using Convolutional Neural Networks in MRI Images,” IEEE Trans. Med. Imaging, Mar. 2016.
23. G. Wu, M. Kim, Q. Wang, Y. Gao, S. Liao, and D. Shen, “Unsupervised deep feature learning for deformable registration of MR brain images,” Med. Image Comput. Comput. Assist. Interv., vol. 16, no. Pt 2, pp. 649–656, 2013.
24. Kleesiek J et al.: Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Neuroimage 129:460–469, 2016
25. Gondara L: “Medical image denoising using convolutional denoising autoencoders,” arXiv [cs.CV], 2016.
26. Havaei M et al.: Brain tumor segmentation with deep neural networks. Med. Image Anal. 35:18–31, 2016
27. Zhang W et al.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. Neuroimage 108:214–224, 2015
28. P. Moeskops et al., “Automatic segmentation of MR brain images with a convolutional neural network,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1252–1261, 2016.
29. Akkus Z, et al.: “Predicting 1p19q Chromosomal Deletion of Low-Grade Gliomas from MR Images using Deep Learning,” arXiv [cs.CV], 2016.
30. Nie D, Dong N, Li W, Yaozong G, Dinggang S: “Fully convolutional networks for multi-modality isointense infant brain image segmentation,” in 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), 2016.
31. T. Brosch et al., “Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1229–1239, 2016.
32. Q. Dou et al., “Automatic detection of cerebral Microbleeds from MR images via 3D convolutional neural networks,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1182–1195, 2016.
33. Srhoj-Egekher V, Manon JN, Viergever MA, Išgum I: “Automatic neonatal brain tissue segmentation with MRI,” in Medical Imaging 2013: Image Processing, 2013.
34. P. Anbeek et al., “Automatic segmentation of eight tissue classes in neonatal brain MRI,” PLoS One, vol. 8, no. 12, p. e81895, 2013.
35. H. A. Vrooman et al., “Multi-spectral brain tissue segmentation using automatically trained k-nearest-neighbor classification,” Neuroimage, vol. 37, no. 1, pp. 71–81, 2007.
36. A. Makropoulos et al., “Automatic whole brain MRI segmentation of the developing neonatal brain,” IEEE Trans. Med. Imaging, vol. 33, no. 9, pp. 1818–1831, 2014.
37. Wang L et al.: LINKS: Learning-based multi-source IntegratioN frameworK for segmentation of infant brain images. Neuroimage 108:160–172, 2015
38. Moeskops P et al.: Automatic segmentation of MR brain images of preterm infants using supervised classification. Neuroimage 118:628–641, Sep. 2015
39. Chiţă SM, Benders M, Moeskops P, Kersbergen KJ, Viergever MA, Išgum I: “Automatic segmentation of the preterm neonatal brain with MRI using supervised classification,” in Medical Imaging 2013: Image Processing, 2013.
40. A. de Brebisson, M. Giovanni: “Deep neural networks for anatomical brain segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015.
41. Bao S, Siqi B, Chung ACS: “Multi-scale structured CNN with label consistency for brain MR image segmentation,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pp. 1–5, 2016.
42. D. W. Shattuck et al., “Construction of a 3D probabilistic atlas of human cortical structures,” Neuroimage, vol. 39, no. 3, pp. 1064–1080, 2008.
43. C. H. Sudre, M. J. Cardoso, W. H. Bouvy, G. J. Biessels, J. Barnes, and S. Ourselin, “Bayesian model selection for pathological neuroimaging data applied to white matter lesion segmentation,” IEEE Trans. Med. Imaging, vol. 34, no. 10, pp. 2079–2102, 2015.
44. A. Galimzianova, F. Pernuš, B. Likar, and Ž. Špiclin, “Stratified mixture modeling for segmentation of white-matter lesions in brain MR images,” Neuroimage, vol. 124, no. Pt A, pp. 1031–1043, 2016.
45. N. Weiss, D. Rueckert, and A. Rao, “Multiple sclerosis lesion segmentation using dictionary learning and sparse coding,” Med. Image Comput. Comput. Assist. Interv., vol. 16, no. Pt 1, pp. 735–742, 2013.
46. Z. Karimaghaloo, H. Rivaz, D. L. Arnold, D. L. Collins, and T. Arbel, “Temporal hierarchical adaptive texture CRF for automatic detection of gadolinium-enhancing multiple sclerosis lesions in brain MRI,” IEEE Trans. Med. Imaging, vol. 34, no. 6, pp. 1227–1241, 2015.
47. X. Tomas-Fernandez and S. K. Warfield, “A model of population and subject (MOPS) intensities with application to multiple sclerosis lesion segmentation,” IEEE Trans. Med. Imaging, vol. 34, no. 6, pp. 1349–1361, 2015.
48. N. Shiee, P.-L. Bazin, A. Ozturk, D. S. Reich, P. A. Calabresi, and D. L. Pham, “A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions,” Neuroimage, vol. 49, no. 2, pp. 1524–1535, 2010.
49. M. Prastawa, E. Bullitt, S. Ho, and G. Gerig, “A brain tumor segmentation framework based on outlier detection,” Med. Image Anal., vol. 8, no. 3, pp. 275–283, 2004.
50. S. Bauer, R. Wiest, L.-P. Nolte, and M. Reyes, “A survey of MRI-based medical image analysis for brain tumor studies,” Phys. Med. Biol., vol. 58, no. 13, pp. R97–129, 2013.
51. X. Lladó et al., “Automated detection of multiple sclerosis lesions in serial brain MRI,” Neuroradiology, vol. 54, no. 8, pp. 787–807, 2012.
52. D. García-Lorenzo, S. Francis, S. Narayanan, D. L. Arnold, and D. L. Collins, “Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging,” Med. Image Anal., vol. 17, no. 1, pp. 1–18, 2013.
53. Zhao L, Jia K: “Deep Feature Learning with Discrimination Mechanism for Brain Tumor Segmentation and Diagnosis,” in 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2015.
54. Dvořák P, Pavel D, Bjoern M: “Local Structure Prediction with Convolutional Neural Networks for Multimodal Brain Tumor Segmentation,” in Lecture Notes in Computer Science pp. 59–71, 2016.
55. O. Maier, C. Schröder, N. D. Forkert, T. Martinetz, and H. Handels, “Classifiers for ischemic stroke lesion segmentation: A comparison study,” PLoS One, vol. 10, no. 12, p. e0145118, 2015.
56. Havaei M, Guizard N, Larochelle H, Jodoin PM: “Deep Learning Trends for Focal Brain Pathology Segmentation in MRI,” in Lecture Notes in Computer Science pp. 125–148, 2016.
57. B. H. Menze et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE Trans. Med. Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
58. Cho J, Lee K, Shin E, Choy G, Do S: “How much data is needed to train a medical image deep learning system to achieve necessary high accuracy?,” arXiv [cs.LG], 2015.
59. Lekadir K, et al.: “A Convolutional Neural Network for Automatic Characterization of Plaque Composition in Carotid Ultrasound,” IEEE J Biomed Health Inform, 2016.
60. Long J, Shelhamer E, Darrell T: “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
61. Yosinski J, Clune J, Bengio Y, Lipson H: How transferable are features in deep neural networks? In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ Eds. Advances in neural information processing systems 27. USA: Curran Associates, Inc., 2014, pp. 3320–3328
62. H.-C. Shin et al., “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
63. van Ginneken B, Setio AAA, Jacobs C, Ciompi F: “Off-the-shelf convolutional neural network features for pulmonary nodule detection in computed tomography scans,” in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), 2015.
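
Illustrative sketch. The three overfitting countermeasures named in the text before the Summary (data augmentation, dropout, and L1/L2 weight penalties) can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the implementation used in any of the cited works: the function names, the flip-plus-intensity-shift augmentation, the `max_shift` range, and the rescaling of surviving units ("inverted" dropout, a common variant) are all assumptions introduced here for illustration.

```python
import random


def augment(image, rng, max_shift=0.1):
    """Data augmentation: random left-right flip plus a random intensity shift.

    `image` is a 2D list of floats; `max_shift` is a hypothetical jitter range.
    """
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    shift = rng.uniform(-max_shift, max_shift)
    return [[value + shift for value in row] for row in image]


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def weight_penalty(weights, l1=0.0, l2=0.0):
    """L1/L2 regularization term that is added to the training loss."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented = augment([[0.0, 1.0], [2.0, 3.0]], rng)
hidden = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng)
penalty = weight_penalty([0.5, -1.0], l1=0.01, l2=0.1)
```

In practice these pieces would typically come from a deep learning framework rather than hand-written code; the sketch only shows where each strategy acts: on the input data, on the hidden activations during training, and on the loss.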