Deep Convolutional Neural Networks For Mammography
Abstract
Background: The limitations of traditional computer-aided detection (CAD) systems for mammography, the extreme importance of early detection of breast cancer, and the high impact of false diagnoses drive researchers to investigate deep learning (DL) methods for mammograms (MGs). Recent breakthroughs in DL, in particular convolutional neural networks (CNNs), have achieved remarkable advances in the medical field. Specifically, CNNs are used in mammography for lesion localization and detection, risk assessment, image retrieval, and classification tasks. CNNs also help radiologists provide more accurate diagnoses by delivering precise quantitative analysis of suspicious lesions.
Results: In this survey, we conducted a detailed review of the strengths, limitations, and performance of the most recent CNN applications in analyzing MG images. It summarizes 83 research studies that apply CNNs to various tasks in mammography. It focuses on finding the best practices used in these studies to improve diagnostic accuracy. This survey also provides deep insight into the architecture of the CNNs used for various tasks. Furthermore, it describes the most common publicly available MG repositories and highlights their main features and strengths.
Conclusions: The mammography research community can utilize this survey as a basis for their current and future studies. The given comparison among common publicly available MG repositories guides the community to select the most appropriate database for their application(s). Moreover, this survey lists the best practices that improve the performance of CNNs, including the pre-processing of images and the use of multi-view images. In addition, other listed techniques like transfer learning (TL), data augmentation, batch normalization, and dropout are appealing solutions to reduce overfitting and increase the generalization of CNN models. Finally, this survey identifies the research challenges and directions that require further investigation by the community.
Keywords: Mammograms (MGs), Breast cancer, Deep learning (DL), Convolutional neural networks (CNNs), Machine
learning (ML), Transfer learning (TL), Computer-aided detection (CAD), Classification, Feature detection
Fig. 1 A breakdown of the studies included in this survey by year of publication, grouped by their neural network task. Since 2016 the number of studies on CNNs for MGs has increased significantly
• Does this study focus on using a CNN for detecting abnormalities in MGs?
• What is the task of the implemented CNN?
• What are the databases, database size, image resolution, image type, and abnormalities involved in the development of the CNN?
• What are the methodologies used for the setup and pre-processing of the data-set?
• Can deep networks perform well on medical images, specifically MGs?
• What are the learning methods used for training the CNNs?
• What are the best practices that were applied to increase the accuracy of detection of abnormalities?
• What are the advantages and limitations presented by the methodologies employed in CNNs?
• Is it an end-to-end (E2E) training method?
• Is transfer learning from natural imagery to the medical domain relevant?
• Will combining learned features with hand-crafted features enhance the accuracy of a certain mammographic task?
• What are the common toolkits used in mammography?
• What are the challenges of training deep neural networks on mammography data-sets?
• How do imbalanced data-sets impact the performance of CNNs?
• What is the common cross-validation method used with MGs?
• Which activation functions are commonly used for training on MGs?

Breast cancer digital repositories
Mammographic databases play an important role in training, testing, and evaluation of DL methods. The amount of data needed to train a DL network is massive compared to the data needed to train traditional neural networks. The availability of comprehensive annotated databases is critical for advancing DL development in medical imaging. The most common findings seen on mammography are abnormal areas of mass, micro-calcifications (MCs), architectural distortion (AD), and asymmetries. The common publicly available databases for MGs are: the Mammographic Image Analysis Society (MIAS) database [36], the Digital Database for Screening Mammography (DDSM) [19], the INbreast database [37], the Breast Cancer Digital Repository (BCDR) [38], and Image Retrieval in Medical Applications (IRMA) [39].
Table 2 compares the publicly available MG databases according to their origin, the number of images, the size of images, views (CC, MLO), digital or film acquisition, the format of images, the resolution of images, and the distribution of normal, benign, and malignant images. Other databases used in the literature are private and restricted to individual organizations [21, 26, 27, 31, 34, 40–46]. The public databases present a wide variability of patients' cases and a mixture of normal, benign, and malignant cases. Annotations include the location and boundaries of the lesions, performed by imaging specialists. The public repositories have collected film screen MGs (FSMs) [36, 38, 39] and/or full-field digital mammography (FFDM) [37–39, 47] with different resolutions. Digital MG images are usually saved in the DICOM format, which gathers not only the image but also related meta-data, as in [37, 38, 47]; however, some databases use different formats [36, 38, 39, 48].
Table 2 Comparison between widely used databases in the literature with respect to size of images, views (CC, MLO), digital or film databases, the format of images, bits/pixel (bpp), and the distribution of normal, benign, and malignant images
Database Image-size Views Type Format bpp #Normal #Benign #Malignant
DDSM 3118×5001 Both FSM LJPEG 12 914 870 695
IRMA Several Both Both PNG 12 1108 1284 1284
INbreast Several Both FFDM DICOM 16 67 220 49
MIAS 1024×1024 MLO FSM PGM 8 207 69 56
BCDR-F01 720×1168 Both FSM TIF 8 0 187 175
BCDR-F02 720×1168 Both FSM TIF 8 0 426 90
BCDR-F03 720×1168 Both FSM TIF 8 0 426 310
BCDR-D01 Several Both FFDM DICOM 14 0 85 58
BCDR-D02 Several Both FFDM DICOM 14 0 405 51
BCDR-DN01 Several Both FFDM DICOM 14 200 0 0
The images of the MIAS database are of low resolution and have strong noise. The MIAS database is an old database that contains a limited number of images. Despite all these drawbacks, it has been widely used in the literature until now [49–51]. DDSM is a huge repository used in many studies [23, 24, 32, 49, 52–65]. DDSM images are saved in non-standard compression files that require the use of decompression codes. Moreover, the Region of Interest (ROI) annotations for the abnormalities in the DDSM images indicate the general position of lesions, without precise segmentation of them. The IRMA project is a combination of a number of databases of different resolutions and sizes. The ROI annotations for these databases are more precise, making them more suitable for supervised DL methods. The INbreast database is gaining more attention nowadays and is used in [25, 32, 57, 66–70]. Its advantages are high resolution and accurate segmentation of lesions; however, its small size and the limited shape variations of the masses are its drawbacks. BCDR is a promising database but is still in its development phase. BCDR has been used in a few studies [71–74]. The strengths and limitations of these databases are summarized in Table 3.

Table 3 A summary of the strengths and limitations of the DDSM, IRMA, INbreast, MIAS and BCDR databases
Database | Strength | Limitation
DDSM | Big, widely used database. Shape variations of different lesions. | Non-standard format. Not precise position of lesions.
IRMA | Accurate position of lesions. High resolution. | Non-standard format.
INbreast | Accurate position of lesions. Standard file format. | Limited size. Limited mass shape variations.
MIAS | Still widely used. | Old database; no more supported. Limited size. Images are of low resolution. Has MLO view only. Different resolutions.
BCDR | Accurate position of lesions. Standard file format. | Limited size. Still in its development phase.

Convolutional neural networks
DL is in fact not a new idea; it dates back to the 1940s [7, 75], and shallow CNNs were applied to medical images to investigate breast cancer as early as 1995 [40, 76]. Famous CNNs such as Alex-Net [16], ZF-Net [77], GoogLeNet [78], VGG-Net [79] and ResNet [80] have brought about breakthroughs in image processing. The Alex-Net architecture is extensively used in medical imaging for breast cancer detection. DL is a subset of machine learning that requires a huge amount of labeled data to train the models. The term "deep" usually indicates the number of hidden layers in a neural network; e.g., ResNet has a depth of 152 layers, which is 8× deeper than VGG-Net. Since 2012, CNNs have become more popular and have attracted more attention because of increasing computing power, the availability of lower cost hardware, open source algorithms, and the rise of big data [16].

Fig. 2 The CNN architecture is a stack of a convolutional layer (Conv), a nonlinear layer (e.g. ReLU), a pooling layer (Pool), and a loss function (e.g. SVM/Softmax) on the last (fully connected) layer. The output can be a single class (e.g. normal, benign, malignant)

The structure of CNNs is very similar to that of ordinary neural networks. The basic CNN architecture is a stack of a convolutional layer (Conv), a nonlinear layer (e.g. ReLU), a pooling layer (e.g. max-pooling), and a loss function (e.g. SVM/Softmax) on the last fully connected (FC) layer (Fig. 2). The output can be a single class (e.g. normal, benign, malignant) or a probability over classes that best
describes the image. The input to a convolutional layer is a W1×H1×D1 image, where W1 is the width and H1 is the height of the image and D1 is the number of channels; e.g., an RGB image has D1=3. The convolutional layer has F filters (e.g. 12 filters) of size N×N×D1, where N is smaller than the dimension of the image and the filter depth D1 is the same as the number of channels (e.g. 5×5×3, i.e. 5 pixels in width and height, and 3 because images have depth 3, the color channels). During the convolution operation, each of the F filters convolves with the image, producing F feature maps of total volume size W2×H2×D2, where W2 = H2 = (W1 − N + 2P)/S + 1, S is the stride, P is the amount of zero padding, and D2 = F. For each feature map, a non-linear activation function is applied (e.g. ReLU). A non-linear activation function leaves the size of the volume unchanged (W2×H2×D2). After applying ReLU, a down-sampling operation called pooling is applied along the spatial dimensions (width, height) of the resulting feature map. After pooling, there may be any number of fully connected layers that compute the class scores (Fig. 2). More details about the architecture of CNNs can be found in [16, 81].

Popular CNNs
Alex-Net [16], ZF-Net [77], GoogLeNet [78], VGG-Net [79] and ResNet [80] have been extensively used as pre-trained networks to classify images in medical domains instead of training a network from scratch. Table 4 shows the configurations of the most popular CNNs.
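To make the output-size arithmetic above concrete, here is a minimal Python sketch of the formula; the function name and the AlexNet-style example values are ours, chosen for illustration.

```python
def conv_output_size(w1, n, p, s):
    """Spatial output size of a conv/pool layer: W2 = (W1 - N + 2P)/S + 1."""
    assert (w1 - n + 2 * p) % s == 0, "filter, stride, and padding must tile the input"
    return (w1 - n + 2 * p) // s + 1

# AlexNet-style first layer: 227x227 input, 11x11 filters, stride 4, no padding.
w2 = conv_output_size(w1=227, n=11, p=0, s=4)  # -> 55
d2 = 96                                        # D2 equals the number of filters F
print(f"output volume: {w2}x{w2}x{d2}")        # 55x55x96
```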
Table 4 The configurations of AlexNET, ZF-NET, GoogLeNET, VGG-NET and ResNET models
AlexNet [16] ZF-Net [77] GoogLeNet [78] VGG-Net [79] ResNet [80]
Year 2012 2013 2014 2014 2015
Image Resolution 227×227 227×227 224×224 224×224 224×224
Number of layers 8 8 22 19 152
Number of Conv-Pool layers 5 5 21 16 151
Number of FC layers 3 3 1 3 1
Full connected layer size 4096,4096,1000 4096,4096,1000 1000 4096,4096,1000 1000
Filter Sizes 3, 5, 11 3, 5, 11 1,3,5,7 3 1,3,7
Number of Filters 96 - 384 96 - 384 64 - 384 64 - 512 64 - 2048
Strides 1, 4 1, 4 1, 2 1 1, 2
Data Augmentation + + + + +
Dropout + + + + +
Batch Normalization - - - - +
Number of GPUs 2 GTX 580 GPUs 1 GTX 580 GPU A few high-end GPUs 4 Nvidia Titan Black GPUs 8 GPUs
Training Time 5–6 days 12 days 1 week 2–3 weeks 2–3 weeks
Top-5 error 16.40% 11.2% 6.70% 7.30% 3.57%
Generally, training a deep CNN requires extensive computational and memory resources. Training these networks from scratch typically takes days or weeks on modern GPUs (Table 4). All these networks were trained on the 1000-object-category classification task of the ImageNet data-set [82]. The ImageNet data-set consists of a 1.2M-image training set, a 50K-image validation set, and a 100K-image test set. Two error rates are reported for these networks: top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model. All these network architectures use the data augmentation technique to prevent overfitting, with dropout initially set to 0.5.
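As an illustration of the top-5 metric, the following NumPy sketch (our own, operating on hypothetical prediction arrays) computes it exactly as defined above.

```python
import numpy as np

def top5_error(probs, labels):
    """Fraction of samples whose true label is not among the 5 highest-scoring classes.

    probs:  (num_samples, num_classes) predicted class probabilities
    labels: (num_samples,) integer ground-truth labels
    """
    top5 = np.argsort(probs, axis=1)[:, -5:]      # indices of the 5 largest scores
    hits = (top5 == labels[:, None]).any(axis=1)  # True where the label is in the top 5
    return 1.0 - hits.mean()

probs = np.random.rand(1000, 1000)                # e.g. 1000 ImageNet-style predictions
labels = np.random.randint(0, 1000, size=1000)
print(top5_error(probs, labels))
```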
Alex-Net [16] was the first CNN to win the ImageNet Challenge, in 2012. AlexNet consists of five Conv layers and three fully connected (FC) layers. Within each Conv layer, there are 96 to 384 filters of size 3×3, 5×5, or 11×11, with 3 to 256 channels each. A ReLU non-linearity is used in each layer. Max-pooling of 3×3 is applied to the outputs of layers 1, 2 and 5. Alex-Net used a stride of 4 at the first layer of the network. AlexNet's model requires 61M weights to process one 227×227 input image (top-5 error of 16.40%). ZF-Net [77] is a slightly modified version of the Alex-Net model that uses an interesting way of visualizing its feature maps. In ZF-Net, the visualization technique gives insight into the function of intermediate feature layers and the operation of the classifier. The VGG-Net [79] model reinforces that CNNs have to have a deep network of layers. GoogLeNet [78] has 22 layers; it introduced the inception module to the CNN model, which has pieces of the network working in parallel, in contrast to previous CNN models, which have only a single serial connection. ResNet [80], also known as Residual Net, uses residual connections to go even deeper. ResNet determines an object's exact location, which is a huge jump for CNNs. ResNet is 8× deeper than VGG-Net with lower complexity. The ResNet with 152 layers was the winner of the ImageNet challenge 2015 [82] (top-5 error of 3.57%); it has 60M weights. YOLO is another famous CNN, recently used for object classification and localization while processing the image only once, as implied by its name, You Only Look Once [83, 84]. Table 4 shows that the number of layers goes deeper and deeper within the newer implementations, as in ResNet.
Results
CNNs best practices
In this section, we explain the practices that contribute to improving the performance of CNNs for MGs. It goes beyond the scope of this paper to discuss all the best practices for CNNs in general, but we highlight and focus on some of them that show significant changes in classification accuracy when applied to MG images. Recent survey papers [7, 8, 85] discuss more trends for natural images.

Data preparation
Pre-processing of MG images
Pre-processing of MG images is an essential task before training CNNs [63, 66, 71, 72, 86]. The pre-processing consists of contrast enhancement, noise removal, and breast segmentation. Breast segmentation includes the removal of the background area, labels, artifacts, and the pectoral muscle, which disturb the detection of Mass/MCs [45, 50]. It is important to have a good separation between foreground and background pixels while not removing important information from the images [59, 87, 88]. The commonly used filters for image enhancement and noise reduction are the adaptive mean filter, the median filter, and contrast limited adaptive histogram equalization (CLAHE) [62, 89–92].
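The following OpenCV sketch illustrates such a CLAHE-plus-median-filter step; the file path and parameter values (kernel size, clip limit, tile grid) are illustrative assumptions, not settings prescribed by the surveyed studies.

```python
import cv2

# Load a mammogram as an 8-bit grayscale image (path is hypothetical).
img = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)

# A median filter suppresses salt-and-pepper noise before enhancement.
denoised = cv2.medianBlur(img, 5)

# CLAHE boosts local contrast; the clip limit curbs noise amplification.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(denoised)

cv2.imwrite("mammogram_enhanced.png", enhanced)
```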
Image size, cropping, and down-sampling
Most studies have used segmented ROIs in order to reduce the computation of the CNNs and to avoid the issue of small training data. These ROIs can be obtained by a manual segmentation of the images using the available ground truth data, or by an automatic detection system. The ROIs are cropped and re-scaled to r×r pixels with the lesion centered within the image. However, very small subsampled patches (e.g. 32×32) may not contain enough detail to improve the classification results, as in [40, 41, 44, 63, 66, 67, 70, 74, 93].
Two strategies have been utilized to use the full image size for training CNNs on MGs instead of ROIs. The first strategy down-samples high resolution images to ≈ 250×250. However, finding small mass regions or MC clusters in down-sampled high resolution images is unlikely to be successful for MGs [65]. The second strategy trains a patch-level CNN classifier, which is then used as a feature extractor for an image-level model. In the image-level model, each image is partitioned into a set of patches with minimal overlap such that each patch is contained entirely within the image (see the sketch below). Final classification involves aggregation across the patches and the CC & MLO views [65].
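A minimal sketch of this patch partitioning is shown below; the patch size, stride, and image dimensions are hypothetical, and a real pipeline would also track patch coordinates for the aggregation step.

```python
import numpy as np

def extract_patches(image, size=224, stride=224):
    """Tile an image into patches with minimal overlap; the last row/column
    is shifted inward so every patch lies entirely within the image."""
    h, w = image.shape
    ys = list(range(0, h - size, stride)) + [h - size]
    xs = list(range(0, w - size, stride)) + [w - size]
    return [image[y:y + size, x:x + size] for y in ys for x in xs]

mg = np.zeros((3328, 2560))      # hypothetical full-resolution MG
patches = extract_patches(mg)
print(len(patches))              # number of patches fed to the patch-level CNN
```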
Mixing databases
In the literature, researchers mix several databases to analyze their CNNs. The fusion of different image types (FSM and FFDM) assists CNNs in terms of detection rate. Researchers in [32, 49, 51, 52, 57, 94, 95] compared both image quality and detection on FFDM and FSM databases. They have shown that a CNN using FFDM images gives a better detection rate than one using FSM images. Moreover, these studies show that DL training using the fusion of both FFDM and FSM lowers the number of false detections [93, 94].

Learned and hand-crafted features
Hand-crafted features (i.e. Haar-like features, histogram of oriented gradients (HOG), and histogram of the gradient divergence (HGD)) are commonly used with traditional machine learning approaches for object recognition, like support vector machines. CNNs are able to extract features from the input image data-sets; thus, CNNs remove the necessity of the time-consuming hand-crafted features.
However, the authors in [21, 23, 54, 71, 87, 96–100] have demonstrated the importance of combining the features extracted using deep CNNs with hand-crafted features like texture and shape. Interestingly, the combination of both representations (learned and hand-crafted features) resulted in a better descriptor for Mass/MCs lesion classification [71, 100]. The reason behind using hand-crafted features is that the learning process should be guided by a training data-set that has a wide variability of texture and shape features. For example, Dhungel et al. [97] proposed a two-step training process involving pre-training based on a large set of hand-crafted features. The second stage fine-tunes the features learned in the first stage to become more specialized for the classification problem.
Using hand-crafted features depends on the size of the data-set. With a small training data-set, generating hand-crafted features could result in a better model for Mass/MCs lesion classification. Also, employing some hand-crafted features that specifically target small and missed lesions is a more effective strategy than adding extra cases to the training data-set. Thus, the performance of CNNs trained with a small data-set can be improved by incorporating hand-specified features to deal with cases that cause false positives or false negatives [23].

Hyper-parameters
Hyper-parameters are the variables which determine the network structure (e.g. the number of hidden layers) and the variables which determine how the network is trained (e.g. the learning rate). Hyper-parameters are manually chosen before training the CNNs.

Data augmentation
Data augmentation is an appealing solution to reduce overfitting, increase the generalization of the model, and boost performance. Overfitting happens in CNNs when the models learn the details of the training data too well but do not generalize from the training data to make good predictions on future unseen data. As a result, the performance of the trained model is poor on testing data. That usually happens when the size of the training data-set is too small compared with the number of model parameters that need to be learned.
Data augmentation artificially creates new sample images by applying transformations like flipping and rotation to the actual data. Common data augmentation techniques for mammography images are horizontal flipping, rotations (90, 180, and 270 degrees), jittering, and random scaling. Such data augmentation generates relevant training samples because tumors may present in various orientations and sizes; thus, augmentation techniques do not change the underlying pathology of the masses. Data augmentation has been employed by many studies [12, 22, 23, 30, 33, 34, 40–45, 50, 53–55, 57, 58, 63, 66, 70–73, 96, 97, 101–110].
Going deeper
In a CNN, the design of the network architecture completely depends on the model requirements and the size of the data-set. The CNNs in [53, 66, 96] have a smaller number of layers but show good accuracy. However, the work done in [63, 72, 73, 78] shows that we can get better performance, in terms of a higher area under the ROC curve (AUC), as the architecture goes deeper and is trained on more data. Deep architectures can lead to abstract representations, because more abstract shapes can often be constructed in terms of less abstract ones captured in earlier layers. Adding more layers helps the model extract more features, but only to a certain extent; beyond that limit, instead of extracting features, it results in overfitting the network, which can lead to false positives. Adding more hidden layers will promote accuracy for large data-sets, but adding layers unnecessarily to a CNN will increase the number of parameters, and for a smaller data-set it will reduce accuracy on the test data. Deep architectures are often challenging to train effectively, and this has been the subject of more recent research. Whether to choose a smaller network or a larger one cannot be estimated theoretically; the trade-off between accuracy and depth needs to be settled by trial and error and by some experience and practice with the data-set.

Learning rate
The learning rate (LR) is one of the most important hyper-parameters influencing CNNs' performance. Deep learning models are typically trained by a stochastic gradient descent optimizer. There are many variations of stochastic gradient descent, such as Adam, RMSProp, Adagrad, etc. All these optimizers let users set the learning rate. The learning rate controls how much the network parameters are adjusted in order to minimize the network's loss function. If the LR is too small, the CNN will converge to the best values only after many iterations. However, if the LR is too high, it can cause undesirable divergent behavior in the loss function. Famous learning rate policies are step decay, quadratic decay, square root decay, and linear decay [85]. A common practice when dealing with MG images is to use a step decay rate, where the LR is reduced by some percentage after a set number of training epochs. For example, Yi et al. [23] used a learning rate of 0.001 with a decay rate of 0.99 per epoch, and a regularization coefficient of 10^−5, for training their CNN. Another common practice is to use a small learning rate (e.g. 0.001) to train a pre-trained network, since we expect well-adjusted pre-trained weights compared to randomly initialized weights.
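As a sketch of the per-epoch decay schedule reported by Yi et al. [23], the function below uses their published values (base LR 0.001, decay 0.99); the function itself is our illustration, not their code.

```python
def decayed_lr(epoch, base_lr=0.001, decay=0.99):
    """Learning rate after `epoch` epochs with a fixed per-epoch decay factor."""
    return base_lr * decay ** epoch

for epoch in (0, 10, 100):
    print(epoch, decayed_lr(epoch))
```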
Activation functions
Recently, many variations of the rectified linear unit (ReLU) function have been proposed as activation functions, such as leaky ReLU, parametric ReLU, and randomized ReLU [111]. There are other popular activation functions, such as sigmoid and tanh. The activation functions bring non-linearity into CNNs. Sigmoid presents a serious disadvantage called the vanishing gradient problem: the gradient for small input values to the sigmoid function tends to get smaller (close to zero) as gradients are computed backward through the hidden layers, resulting in slow learning in the earlier layers of the model. Slow learning is highly avoided in DL since it results in expensive and tedious computations [112].
ReLU became a popular choice in DL, and even nowadays provides outstanding results, as it solves the vanishing gradient problem [111]. ReLU has gradient one for positive inputs and zero for negative inputs. As long as values are above zero, the gradient of the activation function will be one, meaning that the network can keep learning. This solves the vanishing gradient problem present in the sigmoid activation function. On the downside, once the gradient is zero, the corresponding nodes no longer have any influence on the network, which is known as the "dying ReLU" problem. Leaky ReLU is one attempt to overcome the dying ReLU problem [113]. Instead of the output of ReLU being zero when the input is less than zero, a leaky ReLU provides a small negative slope (α of 0.01, or so). This small slope reduces sparsity but, on the other hand, makes the gradient more robust for optimization, since in this case the weights will still be adjusted for those nodes that were not active with ReLU. When the slope is not constant (e.g. 0.01), it is called randomized ReLU.
A detailed explanation of the advantages and disadvantages of different activation functions is given in [16, 111, 112]. Theoretically, leaky ReLU is in general better than ReLU. However, ReLU has been chosen as the activation function in most of the CNNs for MGs, as it allows faster learning [58, 64, 65, 70, 114, 115].
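Both activations are one-liners; the NumPy sketch below is simply our illustration of the definitions given above.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)             # gradient 1 for x > 0, 0 otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small negative slope keeps nodes alive

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x))
```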
Techniques for improving the CNNs performance
Dropout
Dropout is a regularization technique proposed in [116] that is superior to other regularization methods (L1, L2, max norm). Dropout prevents a CNN model from overfitting. The technique randomly selects neurons and ignores them during training; they are "dropped out" randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to these neurons on the backward pass [16]. Smirnov [117] compared regularization methods for deep CNNs and showed that the dropout technique is in general better than other regularization techniques. The authors in [12, 22, 25, 44, 58, 70, 73, 94, 106, 108] have used dropout in their work with MGs. A dropout of 0.5 is a common value for mammography images.

Batch normalization
In a CNN model, a batch normalization (BN) layer normalizes input variables across a mini-batch (a subset of the training data-set). First, the BN layer normalizes the activations of each channel by subtracting the mini-batch mean and dividing by the mini-batch standard deviation. Then, the BN layer shifts the input by a learnable offset β and scales it by a learnable scale factor γ, thus reducing the network's internal covariate shift. BN speeds up the training of CNNs and reduces the sensitivity to network initialization. According to [118], BN allows the use of much higher learning rates and less care about initialization, as it acts as a regularizer. BN results in faster convergence and, as a consequence, overall faster training for a CNN. Besides that, BN regulates the values going into each activation function. With BN, saturating nonlinear activation functions (e.g. sigmoid) that do not work well in deep networks tend to become viable again. Similar to dropout, BN adds some noise to each hidden layer's activations; therefore, using BN allows a smaller dropout value. BN has been used in CNNs for MG images [65, 73, 101]. For mammography, it is recommended not to depend only on BN for regularization, but to use it together with dropout.
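A minimal Keras sketch of a Conv-BN-ReLU-Pool stack with a 0.5 dropout head is shown below; the layer sizes, input shape, and binary benign/malignant head are illustrative assumptions, not a configuration taken from the surveyed studies.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(224, 224, 1)),
    layers.BatchNormalization(),   # normalize, then shift by beta / scale by gamma
    layers.Activation("relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.5),           # 0.5 is the common value cited for MGs
    layers.Dense(2, activation="softmax"),  # benign vs. malignant
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```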
Transfer learning
Training a deep CNN requires large amounts of labeled training data [11]. Only a few studies train an entire CNN from scratch with random initialization; the rest use TL approaches that either fine-tune a pre-trained network [46, 52, 53, 58, 60, 63, 72, 73, 94, 110, 119, 120] or use a pre-trained network as a feature extractor [15, 32, 46, 70]. Recent overviews of TL in deep network models are given in [37, 45, 46, 65]. The need for TL in the medical domain arises because data are scarce and expensive, they are not publicly available, and it is time-consuming to collect and label them by professional radiologists [17, 46, 55, 121–124]. Moreover, training a deep CNN requires extensive computational and memory resources [16, 17, 78].
References [60, 77, 125] show that the main power of a CNN lies in its deep architecture. Features extracted in the earlier layers of a pre-trained CNN (i.e. on natural images) are more generic (e.g. edge detectors or blob detectors) and useful for many tasks, but in later layers, generic features are combined and become more specific to the details of the classes contained in the training data-set. Thus, a deep CNN allows extracting a set of discriminating features at multiple levels of abstraction, which can be transferable from one domain to another. However, the required level of fine-tuning differs from one application to another. Tajbakhsh et al. [125] show that neither shallow tuning nor deep tuning may be the optimal choice for a particular application. Moreover, layer-wise fine-tuning may offer a practical way to reach the best performance for a certain application and should be chosen experimentally. In addition, the work in [21, 106, 109] has achieved a good performance on a small data-set by pre-training the network on a large data-set of general medical images.
Most of the studies that employed TL have used the ImageNet data-set [82] for pre-training their networks [46, 58, 60, 72, 94, 95, 110, 126–128]. The commonly used pre-trained CNN architectures for mammography are Alex-Net [46, 50, 58, 60, 72, 94, 95, 110, 127, 128], VGG16 [50, 127, 129], ResNet50 [127, 129] and GoogLeNet [58, 72, 127]. All the deep CNN architectures that are pre-trained using ImageNet are designed for a 1000-class classification task. To adapt them to the task at hand, the last three layers are removed from each network and three new layers (an FC layer, a soft-max layer, and a classification layer) are appended to the remaining structure of each network.
Until large-scale medical image data-sets for mammography become available, the combination of TL and data augmentation is a very promising approach for training deep CNNs. By visualizing the features learned at different layers during the training process, a model can be monitored to closely observe and track its performance [23]. Learned features can indicate whether a model is successfully learning or not, allowing a user to stop the training process early [130].
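The following Keras sketch illustrates this adaptation step, swapping the ImageNet head of a pre-trained VGG16 for a new task-specific classifier; the layer sizes and the three-class head are our illustrative choices, not a configuration from the surveyed papers.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# ImageNet-pre-trained backbone without its original 1000-class head.
base = VGG16(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))
base.trainable = False  # feature-extractor mode; unfreeze selected layers to fine-tune

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # normal / benign / malignant
])
# A small learning rate is the usual choice when starting from pre-trained weights.
model.compile(optimizer="adam", loss="categorical_crossentropy")
```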
Cross-validation
Cross-validation is a statistical technique to evaluate predictive models by partitioning the original samples into a training set to train the model and a test set to evaluate it. There are three common types used in the literature for validation: hold-out splits [76, 131], three-way data splits [8, 22, 58, 65, 96], and K-fold cross-validation [20, 23, 25, 26, 49, 94, 110, 115, 132, 133]. In the hold-out data split, data are split into a training set and a test set (e.g. 80% and 20%, respectively). The training set is used to train the model and the test set is used to estimate the error rate of the trained model. In the three-way data split, data are randomly split into training, validation, and testing sets. The CNN model is trained on the training set and evaluated on the validation set. Training and validation may be iterated a few times until the best model is found. The final model is assessed using the test set.
In K-fold cross-validation, data are split into K different subsets (or folds). The cross-validation process is repeated K times, with each of the K subsets used exactly once as the test set. The K error estimates from the folds can then be averaged to produce a single estimate. Cross-validation avoids overfitting and gives a less biased estimate of the performance of the model [67, 134]. In practice, the choice of the number of folds depends on the size of the data-set. In the literature, the common strategy for mammography is K-fold cross-validation: for large data-sets it is common to use 3- to 5-fold cross-validation, and for small mammography data-sets it is common to use 10-fold cross-validation.
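A minimal scikit-learn sketch of stratified K-fold cross-validation follows; scikit-learn is not among the toolkits surveyed here, and the data arrays are hypothetical stand-ins.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(200, 64 * 64)        # hypothetical flattened ROI patches
y = np.random.randint(0, 2, size=200)   # benign (0) vs. malignant (1)

# 10 folds for a small mammography data-set; 3-5 folds for a large one.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Train on X[train_idx], evaluate on X[test_idx]; average the K estimates.
    pass
```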
Context and patient information
Integrating information such as patient age, breast density, and other context like the view type (CC or MLO) into a CNN method can improve the detection rate of CNNs [96]. Multi-modal machine learning aims to build models that can process and relate information from multiple modalities (e.g. images and text), with score-level fusion at the final prediction.

Multi-view and single-view images
It is good practice to use both the CC & MLO views to detect abnormalities, since a true abnormality can usually be detected on two different views of a MG. Recent studies in [15, 20–25, 95, 107] report significant improvements of multi-view (MV) approaches compared to single-view (SV) ones, demonstrating that the high-level features of the individual CNN models provide a robust representation of the input images. Comparing two views can aid in the reduction of false positives and false negatives.

Balanced and imbalanced distribution
A couple of the publicly available databases (e.g. INbreast, DDSM) are constructed to include approximately the same proportions of normal and abnormal cases, which is a balanced distribution of classes. Other databases, called imbalanced (natural) distribution databases, include unequal proportions of normal and abnormal cases. Training CNN models directly on imbalanced data-sets may bias the prediction towards the more common classes, like normal, resulting in false negatives, whereas the minority classes are misclassified frequently [135]. The authors in [20, 21, 32, 45, 46, 53, 56, 58, 73, 74] have pointed out that the balance of the number of samples per class has a great impact on the performance of the system. However, the authors in [22, 44, 96] used natural distribution databases. According to [136], choosing a wrong distribution or objective function while developing a classification model can introduce bias towards a potentially uninteresting class (non-cancerous). For MG images, it is preferable to use a balanced data-set. Different approaches to handle imbalanced data-sets include random under-sampling and random over-sampling techniques [135]. Random under-sampling aims to balance the class distribution by randomly eliminating majority-class samples (normal cases); this is done until the majority- and minority-class instances are balanced, as in [74]. On the other side, over-sampling increases the number of instances in the minority class (abnormal cases) by randomly replicating them in order to present a higher representation of the minority class. Unlike under-sampling, over-sampling leads to no information loss.
The appropriate approach (random under-sampling or random over-sampling) depends on the amount of available data and the specific problem at hand. Researchers empirically test each approach and select the one that gives them the best results. When using an imbalanced data-set, accuracy is not the right metric to evaluate the performance of the model. There are more appropriate scores for imbalanced data-sets, such as the F1-score [136], which combines the trade-offs of precision and recall and outputs a single number reflecting the goodness of a classifier.
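The sketch below illustrates random over-sampling and the F1-score on toy arrays; the over-sampling helper is our own minimal version, and scikit-learn is used only for the metric.

```python
import numpy as np
from sklearn.metrics import f1_score

def random_oversample(X, y, minority=1, rng=np.random.default_rng(0)):
    """Replicate minority-class samples until both classes have equal counts."""
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep]

# F1 combines precision and recall into one score for imbalanced evaluation.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0])
print(f1_score(y_true, y_pred))
```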
Multi-stage and end-to-end (E2E) methods
A multi-stage pipeline used for the detection and classification of a lesion consists of multiple stages, such as pre-processing, image segmentation, feature detection, feature selection, and classification [137, 138]. End-to-end (E2E) deep learning methods take all these multiple stages and replace them with a single neural network. Researchers in [12, 15, 24, 40–42, 60, 67, 102, 105] have used one or more stages of this multi-stage pipeline in their CNN systems. In their multi-stage methods, a CNN is trained to determine whether a small patch contains mass and/or MCs. Other researchers focused on training a deep CNN for classifying a small ROI or a full image into benign or malignant, assuming an existing Mass/MCs detection system, as in [23, 25, 43, 50, 55, 56, 62, 66, 71, 72, 87, 104]. In multi-stage methods for CNNs, several cascaded classifiers are trained independently; each classifier makes a prediction, and all predictions are combined into one using different strategies. Dhungel et al. found that multi-stage methods are effective in the reduction of false positive detections [97]. Moreover, researchers in [22, 25, 30, 45, 57, 96, 98, 107] used E2E methods.
E2E methods for MGs are better than multi-stage methods when training a CNN with a large data-set; but if the data-set is small, the learning algorithm cannot capture much insight from the data. Excluding potentially useful hand-crafted features, which are very helpful if well designed, is the downside of the E2E approaches. Therefore, the key criterion for choosing an E2E deep learning approach is having sufficient data to learn the model.

Toolkits and libraries for deep learning
Implementing a DL network from scratch is an exhausting process and probably beyond the skills of most medical imaging researchers. It is much more efficient to utilize the publicly available resources. Some criteria should be considered while choosing a library or toolkit, including its interface programming language, the quality of its documentation, the ease of programming, the runtime needed to do thousands of calculations per pixel, the training speed, GPU support for faster performance [17], and lastly its popularity among experts. Recent surveys in [139, 140] discuss the most famous and recent toolkits and libraries used generally for DL. The common toolkits used in training CNNs for mammography are Tensorflow [141], Keras, Caffe [142, 143], PyTorch [144] and MatConvNet [145]. Table 5 gives a comparison between these libraries and their ranking based on the forks received by the community on GitHub.
Tensorflow is one of the most popular DL libraries; it was developed by the Google Brain team and open-sourced in 2015 [141]. Tensorflow is a Python-based library capable of running on multiple CPUs and GPUs. It can be used directly to create deep learning models, or through wrapper libraries (e.g. Keras) on top of it. Tensorflow does not contain many pre-trained models and has no support for external data-sets, unlike Caffe. The framework is written in C++ and Python and has a large amount of available documentation. As of today, it is the most commonly used deep learning framework.
Keras is a very lightweight open source library, easy to use, and pretty straightforward to learn. It was built as a simplified interface for building efficient deep neural networks in just a few lines of code, and it uses Tensorflow as a back-end.
Caffe is one of the first deep learning libraries, developed mainly by the Berkeley vision and learning center (BVLC) [142, 143]. It is a C++ library which also has a Python interface and finds its primary application in modeling CNNs. Caffe provides a number of pre-trained networks directly from the Caffe Model Zoo, available for immediate use.
PyTorch is a Python library enabling GPU-accelerated tensor computation, similar to NumPy. A few advantages of using PyTorch are its multi-GPU support, dynamic computational graphs, custom data loaders, optimization
Table 5 A comparison between the most famous toolkits and libraries for training CNNs on mammography
Interface Languages Open source CUDA support Pre-trained models Forks (Github) Contributions (Github)
TensorFlow Python C++, Python Yes Yes Yes 63,603 1,481
Keras Python, R Python Yes Yes Yes 11,203 681
Caffe Python, Matlab, C++ C++, Python Yes Yes Yes 14,868 267
PyTorch Python C, Python, CUDA Yes Yes Yes 3,592 644
MatConvNet Matlab CUDA Yes Yes Yes 651 24
of tasks, and memory management. PyTorch provides a rich API for neural network applications [144]. PyTorch is used by many companies, such as Twitter, Facebook and Nvidia, to train DL models.
MATLAB has a neural network toolbox that provides algorithms to create, train, and visualize deep neural networks. TL can be done with pre-trained deep CNN models (including Inception-v3, ResNet-50, ResNet-101, GoogLeNet, Alex-Net, VGG-16, and VGG-19) and with models imported from Keras or Caffe. MATLAB allows computations and data distribution across multi-core processors and GPUs with the parallel computing toolbox. MatConvNet [145] is an open source implementation of CNNs with deep integration in the MATLAB environment.
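As a small taste of that API, the following sketch defines a minimal Conv-ReLU-Pool-FC patch classifier in PyTorch; the layer sizes and patch dimensions are illustrative assumptions, not a model from the surveyed studies.

```python
import torch
from torch import nn

class PatchClassifier(nn.Module):
    """Minimal Conv -> ReLU -> Pool -> FC stack for lesion-patch classification."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),  # 64x64 stays 64x64
            nn.ReLU(),
            nn.MaxPool2d(2),                             # down to 32x32
        )
        self.head = nn.Linear(16 * 32 * 32, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.head(x.flatten(1))

logits = PatchClassifier()(torch.randn(4, 1, 64, 64))   # batch of 64x64 patches
print(logits.shape)                                      # torch.Size([4, 2])
```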
Applications of deep CNNs for mammography
After describing deep CNNs in the previous section, and the different practices that are popular for mammography, we now turn our focus to how these are used for recognition purposes in mammography. More specifically, we review recent deep CNN applications in mammography such as classification, localization, image retrieval, high resolution image reconstruction, and risk analysis. We summarize these recent works in Additional file 1: Table S1.

Lesion classifications and detection
The detection of lesions in mammography is a common task for CNNs. In contrast to lesion detection, the classification of MGs into benign and malignant is a challenging task that many studies try to address. The authors in [12, 15, 20, 21, 24, 40, 41, 44, 52, 60, 67, 70, 90, 93, 102, 105, 120] are interested in lesion classification into two classes; they developed CNNs to predict the probability of being normal (NL) or containing mass and/or MCs. The studies in [23, 43, 46, 49, 50, 55, 56, 58, 59, 62, 63, 66, 69, 71–73, 94, 99, 100, 104, 132, 146–150] present deep CNN methods to classify the MG images into two classes (benign or malignant) or three classes (benign, malignant, or without tumor). The authors in [32, 95] studied the development of malignancy of mass(es). The authors in [15, 40, 42, 44, 151] are interested in the classification and detection of MCs in mammography. Chan et al. [40] introduced one of the earliest applications of CNNs to detect clustered MCs. The authors applied enhancement filters for noise reduction on fifty-two FSM images. They observed that the shape of MCs in the breast is randomly oriented, thus they introduced an augmentation technique. Sahiner et al. [41] demonstrated the great effect of mixing CNN representation features and textural features (AUC of 0.873). Lo et al. [102] introduced a multiple circular path CNN coupled with morphological features of ROIs (AUC of 0.89). Sharma et al. [59] extracted geometrical features from MG images and used them with the representation features of their CNN. Their work demonstrates that DL methods are superior to traditional classifiers. Domingues et al. [67] used a shallow CNN that did not outperform traditional CAD methods, as they used a very small data-set to train their network and the selected normal ROIs did not represent every possible aspect of healthy breast tissue. Antropova et al. [100] developed a system incorporating both deep CNN and conventional CAD methods that performed statistically better than either one separately.
Sert et al. [63] stated that human-level recall performance in detecting breast cancer considering MCs from MGs is between 74.5% and 92.3%. In [63], the authors reached a recall value of 94.0%, above human-level performance. Wang et al. [15] showed that breast arterial calcifications (BACs), detected in MGs, can be useful for identifying risk markers for having cancer. The authors in [15] showed that their CNN method achieves a level of detection similar to the human experts. Kooi et al. [12] employed a deep CNN with a large augmented data-set. Similar to the work of [15], the network in [12] performs similarly to experienced radiologists, achieving an AUC of 0.87, while the mean AUC of the experienced radiologists is 0.84. In [96], Kooi et al. proposed to use a random forest classifier for mass detection followed by a deep CNN that classifies each detected mass. Their method relies on manually extracted features and features extracted from CNN layers. In [96], Kooi et al. trained their model on a large data-set and integrated additional information such as lesion location and patient information. Kooi et al. [149], following their work in [12, 96], employed a conditional random field (CRF) trained on top of a CNN to model contextual interactions such as the presence of other suspicious regions. In [21], Kooi et al. employed a deep MV CNN using a network pre-trained on the medical domain. They combined the features extracted using the deep CNN with hand-crafted features.
The studies in [46, 50, 58, 64, 72, 94, 106, 109, 120] demonstrated the use of TL in their work. The authors in [46, 50, 72] showed that CNNs combined with TL can outperform current CAD methods for tumor detection and classification based on small data-sets. Samala et al. [106] demonstrated that MGs can be useful for pre-training a deep CNN for mass detection in digital breast tomosynthesis (DBT). The similarity between masses in mammography and DBT can be observed from the ability of the DCNN to recognize masses in DBT. In [94], Samala et al. demonstrated that CNNs with TL achieve better generalization to unknown cases than networks without TL. Similar to [94, 106], Hadad et al. [109] described a TL approach that uses a deep CNN pre-trained on MGs to improve the detection accuracy of a fine-tuned CNN on breast MRI lesions. Suzuki et al. [120] developed a deep
CNN pre-trained on natural images; the authors then modified the last fully connected layer and subsequently trained the modified CNN using 1,656 ROIs. Similar to [120], Jiao et al. [55] achieved an accuracy of 96.7% by applying fine-tuning to a CNN pre-trained on natural images to extract features for the subsequent procedures. Jiao et al. [64], following their work in [55], proposed metric learning layers to further improve the performance of the deep structure and distinguish malignant instances from benign ones. Levy and Jain [58] demonstrated that a fine-tuned pre-trained network significantly outperforms shallow CNNs.
Abbas [49] used speeded-up robust features and local binary pattern variance descriptors extracted from ROIs. After that, they constructed deep invariant features in supervised and unsupervised fashions through a multilayer CNN architecture. Valvano et al. [151] achieved an accuracy of 83.7% for MCs detection using a deep CNN. Jamieson et al. [43] introduced a four-layer unsupervised adaptive deconvolution network to learn the image representation using 739 FFDM images. Sun et al. [105] developed a graph-based semi-supervised learning (SSL) method using a deep CNN; their method allows users to include unlabeled data in the DL training data-set. In contrast, Arevalo et al. [69] used supervised training in their method, using ROIs annotated manually by expert radiologists, achieving an AUC of 0.86. Arevalo et al. [71], following their work in [69], used a hybrid supervised CNN classifier along with an extensive enhancement pre-processing process. Dubrovina et al. [104] presented a supervised CNN for region classification into semantically coherent tissues. The authors overcame the difficulty involved in a medium-size database by training the CNN in an overlapping patch-wise manner. Teare et al. [62] proposed dual supervised CNNs for classifying full MG images into normal, benign and malignant classes. In their work, a random forest classifier was trained taking the outputs of the two deep CNNs.
The authors in [42, 44, 57, 66, 69, 70, 72, 73, 90, 146, 147] applied pre-processing, augmentation, normalization, regularization, mixing of FSM and FFDM MG images, and other techniques to better implement their networks. Ge et al. [42] compared the performance of CNNs on pairs of FFDM and SFM obtained from the same patients within a time span of less than 3 months. Their results show that the CNN with FFDM images (AUC of 0.96) detects more MCs than the CNN with FSM images (AUC of 0.91). Hepsaug [74] achieved an accuracy of 88% when training a separate deep CNN on only mass ROIs, and 84% when training a deep CNN on only MCs ROIs, in the BCDR database. On the other hand, the accuracy results show that classifying only mass or only MCs is more successful than classifying combined mass and MCs data. Zhu et al. [20] conducted mass detection for whole MG images. Their deep multi-instance network uses linear regression with weight sharing for the malignant probability of each position from the CNN's feature maps. The authors in [50, 146] trained multi-stage CNN networks for the classification of lesions in MGs. Bekker et al. [56] presented a deep MV CNN for the classification of clustered breast MCs into two classes. Their results show that classification based on MV MGs is promising. Carneiro et al. [32] addressed the classification of mass(es) using a pre-trained MV CNN. Their model classifies a full MG by extracting features from each view of the breast (training a separate CNN for each view) and combining these features in a joint CNN model to output a prediction that estimates the patient's risk of developing breast cancer. Carneiro et al. [95], following their work in [32], built a fully automated pre-trained CNN for detecting masses and MCs in MV MG images. Geras et al. [22] developed a MV CNN that utilizes large high-resolution images without downscaling. They showed that the accuracy of detecting and classifying MGs clearly increases with the size of the training data-set, and that the best performance can only be achieved using the images in the original resolution. Yi et al. [23] utilized deep MV learning by averaging the probability scores of both views to make the final prediction. Lotter et al. [65] introduced a multi-scale deep CNN trained with a curriculum learning strategy. Lotter et al. first train CNN-based patch classifiers on ROIs, and then use the learned features to initialize a scanning-based model that renders a decision on the whole image, obtaining final results by averaging final scores across the MVs of the breast. Dhungel et al. [97, 98] presented cascaded DL networks for detecting, segmenting and classifying breast masses from MGs with minimal user intervention. Dhungel et al. [25], following their work in [52, 97, 98], implemented a MV deep residual neural network for the fully automated classification of MGs as either malignant or normal/benign (AUC of 0.8).

Risk assessment
The studies in [26, 27, 33, 34, 107, 115, 133] have demonstrated that applying CNN methods has significant potential to develop new short-term risk predicting schemes with improved performance in detecting early abnormal symptoms from negative MGs. Breast density is considered a strong indicator of breast cancer risk [26, 27, 33, 34, 152]. Fonseca et al. [26, 152] explored an automatic breast composition classification work-flow based on a CNN for feature extraction in combination with a support vector machine classifier. A similar approach was taken by Becker [33], achieving an AUC of 0.82, comparable to experienced radiologists (AUC of 0.79–0.87).
Li [153] trained a deep CNN to estimate a probability map of breast density (PMD) to classify mammographic pixels into a fatty class or a dense class. Kallenberg et al.
[27] presented an unsupervised CNN for breast density pre-trained on general images since ROI characteristics
segmentation and automatic texture scoring. The model of medical images are thoroughly different from nat-
learns features across multiple scales, then they are fed to ural images. However, their opinion contradicts other
a simple classifier that is specific to the task of interest researchers work.
yielding AUC of 0.59. Ahn et al. [34] used CNN for the The authors in [31, 52, 155] proposed a patch-based
task of automatic classification of mammographic breast CNN to detect masses. Choukroun et al. [155] pro-
tissues into dense and fatty tissues. Their CNN is con- posed a method that classifies MGs by detecting discrim-
figured to learn the local features from image patches inative local information contained in patches through
while keeping the context information of the whole MG. a deep CNN and then uses the local information to
Wu et al. [107] managed to train a MV deep CNN localize tumors. Dhungel et al. [52] used the output
using a data-set of 201,179 MGs for breast density clas- from a CNN as a complimentary potential function to
sification. Mohamed et al. [115] achieved AUC of 0.95 a deep belief network (DBN) models for the localiza-
when using only the MLO view images. In comparison, tion of breast masses from MGs, using a small train-
the AUC is 0.88 when using only the CC view images. ing data-set. A drawback of the patch-based approach
When both the MLO and CC view images were com- in [31, 52] is that the input patches came from non-
bined as a single data-set, the AUC is lowered to 0.92. The overlapping areas, which makes it difficult to pre-
authors in [110] following their work in [115] achieved ciously localize masses. Moreover, the size of the input
better AUC of 0.98 by fine-tuning a pre-trained net- patches in [31, 52] is very small that produces a dif-
work. Hang [148] achieved classification accuracy of 66% ficulty in differentiating normal tissues from abnormal
for classification of full images into normal, benign and ones.
malignant. The authors in [14, 154] used the famous YOLO-based
deep CNN [83] for breast mass classification and local-
Lesion localization ization. The trained YOLO-based system localizes the
For localization, the information about which category an masses and classifies their types into benign or malignant.
image belongs to is already available and the task is to The authors in [154] achieved a mass location with an
instead figure out where exactly the object is located in overall accuracy of 96.33% and detection of benign and
the image. Classification and localization can also be com- malignant lesions with an overall accuracy of 85.52%.
bined so that a fixed amount of lesions in an image will
be classified and also located. This task, called multi-class Image retrieval
localization. The following authors employed CNNs in the Tasks like medical image retrieval using DL have been
aim of lesions classification and then localization within lately addressed in the medical field to facilitate the
these images [14, 31, 45, 52–54, 57, 61, 98, 154, 155], process of production and management of large medi-
Ben-Ari et al. [24] introduced the detection of AD using a supervised pre-trained region-based network (R-CNN). Ertosun and Rubin [53] developed an E2E dual-CNN-based visual search system for the localization of mass(es) in MGs. Kisilev et al. [61] gave a semantic description for MGs; the authors presented a multi-task R-CNN approach for the detection and semantic description of lesions in diagnostic images. Carneiro and Bradley [54] presented an automated supervised architecture composed of a multi-scale deep belief network that selects suspicious regions to be further processed by a two-level cascaded R-CNN. Akselrod-Ballin et al. [45] integrated several cascaded segmentation modules into a modified cascaded R-CNN. Hwang and Kim [51] proposed a self-transfer learning framework that enables training CNNs for object localization without any location information or pre-trained models. Zhu et al. [57] introduced E2E adversarial training for mammographic mass segmentation to learn robustly from scarce MGs. The authors highlighted the importance of pre-processing, augmentation, image enhancement, and normalization techniques, and stated that it is not feasible to use such networks directly as a CNN to detect masses. Choukroun et al. [155] proposed a method that classifies MGs by detecting discriminative local information contained in patches through a deep CNN, and then uses this local information to localize tumors. Dhungel et al. [52] used the output from a CNN as a complementary potential function in a deep belief network (DBN) model for the localization of breast masses in MGs, using a small training data-set. A drawback of the patch-based approaches in [31, 52] is that the input patches come from non-overlapping areas, which makes it difficult to precisely localize masses. Moreover, the input patches in [31, 52] are very small, which makes it difficult to differentiate normal tissues from abnormal ones.

The authors in [14, 154] used the well-known YOLO-based deep CNN [83] for breast mass classification and localization. The trained YOLO-based system localizes masses and classifies them as benign or malignant. The authors in [154] achieved mass localization with an overall accuracy of 96.33% and detection of benign and malignant lesions with an overall accuracy of 85.52%.

Image retrieval
Tasks like medical image retrieval using DL have lately been addressed in the medical field to facilitate the production and management of large medical image databases. Conventional methods for analyzing medical images have achieved limited success, as they cannot cope with such huge databases. The features and classification results learned by training a CNN can instead be used to retrieve medical images. Qayyum et al. [114] proposed a DL-based framework for content-based medical image retrieval (CBMIR) by training a deep CNN on classification tasks using medical images of different body organs (e.g. MGs, lungs, brain, liver, etc.), achieving an average classification accuracy of 99.77% over 24 classes of medical images. Similarly, Ahmad et al. [156] trained a deep CNN for CBMIR over 193 classes covering different body organs. Moreover, [156] applied TL and augmentation to increase the performance of their deep CNN.
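As an illustration of this learned-feature retrieval idea (a sketch, not the exact pipelines of [114] or [156]), the following code embeds MGs with an ImageNet pre-trained CNN and ranks the database by cosine similarity to a query. The ResNet50 backbone and average pooling are assumed choices, and database_images and query_image are placeholder arrays standing in for an actual MG repository.

    import numpy as np
    import tensorflow as tf

    # Assumed inputs: database_images of shape (N, 224, 224, 3) and
    # query_image of shape (224, 224, 3), both float32 RGB arrays.
    embedder = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg")  # 2048-d features

    def embed(batch):
        """Map a batch of images to L2-normalized CNN feature vectors."""
        batch = tf.keras.applications.resnet50.preprocess_input(batch.copy())
        return tf.math.l2_normalize(embedder(batch, training=False), axis=1).numpy()

    db_feats = embed(database_images)                 # computed once, offline
    scores = db_feats @ embed(query_image[None]).T    # cosine similarity
    top10 = np.argsort(-scores.ravel())[:10]          # the 10 most similar MGs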
Super resolution image reconstruction
The task of super resolution image reconstruction using a CNN (SRCNN) is an E2E mapping between low- and high-resolution images for image enhancement [157]. The mapping is represented as a deep CNN that takes the low-resolution image as input and outputs the high-resolution one. The study of Umehara et al. [158] shows that SRCNN can significantly outperform conventional interpolation methods for enhancing image resolution in digital mammography, especially in dense breasts.
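A compact sketch of the three-layer SRCNN design of [157] (patch extraction, non-linear mapping, reconstruction) is given below, using the commonly reported 9-1-5 filter-size configuration; the input is assumed to be the low-resolution MG after bicubic interpolation to the target size, and the training objective is the pixel-wise MSE used in [157].

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_srcnn():
        """9-1-5 SRCNN: patch extraction -> non-linear mapping -> reconstruction."""
        x_in = layers.Input(shape=(None, None, 1))             # grayscale MG
        x = layers.Conv2D(64, 9, padding="same", activation="relu")(x_in)
        x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
        x_out = layers.Conv2D(1, 5, padding="same")(x)         # high-res estimate
        model = tf.keras.Model(x_in, x_out)
        model.compile(optimizer="adam", loss="mse")            # pixel-wise MSE
        return model

    # Training pairs: (bicubic-upscaled low-res MG, original high-res MG).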
Research challenges and directions
In this section, we list the research challenges and directions that require further investigation by the community.

Localization of tumors
Patch-based CNNs, R-CNNs, Fast R-CNNs, Faster R-CNNs, and YOLO methods have recently become popular for localization tasks in MGs. Faster R-CNN is the choice of most mammography researchers who aim to obtain high detection accuracy. However, training an R-CNN and its faster variants is time-consuming and memory-expensive. In contrast, when faster computation and a limited memory budget matter more than detection accuracy, the YOLO method is the right choice. Finally, patch-based CNN methods are not recommended, as they result in many false positives. More research needs to be done for better localization of tumors in MGs.

Limited data for learning
One of the challenging problems researchers face while training CNNs is the size of the training data-set. As discussed in the best practices section, although several approaches such as data augmentation, TL, and dropout have been used to handle the problem of training a model with limited samples, this problem remains challenging.

Imbalanced data-set
Another challenging problem is the imbalance ratio between positive and negative classes in the training data-sets. Training CNN models directly on imbalanced data-sets may bias the prediction towards the more common classes, such as normal. The effect of imbalanced data-sets on the performance of a CNN for MGs has not been studied thoroughly: some works used balanced data-sets and some used imbalanced ones. Since, in general, fewer abnormal MGs are available compared to normal MGs, it is very important to investigate the effect of using balanced and imbalanced data-sets on the accuracy of the CNN model.
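One simple mitigation, sketched below with illustrative class counts, is to re-weight the loss so that the rare abnormal class contributes as much as the common normal class; this is a generic technique rather than one evaluated by the surveyed studies.

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y_train = np.array([0] * 9000 + [1] * 1000)  # 0 = normal, 1 = abnormal (illustrative)
    w = compute_class_weight(class_weight="balanced",
                             classes=np.array([0, 1]), y=y_train)
    class_weight = {0: w[0], 1: w[1]}            # here approximately {0: 0.56, 1: 5.0}

    # In Keras, each sample's loss is then scaled by its class weight:
    # model.fit(x_train, y_train, class_weight=class_weight, ...)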
Size of lesions
The size variation of lesions within MG images is another challenge for training CNNs to detect cancer. Resizing a large MG to 224×224 or 227×227 pixels (common choices among researchers) will likely make the ROI hard to detect and/or classify. To address this problem, several studies have proposed training a CNN model using different scales of lesions [27, 54, 65]. More research is required to find lesions of different sizes.

Memory constraints
The classification of full-size MG images is challenging due to memory constraints and the increased feature space. Researchers in [22, 128] address this problem by resizing the images to smaller ones; however, this affects the accuracy of their models. More research should be done on how to overcome the memory constraints while training CNNs with full-size MG images.
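One workaround, sketched below with illustrative sizes, is to tile the full-resolution MG into overlapping patches and process them in mini-batches rather than feeding the whole image at once; the overlap reduces the risk of cutting a lesion in half at a patch border.

    import tensorflow as tf

    def tile_mammogram(image, patch=512, stride=384):
        """Tile one full-resolution MG of shape (H, W, 1) into overlapping patches."""
        patches = tf.image.extract_patches(
            images=image[None],                  # add a batch dimension
            sizes=[1, patch, patch, 1],
            strides=[1, stride, stride, 1],      # stride < patch => overlap
            rates=[1, 1, 1, 1],
            padding="VALID")
        return tf.reshape(patches, [-1, patch, patch, 1])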
Non-annotated data-set
Another challenging problem for researchers is how to train a CNN model using a non-annotated data-set, in which the input image is binary-labeled as normal or cancerous without any details about the location of the abnormalities. To address this problem, Lotter et al. [65] train a patch-level CNN classifier, which is then used as a feature extractor for an image-level model. Training CNNs to classify non-annotated data-sets is still an open area for research [20, 65, 129].
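In the same spirit as the multi-instance formulation of [20], the following sketch trains from image-level labels only: a small CNN scores every patch, and the image-level score is the maximum patch score, so no lesion locations are needed. The architecture and the 16-patch layout are illustrative assumptions, not the design of [20] or [65].

    import tensorflow as tf
    from tensorflow.keras import layers

    patch_scorer = tf.keras.Sequential([          # scores one patch in [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])

    patches_in = layers.Input(shape=(16, 64, 64, 1))  # 16 patches per MG (illustrative)
    patch_scores = layers.TimeDistributed(patch_scorer)(patches_in)
    image_score = layers.GlobalMaxPooling1D()(patch_scores)  # an MG is abnormal
    mil_model = tf.keras.Model(patches_in, image_score)      # if any patch is
    mil_model.compile(optimizer="adam", loss="binary_crossentropy")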
False positive reduction
Even though CNNs are very successful in providing better performance than traditional CADs, they still produce false positives. False positive results cause patients needless anxiety, additional testing, biopsies, and unnecessary costs. Several approaches have been proposed to reduce false positives in CNNs, such as using MV CNNs [15, 20–25, 95, 107, 108]. However, more research is required to integrate prior images with current screening to eliminate false positives.

Multiple detection
Current CNN models are trained to detect and/or localize mass(es) within MGs, neglecting the existence of MCs. More research should be directed at detecting multiple abnormalities within the same breast.

Pre-processing filters
In FSM images, a significant number of abnormalities are misdiagnosed or missed due to the low visibility, low contrast, poor quality, and noisy nature of these images. Common pre-processing techniques (e.g. CLAHE and the median filter) are proposed in [62, 89–91] to enhance image quality, smooth images, and reduce noise. However, choosing the proper pre-processing technique for MGs in order to improve the classification performance of CNNs is still an open problem.
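For concreteness, the two filters named above can be applied with OpenCV as sketched below; the clip limit, tile grid, and kernel size are illustrative defaults rather than values recommended by [62, 89–91], and the file name is a placeholder.

    import cv2

    img = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

    img = cv2.medianBlur(img, 3)                   # 3x3 median: suppress impulse noise
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)                         # CLAHE: local contrast enhancement
    cv2.imwrite("mammogram_enhanced.png", img)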
Discussion and recommendations
We show a breakdown of the studies included in this survey grouped by their neural network task (see Additional file 1: Table S1). Figure 3 shows the percentage of studies employing some of the CNN best practices that are discussed in the previous section and are shown in Additional file 1: Table S1. 78 studies (out of 83) used common pre-processing techniques to enhance the quality of images, reduce or remove noise, and improve the contrast of MGs. That shows the importance of having a good separation between foreground and background pixels without removing important information from the images. Moreover, 59 studies used ROIs for more efficient computation, while 23 studies applied CNNs to full-size MG images, as in [20, 22, 26, 32, 45, 51, 57, 62]. Even for CNNs that are trained with full-size images, pre-processing is mandatory to remove marks, labels, pectoral muscle, and black areas that can interfere with the post-processing of these images. Data augmentation has been recommended and employed by 52 studies; it reduces overfitting by generating more instances of the training data. TL is gaining popularity for medical images: 32 studies have successfully applied it to pre-train their networks, and from 2015 until now there has been an increasing trend in using TL. 15 studies implemented MV CNNs, which led to significant improvements in performance over single-view ones; it is a beneficial practice to use both CC and MLO views to detect abnormalities. 25 studies implemented an E2E CNN, which may include segmentation, detection, and classification of lesions in MGs. We summarize the recommendations to significantly improve the performance of CNNs in the detection and classification of breast cancer using MG images as follows:

• Use pre-processing techniques such as the CLAHE filter to improve the contrast of MGs, the median filter to reduce noise, and un-sharp masking to smooth the images.
• Apply cropping and down-sampling for more efficient computation.
• Use a suitable validation approach according to the size of the data-set available.
• Use augmentation, dropout, and TL to reduce overfitting and increase the generalization of the model (see the sketch after this list).
• Use a suitable batch size if using ROIs.
• Use multi-view (MV) CNNs to embed more information for better performance.
• Use full-resolution images if it is computationally practical.
• Mix FFDM and FSM images.
• Use a suitable activation function such as ReLU, be careful with initializing the learning rates, and possibly monitor the fraction of dead neurons in the network.
• Use a large, well-labeled data-set if available.
• Go deeper in layers if a large data-set is available.
• Use context and patient information in multi-modal models.
• Use recently available libraries such as TensorFlow or Keras for implementing CNNs.
their network. From 2015 until now, there is an increas-
ing trend in using TL. 15 studies implemented a MV Conclusions
CNNs which lead to significant improvements in the per- In this survey, we conducted a detailed review of the
formance of the single-view ones. It is a beneficial practice strengths, limitations, and performance of the most
to use both CC and MLO views to detect abnormalities. recent CNNs applications in analyzing mammogram
25 studies implemented an E2E CNN which may include (MG) images. This survey systematically compares recent
segmentation, detection, and classification of lesions in approaches of CNNs in MG images, and show how the
MGs. We summarize the recommendations to signifi- advances in DL methods give promising results that can
cantly improve the performance of CNNs in detection aid radiologists and serve as a second eye for them.
and classification of breast cancer using MG images as The potential role of CNN methods is to handle mil-
follows: lions of routine imaging exams, presenting the poten-
tial cancers to the radiologists who perform follow-
• Use pre-processing techniques such as CLAHE filter up procedures. We discuss the currently publicly avail-
to improve the contrast of MGs, median filter to able MG databases. We also give a deep insight into
reduce noise, and un-sharp masking to smooth the the architectures of CNNs used for various tasks in
images. mammography.
• Apply cropping and down sampling for more This survey represents a valuable resource for the mam-
efficient computation. mography research community since it can be utilized
Additional file

Additional file 1: Supplementary Table 1, a comparison between different approaches in the literature. (PDF 98 kb)

Abbreviations
ACC: Accuracy; AUC: Area under the receiver operating characteristic curve; AD: Architectural distortion; BACs: Breast arterial calcifications; BCDR: Breast cancer digital repository; BN: Batch normalization; CAD: Computer-aided detection; CBMIR: Content-based medical image retrieval; CC: Craniocaudal; CRF: Conditional random field; CLAHE: Contrast limited adaptive histogram equalization; CNNs: Deep convolutional neural networks; DBT: Digital breast tomosynthesis; DDSM: Digital database for screening mammography; DL: Deep learning; E2E: End-to-end; FC: Fully connected layer; FFDM: Full-field digital mammography; FN: False negative; FP: False positive; FPR: False positive rate; FSMs: Film screen mammograms; HGD: Histogram of the gradient divergence; HOG: Histogram of oriented gradients; IRMA: Image retrieval in medical applications; MCs: Microcalcifications; MG: Mammogram; MIAS: Mammographic Image Analysis Society; ML: Machine learning; MLO: Mediolateral-oblique; MLP: Multilayer perceptron; PMD: Probability map of breast density; Pool: Pooling layer; R-CNN: Region-based convolutional neural network; ReLU: Rectified linear unit; ROC: Receiver operating characteristic curve; ROIs: Regions of interest; SRCNN: Super resolution convolutional neural network; SSL: Semi-supervised learning; SVM: Support vector machine; TL: Transfer learning; TN: True negative; TP: True positive; TPR: True positive rate

Acknowledgements
Not applicable.

Funding
This study was supported by Sheida Nabavi's startup fund at the University of Connecticut and Dina Abdelhafiz's scholarship from Egypt. Publication costs are funded by Sheida Nabavi's startup fund.

Availability of data and materials
The DDSM dataset is available online at http://www.eng.usf.edu/cvprg/Mammography/Database.html. The INbreast dataset can be requested online at http://medicalresearch.inescporto.pt/breastresearch/index.php/Get_INbreast_Database. The breast cancer digital repository (BCDR) dataset can be requested online at https://bcdr.eu. The MIAS database is available online at http://peipa.essex.ac.uk/info/mias.html.

About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 11, 2019: Selected articles from the 7th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2017): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-11.

Authors' contributions
DH and SN designed the study. DH performed all the analyses in this paper and interpreted the results. DH and SN wrote the manuscript. All authors read and approved the final version of the manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1 Department of Computer Science and Engineering, University of Connecticut, 06269 Storrs, CT, USA. 2 The Informatics Research Institute (IRI), City of Scientific Research and Technological Application (SRTA-City), New Borg El-Arab, Egypt. 3 Department of Diagnostic Imaging, University of Connecticut Health Center, 06030 Farmington, CT, USA.

Published: 6 June 2019
References
1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7–30.
2. Li Y, Chen H, Cao L, Ma J. A survey of computer-aided detection of breast cancer with mammography. J Health Med Inf. 2016;4(7).
3. Feig SA. Screening mammography benefit controversies: sorting the evidence. Radiol Clin N Am. 2014;3(52):455–80.
4. Welch HG, Passow HJ. Quantifying the benefits and harms of screening mammography. JAMA Intern Med. 2014;3(174):448–54.
5. Lehman CD, Wellman RD, Buist DS, Kerlikowske K, Tosteson AN, Miglioretti DL. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med. 2015;175(11):1828–37.
6. Hayward JH, Ray KM, Wisner DJ, Kornak J, Lin W, Joe BN, et al. Improving screening mammography outcomes through comparison with multiple prior mammograms. Am J Roentgenol. 2016;207(4):918–24.
7. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
8. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. 2017. arXiv preprint arXiv:170205747.
9. Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX. Deep Learning and Its Applications to Machine Health Monitoring: A Survey. 2016. arXiv preprint arXiv:161207640.
10. Lee JG, Jun S, Cho YW, Lee H, Kim GB, Seo JB, et al. Deep Learning in Medical Imaging: General Overview. Korean J Radiol. 2017;4(18):570–84.
11. Hedjazi MA, Kourbane I, Genc Y. On identifying leaves: A comparison of CNN with classical ML methods. In: Signal Processing and Communications Applications Conference (SIU) 2017 25th. IEEE; 2017. p. 1–4.
12. Kooi T, Gubern-Merida A, Mordang JJ, Mann R, Pijnappel R, Schuur K, et al. A comparison between a deep convolutional neural network and radiologists for classifying regions of interest in mammography. In: International Workshop on Digital Mammography. Springer; 2016. p. 51–6.
13. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, et al. Deep Learning: A Primer for Radiologists. RadioGraphics. 2017;7(37):2113–31.
14. Platania R, Shams S, Yang S, Zhang J, Lee K, Park SJ. Automated Breast Cancer Diagnosis Using Deep Learning and Region of Interest Detection (BC-DROID). In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. ACM; 2017. p. 536–43.
15. Wang J, Ding H, Azamian F, Zhou B, Iribarren C, Molloi S, et al. Detecting cardiovascular disease from mammograms with deep learning. IEEE Trans Med Imaging. 2017.
16. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems; 2012. p. 1097–105.
17. Greenspan H, van Ginneken B, Summers RM. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans Med Imaging. 2016;35(5):1153–9.
18. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):9.
19. Christoyianni I, Constantinou E, Dermatas E. Automatic detection of abnormal tissue in bilateral mammograms using neural networks. Methods Appl Artif Intell. 2004;267–75.
20. Zhu W, Lou Q, Vang YS, Xie X. Deep multi-instance networks with sparse label assignment for whole mammogram classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2017. p. 603–11.
21. Kooi T, van Ginneken B, Karssemeijer N, den Heeten A. Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network. Med Phys. 2017;44(3):1017–27.
22. Geras KJ, Wolfson S, Kim S, Moy L, Cho K. High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks. 2017. arXiv preprint arXiv:170307047.
23. Yi D, Sawyer RL, Cohn III D, Dunnmon J, Lam C, Xiao X, et al. Optimizing and Visualizing Deep Learning for Benign/Malignant Classification in Breast Tumors. 2017. arXiv preprint arXiv:170506362.
24. Ben-Ari R, Akselrod-Ballin A, Karlinsky L, Hashoul S. Domain specific convolutional neural nets for detection of architectural distortion in mammograms. In: Biomedical Imaging (ISBI 2017) 2017 IEEE 14th International Symposium on. IEEE; 2017. p. 552–6.
25. Dhungel N, Carneiro G, Bradley AP. Fully automated classification of mammograms using deep residual neural networks. In: Biomedical Imaging (ISBI 2017) 2017 IEEE 14th International Symposium on. IEEE; 2017. p. 310–4.
26. Fonseca P, Mendoza J, Wainer J, Ferrer J, Pinto J, Guerrero J, et al. Automatic breast density classification using a convolutional neural network architecture search procedure. In: Proc SPIE. vol. 9414; 2015. p. 941428.
27. Kallenberg M, Petersen K, Nielsen M, Ng AY, Diao P, Igel C, et al. Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring. IEEE Trans Med Imaging. 2016;35(5):1322–31.
28. Oustimov A, Gastounioti A, Hsieh MK, Pantalone L, Conant EF, Kontos D. Convolutional neural network approach for enhanced capture of breast parenchymal complexity patterns associated with breast cancer risk. In: SPIE Medical Imaging. International Society for Optics and Photonics; 2017. p. 101340S.
29. Petersen K, Nielsen M, Diao P, Karssemeijer N, Lillholm M. Breast tissue segmentation and mammographic risk scoring using deep learning. In: International Workshop on Digital Mammography. Springer; 2014. p. 88–94.
30. Qiu Y, Wang Y, Yan S, Tan M, Cheng S, Liu H, et al. An initial investigation on developing a new method to predict short-term breast cancer risk based on deep learning technology. In: SPIE Medical Imaging. International Society for Optics and Photonics; 2016. p. 978521.
31. Sun W, Tseng TLB, Zheng B, Qian W. A preliminary study on breast cancer risk analysis using deep neural network. In: International Workshop on Digital Mammography. Springer; 2016. p. 385–91.
32. Carneiro G, Nascimento J, Bradley AP. Unregistered multiview mammogram analysis with pre-trained deep learning models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. p. 652–60.
33. Becker AS, Marcon M, Ghafoor S, Wurnig MC, Frauenfelder T, Boss A. Deep learning in mammography: diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Investig Radiol. 2017;52(7):434–40.
34. Ahn CK, Heo C, Jin H, Kim JH. A Novel Deep Learning-based Approach to High Accuracy Breast Density Estimation in Digital Mammography. In: SPIE Medical Imaging. International Society for Optics and Photonics; 2017. p. 101342O.
35. Li H, Giger ML, Huynh BQ, Antropova NO. Deep learning in breast cancer risk assessment: evaluation of convolutional neural networks on a clinical dataset of full-field digital mammograms. J Med Imaging. 2017;4(4):041304.
36. Suckling J, Parker J, Dance D, Astley S, Hutt I, Boggis C, et al. The mammographic image analysis society digital mammogram database. In: Exerpta Medica. International Congress Series; 1994. p. 375–8.
37. Moreira IC, Amaral I, Domingues I, Cardoso A, Cardoso MJ, Cardoso JS. INbreast: toward a full-field digital mammographic database. Acad Radiol. 2012;19(2):236–48.
38. Lopez MG, Posada N, Moura DC, Pollán RR, Valiente JMF, Ortega CS, et al. BCDR: a breast cancer digital repository. In: 15th International Conference on Experimental Mechanics; 2012.
39. Oliveira JE, Gueld MO, Araújo AdA, Ott B, Deserno TM. Towards a standard reference database for computer-aided mammography. In: Proc SPIE. vol. 6915; 2008. p. 69151Y.
40. Chan HP, Lo SCB, Sahiner B, Lam KL, Helvie MA. Computer-aided detection of mammographic microcalcifications: Pattern recognition with an artificial neural network. Med Phys. 1995;22(10):1555–67.
41. Sahiner B, Chan HP, Petrick N, Wei D, Helvie MA, Adler DD, et al. Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. IEEE Trans Med Imaging. 1996;5(15):598–610.
42. Ge J, Hadjiiski LM, Sahiner B, Wei J, Helvie MA, Zhou C, et al. Computer-aided detection system for clustered microcalcifications: comparison of performance on full-field digital mammograms and digitized screen-film mammograms. Phys Med Biol. 2007;4(52):981.
43. Jamieson AR, Drukker K, Giger ML. Breast image feature learning with adaptive deconvolutional networks. In: SPIE Medical Imaging; 2012. p. 831506.
44. Mordang JJ, Janssen T, Bria A, Kooi T, Gubern-Mérida A, Karssemeijer N. Automatic microcalcification detection in multi-vendor mammography using convolutional neural networks. In: International Workshop on Digital Mammography. Springer; 2016. p. 35–42.
45. Akselrod-Ballin A, Karlinsky L, Alpert S, Hasoul S, Ben-Ari R, Barkan E. A region based convolutional network for tumor detection and classification in breast mammography. In: International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer; 2016. p. 197–205.
46. Huynh BQ, Li H, Giger ML. Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. J Med Imaging. 2016;3(3):034501.
47. CBIS-DDSM. https://mcl.nci.nih.gov/science-data/cbis-ddsm-1. Accessed 3 Feb 2019.
48. Heath M, Bowyer K, Kopans D, Moore R, Kegelmeyer WP. The digital database for screening mammography. In: Proceedings of the 5th international workshop on digital mammography. Medical Physics Publishing; 2000. p. 212–8.
49. Abbas Q. DeepCAD: A Computer-Aided Diagnosis System for Mammographic Masses Using Deep Invariant Features. Computers. 2016;4(5):28.
50. Gallego-Posada J, Montoya-Zapata D, Quintero-Montoya O. Detection and Diagnosis of Breast Tumors using Deep Convolutional Neural Networks.
51. Hwang S, Kim HE. Self-transfer learning for fully weakly supervised object localization. 2016. arXiv preprint arXiv:160201625.
52. Dhungel N, Carneiro G, Bradley AP. Deep learning and structured prediction for the segmentation of mass in mammograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. p. 605–12.
53. Ertosun MG, Rubin DL. Probabilistic visual search for masses within mammography images using deep learning. In: Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on. IEEE; 2015. p. 1310–5.
54. Dhungel N, Carneiro G, Bradley AP. Automated Mass Detection from Mammograms using Deep Learning and Random Forest. 2016.
55. Jiao Z, Gao X, Wang Y, Li J. A deep feature based framework for breast masses classification. Neurocomputing. 2016;197:221–31.
56. Bekker AJ, Greenspan H, Goldberger J. A multi-view deep learning architecture for classification of breast microcalcifications. In: Biomedical Imaging (ISBI) 2016 IEEE 13th International Symposium on. IEEE; 2016. p. 726–30.
57. Zhu W, Xie X. Adversarial deep structural networks for mammographic mass segmentation. 2016. arXiv preprint arXiv:161205970.
58. Lévy D, Jain A. Breast mass classification from mammograms using deep convolutional neural networks. 2016. arXiv preprint arXiv:161200542.
59. Sharma K, Preet B. Classification of mammogram images by using CNN classifier; 2016. p. 2743–9.
60. Suzuki S, Zhang X, Homma N, Ichiji K, Sugita N, Kawasumi Y, et al. Mass detection using deep convolutional neural network for mammographic computer-aided diagnosis. In: Society of Instrument and Control Engineers of Japan (SICE) 2016 55th Annual Conference of the. IEEE; 2016. p. 1382–6.
61. Kisilev P, Sason E, Barkan E, Hashoul S. Medical image description using multi-task-loss CNN. In: International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer; 2016. p. 121–9.
62. Teare P, Fishman M, Benzaquen O, Toledano E, Elnekave E. Malignancy Detection on Mammography Using Dual Deep Convolutional Neural Networks and Genetically Discovered False Color Input Enhancement. J Digit Imaging. 2017;4(30):499–505.
63. Sert E, Ertekin S, Halici U. Ensemble of convolutional neural networks for classification of breast microcalcification from mammograms. In: Engineering in Medicine and Biology Society (EMBC) 2017 39th Annual International Conference of the IEEE. IEEE; 2017. p. 689–92.
64. Jiao Z, Gao X, Wang Y, Li J. A parasitic metric learning net for breast mass classification based on mammography. Pattern Recogn. 2017.
65. Lotter W, Sorensen G, Cox D. A Multi-scale CNN and Curriculum Learning Strategy for Mammogram Classification. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer; 2017. p. 169–77.
66. Jadoon MM, Zhang Q, Haq IU, Butt S, Jadoon A. Three-Class Mammogram Classification Based on Descriptive CNN Features. BioMed Res Int. 2017;2017.
67. Domingues I, Cardoso J. Mass detection on mammogram images: a first assessment of deep learning techniques. In: 19th Portuguese Conference on Pattern Recognition (RECPAD); 2013.
68. Domingues I, Sales E, Cardoso J, Pereira W. INbreast-database masses characterization. XXIII CBEB. 2012.
69. Arevalo J, González FA, Ramos-Pollán R, Oliveira JL, Lopez MAG. Convolutional neural networks for mammography mass lesion classification. In: Engineering in Medicine and Biology Society (EMBC) 2015 37th Annual International Conference of the IEEE. IEEE; 2015. p. 797–800.
70. Wichakam I, Vateekul P. Combining deep convolutional networks and SVMs for mass detection on digital mammograms. In: Knowledge and Smart Technology (KST) 2016 8th International Conference on. IEEE; 2016. p. 239–44.
71. Arevalo J, González FA, Ramos-Pollán R, Oliveira JL, Lopez MAG. Representation learning for mammography mass lesion classification with convolutional neural networks. Comput Methods Prog Biomed. 2016;127:248–57.
72. Jiang F, Liu H, Yu S, Xie Y. Breast mass lesion classification in mammograms by transfer learning. In: Proceedings of the 5th International Conference on Bioinformatics and Computational Biology. ACM; 2017. p. 59–62.
73. Chougrad H, Zouaki H, Alheyane O. Convolutional Neural Networks for Breast Cancer Screening: Transfer Learning with Exponential Decay. 2017. arXiv preprint arXiv:171110752.
74. Hepsağ PU, Özel SA, Yazıcı A. Using deep learning for mammography classification. In: Computer Science and Engineering (UBMK) 2017 International Conference on. IEEE; 2017. p. 418–23.
75. Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015;61:85–117.
76. Wei D, Sahiner B, Chan HP, Petrick N. Detection of masses on mammograms using a convolution neural network. In: Acoustics Speech and Signal Processing 1995. ICASSP-95. 1995 International Conference on. vol. 5. IEEE; 1995. p. 3483–6.
77. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer; 2014. p. 818–33.
78. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. p. 1–9.
79. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv preprint arXiv:14091556.
80. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770–8.
81. Srinivas S, Sarvadevabhatla RK, Mopuri KR, Prabhu N, Kruthiventi SS, Babu RV. An Introduction to Deep Convolutional Neural Nets for Computer Vision. In: Deep Learning for Medical Image Analysis. Elsevier; 2017. p. 25–52.
82. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;3(115):211–52.
83. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 779–88.
84. Al-masni MA, Al-antari MA, Park JM, Gi G, Kim TY, Rivera P, et al. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system. Comput Methods Prog Biomed. 2018;157:85–94.
85. Mishkin D, Sergievskiy N, Matas J. Systematic evaluation of CNN advances on the ImageNet. 2016. arXiv preprint arXiv:160602228.
86. Jifara W, Jiang F, Rho S, Cheng M, Liu S. Medical image denoising using convolutional neural network: a residual learning approach. J Supercomput. 2017;1–15.
87. Sharma J, Rai J, Tewari R. Identification of pre-processing technique for enhancement of mammogram images. In: Medical Imaging m-Health and Emerging Communication Systems (MedCom) 2014 International Conference on. IEEE; 2014. p. 115–9.
88. Bandyopadhyay SK. Pre-processing of Mammogram Images. Int J Eng Sci Technol. 2010;11(2):6753–8.
89. Kaur P, Kaur A. Review of Different Approaches in Mammography. 2016.
90. Bria A, Marrocco C, Galdran A, Campilho A, Marchesi A, Mordang JJ, et al. Spatial Enhancement by Dehazing for Detection of Microcalcifications with Convolutional Nets. In: International Conference on Image Analysis and Processing. Springer; 2017. p. 288–98.
91. Abdelhafiz D, Nabavi S, Ammar R, Yang C. The Effect of Pre-Processing on Breast Cancer Detection Using Convolutional Neural Networks. In: Poster session presented at the meeting of the IEEE International Symposium on Biomedical Imaging. Washington DC; 2018.
92. Abdelhafiz D, Nabavi S, Ammar R, Yang C. Survey on deep convolutional neural networks in mammography. In: Computational Advances in Bio and Medical Sciences (ICCABS), 2017 IEEE 7th International Conference on. IEEE; 2017. p. 1.
93. Ge J, Sahiner B, Hadjiiski LM, Chan HP, Wei J, Helvie MA, et al. Computer aided detection of clusters of microcalcifications on full field digital mammograms. Med Phys. 2006;33(8):2975–88.
94. Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Cha KH, Richter CD. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol. 2017;23(62):8894.
95. Carneiro G, Nascimento J, Bradley AP. Automated Analysis of Unregistered Multi-View Mammograms With Deep Learning. IEEE Trans Med Imaging. 2017;11(36):2355–65.
96. Kooi T, Litjens G, van Ginneken B, Gubern-Mérida A, Sánchez CI, Mann R, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal. 2017;35:303–12.
97. Dhungel N, Carneiro G, Bradley AP. The automated learning of deep features for breast mass classification from mammograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2016. p. 106–14.
98. Dhungel N, Carneiro G, Bradley AP. A deep learning approach for the analysis of masses in mammograms with minimal user intervention. Med Image Anal. 2017;37:114–28.
99. Qiu Y, Yan S, Tan M, Cheng S, Liu H, Zheng B. Computer-aided classification of mammographic masses using the deep learning technology: a preliminary study. In: SPIE Medical Imaging. International Society for Optics and Photonics; 2016. p. 978520.
100. Antropova N, Huynh BQ, Giger ML. A Deep Feature Fusion Methodology for Breast Cancer Diagnosis Demonstrated on Three Imaging Modality Datasets. Med Phys. 2017.
101. Wang D, Khosla A, Gargeya R, Irshad H, Beck AH. Deep learning for identifying metastatic breast cancer. 2016. arXiv preprint arXiv:160605718.
102. Lo SCB, Li H, Wang Y, Kinnard L, Freedman MT. A multiple circular path convolution neural network system for detection of mammographic masses. IEEE Trans Med Imaging. 2002;21(2):150–8.
103. Agrawal P, Vatsa M, Singh R. Saliency based mass detection from screening mammograms. Signal Process. 2014;99:29–47.
104. Dubrovina A, Kisilev P, Ginsburg B, Hashoul S, Kimmel R. Computational mammography using deep neural networks. Comput Methods Biomech Biomed Eng Imaging Vis. 2016;1–5.
105. Sun W, Tseng TLB, Zhang J, Qian W. Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data. Comput Med Imaging Graph. 2017;57:4–9.
106. Samala RK, Chan HP, Hadjiiski L, Helvie MA, Wei J, Cha K. Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography. Med Phys. 2016;43(12):6654–66.
107. Wu N, Geras KJ, Shen Y, Su J, Kim S, Kim E, et al. Breast density classification with deep convolutional neural networks. 2017. arXiv preprint arXiv:171103674.
108. Kooi T, Karssemeijer N. Classifying symmetrical differences and temporal change for the detection of malignant masses in mammography using deep neural networks. J Med Imaging. 2017;4(4):044501.
109. Hadad O, Bakalo R, Ben-Ari R, Hashoul S, Amit G. Classification of breast lesions using cross-modal deep learning. In: Biomedical Imaging (ISBI 2017) 2017 IEEE 14th International Symposium on. IEEE; 2017. p. 109–12.
110. Mohamed AA, Berg WA, Peng H, Luo Y, Jankowitz RC, Wu S. A deep learning method for classifying mammographic breast density categories. Med Phys. 2017.
111. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics; 2011. p. 315–23.
112. Pedamonti D. Comparison of non-linear activation functions for deep neural networks on MNIST classification task. 2018. arXiv preprint arXiv:180402763.
113. Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proc ICML. vol. 30; 2013. p. 3.
114. Qayyum A, Anwar SM, Awais M, Majid M. Medical image retrieval using deep convolutional neural network. Neurocomputing. 2017.
115. Mohamed AA, Luo Y, Peng H, Jankowitz RC, Wu S. Understanding Clinical Mammographic Breast Density Assessment: a Deep Learning Perspective. J Digit Imaging. 2017;1–6.
116. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
117. Smirnov EA, Timoshenko DM, Andrianov SN. Comparison of regularization methods for ImageNet classification with deep convolutional neural networks. Aasri Procedia. 2014;6:89–94.
118. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning; 2015. p. 448–56.
119. Carneiro G, Nascimento J, Bradley AP. Deep learning models for classifying mammogram exams containing unregistered multi-view images and segmentation maps of lesions. In: Deep Learning for Medical Image Analysis. Elsevier; 2017. p. 321.
120. Suzuki S, Zhang X, Homma N, Ichiji K, Kawasumi Y, Ishibashi T, et al. WE-DE-207B-02: Detection of Masses On Mammograms Using Deep Convolutional Neural Network: A Feasibility Study. Med Phys. 2016;43(6):3817.
121. Pang S, Yu Z, Orgun MA. A novel end-to-end classifier using domain transferred deep convolutional neural networks for biomedical images. Comput Methods Prog Biomed. 2017;140:283–93.
122. Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, et al. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Trans Med Imaging. 2016;35(5):1299–312.
123. Oquab M, Bottou L, Laptev I, Sivic J. Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014. p. 1717–24.
124. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? In: Advances in neural information processing systems; 2014. p. 3320–8.
125. Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, et al. On the necessity of fine-tuned convolutional neural networks for medical imaging. In: Deep Learning and Convolutional Neural Networks for Medical Image Computing. Springer; 2017. p. 181–93.
126. Wei X, Chen J, Cai C. Using Deep Convolutional Neural Networks and Transfer Learning for Mammography Mass Lesion Classification. Journal of Computational and Theoretical Nanoscience. 2017;14(8):3802–06.
127. Xi P, Shu C, Goubran R. Abnormality Detection in Mammography using Deep Convolutional Neural Networks. 2018. arXiv preprint arXiv:180301906.
128. Zhang X, Zhang Y, Han EY, Jacobs N, Han Q, Wang X, et al. Whole mammogram image classification with convolutional neural networks. In: Bioinformatics and Biomedicine (BIBM) 2017 IEEE International Conference on. IEEE; 2017. p. 700–4.
129. Shen L. End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design. 2017. arXiv preprint arXiv:170809427.
130. Hohman F, Kahng M, Pienta R, Chau DH. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers. 2018. arXiv preprint arXiv:180106889.
131. Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, Rubin DL. A curated mammography data set for use in computer-aided detection and diagnosis research. Sci Data. 2017;4:170177.
132. Qiu Y, Yan S, Gundreddy RR, Wang Y, Cheng S, Liu H, et al. A New Approach to Develop Computer-Aided Diagnosis Scheme of Breast Mass Classification Using Deep Learning Technology. J X-Ray Sci Technol (Preprint). 2017;1–13.
133. Thomaz RL, Carneiro PC, Patrocinio AC. Feature extraction using convolutional neural network for classifying breast density in mammographic images. In: Medical Imaging 2017: Computer-Aided Diagnosis. vol. 10134. International Society for Optics and Photonics; 2017. p. 101342M.
134. Yadav S, Shukla S. Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for Quality Classification. In: Advanced Computing (IACC) 2016 IEEE 6th International Conference on. IEEE; 2016. p. 78–83.
135. Masko D, Hensman P. The impact of imbalanced training data for convolutional neural networks. KTH Royal Institute of Technology; 2015.
136. Chawla NV. Data mining for imbalanced datasets: An overview. In: Data mining and knowledge discovery handbook. Springer; 2009. p. 875–86.
137. Raman V, Sumari P, Then H, Al-Omari SAK. Review on Mammogram Mass Detection by Machine Learning Techniques. Int J Comput Electr Eng. 2011;6(3):873.
138. El Atlas N, El Aroussi M, Wahbi M. Computer-aided breast cancer detection using mammograms: A review. In: Complex Systems (WCCS), 2014 Second World Conference on. IEEE; 2014. p. 626–31.
139. Erickson BJ, Korfiatis P, Akkus Z, Kline T, Philbrick K. Toolkits and Libraries for Deep Learning. J Digit Imaging. 2017;1–6.
140. Sherkhane P, Vora D. Survey of deep learning software tools. In: Data Management Analytics and Innovation (ICDMAI) 2017 International Conference on. IEEE; 2017. p. 236–8.
141. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: A System for Large-Scale Machine Learning. In: OSDI. vol. 16; 2016. p. 265–83.
142. Jia Y, Shelhamer E. Caffe model zoo. 2015.
143. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, et al. Caffe: Convolutional Architecture for Fast Feature Embedding. 2014. arXiv preprint arXiv:14085093.
144. Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, et al. Automatic differentiation in PyTorch. 2017.
145. Vedaldi A, Lenc K. MatConvNet: Convolutional neural networks for MATLAB. In: Proceedings of the 23rd ACM international conference on Multimedia. ACM; 2015. p. 689–92.
146. Agarwal V, Carson C. Using Deep Convolutional Neural Networks to predict semantic features of lesions in mammograms. CS231n Course Project Reports. 2015.
147. Jaffar MA. Deep Learning based Computer Aided Diagnosis System for Breast Mammograms. Int J Adv Comput Sci Appl. 2017;7(8):286–90.
148. Hang W, Liu Z, Hannun A. GlimpseNet: Attentional Methods for Full-Image Mammogram Diagnosis.
149. Kooi T, Mordang JJ, Karssemeijer N. Conditional Random Field Modelling of Interactions Between Findings in Mammography. In: SPIE Medical Imaging. International Society for Optics and Photonics; 2017. p. 101341E.
150. Bakkouri I, Afdel K. Breast tumor classification based on deep convolutional neural networks. In: Advanced Technologies for Signal and Image Processing (ATSIP) 2017 International Conference on. IEEE; 2017. p. 1–6.
151. Valvano G, Della Latta D, Martini N, Santini G, Gori A, Iacconi C, et al. Evaluation of a Deep Convolutional Neural Network method for the