Feature fusion Siamese network for breast cancer detection comparing current
and prior mammograms
Jun Bai, Annie Jin, Tianyu Wang, Clifford Yang and Sheida Nabavi
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield
Way, Storrs, CT 06269, USA
University of Connecticut School of Medicine, 263 Farmington Ave. Farmington, CT 06030,
USA
Department of Radiology, UConn Health, 263 Farmington Ave. Farmington, CT 06030, USA
Abstract
Purpose: Automatic detection of very small and non-mass abnormalities from mam-
mogram images has remained challenging. In clinical practice for each patient, radiol-
ogists commonly not only screen the mammogram images obtained during the exam-
ination, but also compare them with previous mammogram images to make a clinical
decision. To design an AI system to mimic radiologists for better cancer detection, in
this work we proposed an end-to-end enhanced Siamese convolutional neural network
to detect breast cancer using previous year and current year mammogram images.
Methods: The proposed Siamese based network uses high resolution mammogram
images and fuses features of pairs of previous year and current year mammogram im-
ages to predict cancer probabilities. The proposed approach is developed based on the
concept of one-shot learning that learns the abnormal differences between current and
prior images instead of abnormal objects, and as a result can perform better with small
sample size data sets. We developed two variants of the proposed network. In the first
model, to fuse the features of current and previous images, we designed an enhanced
distance learning network that considers not only the overall distance, but also the
pixel-wise distances between the features. In the other model, we concatenated the
features of current and previous images to fuse them.
Results: We compared the performance of the proposed models with those of some
baseline models that use current images only (ResNet and VGG) or use current and prior images (LSTM and vanilla Siamese) in terms of accuracy, sensitivity,
precision, F1 score and AUC. Results show that the proposed models outperform the
baseline models and the proposed model with the distance learning network performs
the best (accuracy: 0.92, sensitivity: 0.93, precision: 0.91, specificity: 0.91, F1: 0.92
and AUC: 0.95).
Conclusions: Integrating prior mammogram images improves automatic cancer classi-
fication, especially for very small and non-mass abnormalities. For classification models
that integrate current and prior mammogram images, using an enhanced and effective
distance learning network can advance the performance of the models.
I. Introduction
In accordance with the recent cancer statistics, breast cancer is the most common cancer
diagnosed in women both in the U.S. and worldwide 1 , and induces considerable anxiety on
the general public 2 . From all newly diagnosed cancer cases, 30% are diagnosed as breast
cancer 1 . Since 1930, the American Cancer Society (ACS) and other cancer care pioneers have placed emphasis on “early detection” as the key to diminishing disease burden 3 . It is widely
documented that mammography (a gold standard for evaluation of breast cancer) and treat-
ment at an early stage contribute to a reduction in breast cancer mortality 3,4,5 . In 2004,
ACS updated their guidelines for breast cancer screening, recommending annual screening
for women ages 40 and older, which allows cancers to be recognized before the appearance
of clinical signs 6 . However, the enormous number of screening mammograms as well as the
use of double reading of examinations lead to a heavy workload that constitutes a threat
to efficiency. In addition, the low prevalence of breast cancer in the screening population
and the complexity of mammograms limit the performance of radiologists and increase the
risk of false and missed diagnoses 7 . In mammography there is a likelihood of missing small
tumors surrounded by dense fibroglandular breast tissue, resulting in delays in the diagno-
sis and missing early detection 8 . As well, there is false positive detection, which leads to
unnecessary burden on patients such as benign biopsies, extra spending, and psychological
side effects 9,10 .
Although deep learning has shown promising results in analyzing mammogram images, classifying mammogram images has remained challenging, mainly because of the specific characteristics of mammogram images compared to those of the natural images for which deep learning models were originally developed. Breast abnormalities are tiny parts of mammogram
images and they share many features with the background tissues. For example, masses
are similar to glandular breast tissue and architectural distortions appear as thin straight
lines or spiculations radiating from a point. Thus, it is crucial to develop a deep learning model that can accurately detect occult small tumors, identify tumors with ambiguous boundaries, and classify tumors with different shapes (non-mass).
It has been shown that comparing current mammogram scans with the prior ones im-
proves the outcomes of mammography classification models 22 . Indeed, in practice, radiolo-
gists compare the current mammogram scans with previous ones to make an inference. A
couple of studies have employed deep learning based models to compare current and prior
mammogram images 23,24 . In the study by Kooi and Karssemeijer 24 , the authors used ROIs of masses from primary mammograms with their corresponding history mammograms (or the opposite breast mammogram if a history mammogram was not available) and employed twin CNNs to classify mammogram ROIs. The limitation of this study is that the model
requires mapping the mass ROIs from current mammograms to their prior ones, which in-
volves a certain level of radiologist knowledge to identify ROIs. In the other study by Perek
et al., CNN models are used to extract features from current year images and previous year
images, and then Long short-term memory (LSTM) layers are employed to learn the differ-
ences in order to classify mammograms 23 . This method requires image registration at the
data pre-processing stage. In addition, training LSTM to learn tissue changes by a small
size data set can cause overfitting.
Besides the challenge of extracting effective features from complex breast tissue, ana-
lyzing mammograms suffers from low sample size. Similar to other biomedical imaging data
sets, mammogram data sets include a limited number of images 25 . Recently, several tech-
niques have been proposed to adapt deep learning algorithms to low sample size regimes, under the umbrella of one-shot learning. In particular, Siamese networks 26,27 — which take a pair of images and identify whether the pair contains the same object, even if the model has
never seen the object before — have attracted much attention and made significant strides
towards improving classification performance of limited size data sets. Siamese networks
have been used in analyzing X-Ray 28,29 , CT 30,31 and other medical images 32,33,34,35 .
In this study, in light of Siamese Networks’ empirical effectiveness and the reality of
mammogram data scarcity, we propose Siamese network based models to classify mammo-
grams, a design modeled on the radiologists’ reading procedure. More specifically, we
propose a model based on the Siamese network methodology that compares high resolu-
tion previous (history) mammogram exams with current mammogram exams to increase
the accuracy of breast cancer detection, and to be able to detect very small and non-mass
abnormalities. Our contributions are:
• developing a novel end-to-end model based on the Siamese CNN model that uses
previous year and current year images as paired inputs to predict the probability of
malignancy.
• designing a new distance learning function for more effective comparison between the
current year and previous year mammogram images.
We evaluated the performance of our model using accuracy, sensitivity, precision, speci-
ficity, F1 score and ROC area under the curve (AUC) metrics. Moreover, we examined the
performance of our model in detecting non-mass and small tumors. We compared the per-
formance of the proposed model with those of some baseline models that use current images only (ResNet and VGG) or use current and prior images (LSTM and vanilla Siamese).
II. Methods
Traditional CNN models for full-field digital mammogram (FFDM) classification only consider the intra-image (within im-
age) features from each individual image. Few models have been proposed to learn both
inter-image (between images) and intra-image features from both CC and MLO views of a
patient’s particular breast 36,37 . In this work, we constructed an end-to-end model based on
the Siamese network model proposed by Koch et al. 38 to extract intra-image and inter-image
features from pairs of patients’ previous and current year FFDMs for more accurate breast
cancer classification. In the following, we explain two variants of the proposed model to fuse
intra-image features: distance learning network, and concatenation network.
II.A.1. Feature Fusion Siamese CNN (FFS-CNN) with Distance Learning Network
The proposed model consists of two identical parallel CNNs (twin CNNs) with shared weights, followed by a distance learning network that predicts whether the input pair of mammograms is similar or dissimilar. We used pre-trained ResNet as the backbone 39
for the parallel networks, as can be seen in Figure 1. Detailed information about the ResNet model is given in Section II.B.1. and shown in Table 1. Each of the parallel networks
extracts internal level features from its input image. The distance learning network measures
the distance between the feature maps from the twin networks and employs a fully connected
(FC) network to learn the differences between the feature maps (inter-image features). We
called our proposed model Feature Fusion Siamese CNN (FFS-CNN).
Pairs of current and previous mammogram images are inputs of the proposed model,
FFS-CNN. The goal of the proposed model is to predict the similarity between a cur-
rent year image, denoted by C, and its corresponding previous year image, denoted
by P, where “similar” means normal and “dissimilar” means cancer. Let S = {(C1, P1, y1), . . . , (CN, PN, yN)} denote the training data set, where yi represents the class label. For a pair of images Ci and Pi, the binary label yi is set to 1, indicating cancer, when Ci is a cancer image and Pi is a normal image. Otherwise, yi is set to 0, indicating normal, when both Ci and Pi are normal images.
The twin CNNs generate feature representations, the flattened feature maps (feature vectors) of a pair of current year and previous year images. These feature vectors are input to the distance learning functions given in Equations 1 and 2, where d1 measures the pixel-wise distance between fC and fP, d2 measures the Euclidean distance between fC and fP, and m is the size of the feature vectors:
$$d_1 = f_C - f_P, \quad (1)$$

$$d_2 = \sqrt{\sum_{j=0}^{m} \left(f_{C_j} - f_{P_j}\right)^2}. \quad (2)$$
Vector d1 is concatenated with scalar d2 to build the distance feature for classification. This distance feature is input to the distance learning FC layer, which is the output layer. Finally, at the output layer a sigmoid function, as given in Equation 3, is applied to the distance feature to predict the probability of dissimilarity (cancer) or similarity (normal):

$$\hat{y} = \sigma(x) = \frac{1}{1 + e^{-x}}, \quad (3)$$

where x is the output of the FC layer. The model is trained with the overall loss

$$L = \lambda_1 L_{entropy} + \lambda_2 L_{norm2} + \lambda_3 L_{norm1}, \quad (4)$$

where λ1, λ2, and λ3 are weighting parameters for L, and Lentropy is the cross entropy loss for classification as given in Equation 5.
$$L_{entropy} = -\left(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\right), \quad (5)$$
Lnorm1 is the L1 norm, and Lnorm2 is the squared L2 norm of the vector representation of the FC layer parameters, w, defined as

$$L_{norm1} = \sum_{w_i \in w} |w_i|, \quad (6)$$

$$L_{norm2} = \sum_{w_i \in w} w_i^2. \quad (7)$$
Lnorm1 and Lnorm2 are used as regularizers to penalize large parameter values and avoid overfitting. Training and optimization of the model are described in Section IV.
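As a concrete illustration, the following is a minimal PyTorch sketch of the distance learning head and the overall loss of Equations 1-7. PyTorch itself, the class and function names, and the default λ values are our assumptions; the paper does not specify an implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceLearningHead(nn.Module):
    """Fuses twin-network feature vectors f_C and f_P via Equations 1-3."""
    def __init__(self, feat_dim: int):
        super().__init__()
        # Output FC layer over the concatenated [d1 ; d2] distance feature.
        self.fc = nn.Linear(feat_dim + 1, 1)

    def forward(self, f_c: torch.Tensor, f_p: torch.Tensor) -> torch.Tensor:
        d1 = f_c - f_p                                            # Eq. 1: pixel-wise distance
        d2 = (f_c - f_p).pow(2).sum(dim=1, keepdim=True).sqrt()   # Eq. 2: Euclidean distance
        dist = torch.cat([d1, d2], dim=1)                         # concatenated distance feature
        return torch.sigmoid(self.fc(dist))                       # Eq. 3: dissimilarity probability

def overall_loss(y_hat, y, fc_weights, lam1=1.0, lam2=1e-4, lam3=1e-5):
    """Eq. 4 as reconstructed above; the lambda defaults are placeholders."""
    l_entropy = F.binary_cross_entropy(y_hat, y)        # Eq. 5: cross entropy
    l_norm1 = sum(w.abs().sum() for w in fc_weights)    # Eq. 6: L1 norm
    l_norm2 = sum(w.pow(2).sum() for w in fc_weights)   # Eq. 7: squared L2 norm
    return lam1 * l_entropy + lam2 * l_norm2 + lam3 * l_norm1
```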
In order to examine the effectiveness of the proposed distance learning function used in FFS-CNN, we developed a variant of the FFS-CNN model that does not include the distance learning function. As in FFS-CNN, the model contains two sub-networks (parallel CNNs) that use ResNet as the backbone to extract abstract intra-image features from pairs of input images. However, instead of using distance learning functions, the extracted previous year and current year features (fC and fP) are concatenated and followed by a dense layer (without any distance function) to learn the feature level differences. We called this model Feature Fusion Siamese CNN with Feature Concatenation (FFS-CNN-FC) (shown in Figure 2.e). The loss function used for this model is defined in Equation 4.
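A corresponding sketch of the FFS-CNN-FC fusion head, again an illustrative PyTorch rendering rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class ConcatFusionHead(nn.Module):
    """FFS-CNN-FC variant: fuses f_C and f_P by concatenation only."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.dense = nn.Linear(2 * feat_dim, 1)   # dense layer over [f_C ; f_P]

    def forward(self, f_c: torch.Tensor, f_p: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([f_c, f_p], dim=1)      # no explicit distance function
        return torch.sigmoid(self.dense(fused))
```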
II.B.1. ResNet
The overall structure of the ResNet model is shown in Figure 2.a.1. We used the original
structure of ResNet50 proposed by He et al. 39 . The ResNet50 model contains five building
blocks followed by an average pooling layer. In the first building block, there is a 7 × 7
convolutional layer with a batch normalization layer and the ReLu activation layer. Max
pooling is also applied after the first building block. The other building blocks contain
convolutional blocks and identity blocks. Each convolutional block and identity block has
three convolutional layers (kernel sizes: 1 × 1, 3 × 3, and 1 × 1), three batch normalization layers and three activation layers. In convolutional blocks, a 1 × 1 convolutional layer and
batch normalization layer are added to the short cut path of the convolutional blocks (the
overall structure of the convolutional block and identity block are shown in the Figure 2.a.2).
To adjust the original ResNet50 network to our two-class data set (single neuron output), we removed the top layers of the original ResNet and added two FC layers, with dimensions of 512 and 256, followed by an output layer. We used the ReLu activation function for the FC layers. The output is a single neuron, to which we applied the sigmoid function to obtain the likelihood of cancer versus normal. We used the binary cross-entropy loss function, given in Equation 5, to train this model.
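A hedged sketch of this head replacement using torchvision's ResNet50 (the paper does not name a framework; the dropout value is one point in the 0.2-0.6 range explored in Section IV):

```python
import torch.nn as nn
from torchvision import models

def build_resnet_baseline() -> nn.Module:
    # ImageNet-pretrained ResNet50 as the starting point.
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Replace the original 1000-class top with 512- and 256-unit FC layers
    # and a single sigmoid output neuron for the cancer likelihood.
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, 512), nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Sigmoid(),
    )
    return backbone
```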
II.B.2. VGG
We used the VGG model proposed by Simonyan and Zisserman 40 in this work as a baseline
model. The structure of the VGG model is demonstrated in Figure 2.b. The model contains
five building blocks. The first and second building blocks contain two convolutional layers
with pooling layers. The third to fifth building blocks contain three convolutional layers with
pooling layers. The kernel size of all the convolutional layers is 3 × 3. The ReLu activation
function is applied to all the convolutional layers. The last three FC layers are modified to accommodate our data set: they are replaced with 512- and 256-unit dense layers. We
used the overall loss, including the binary cross-entropy loss function, given in Equation 4,
to train this model.
We also compared the performance of our models with a recently proposed LSTM-based
model which uses current year and prior year mammogram images to detect cancer 23 . The
overall model consists of the twin CNNs used in our proposed models which use ResNet
as the backbone, and an LSTM block to learn the feature changes from current year and
previous year images 41 . This LSTM-based model uses the extracted features from current
year and previous year images as longitudinal features, and employs the LSTM layers to
classify longitudinal features. As shown in Figure 2.c, the LSTM layers are applied to the
concatenation of previous year and current year features extracted from the twin CNNs for
classification. The LSTM block contains three layers (with 256, 128, and 64 units, respectively). We used the overall loss, including binary cross entropy,
given in Equation 4, as the loss function in this model.
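An illustrative PyTorch sketch of this LSTM head with the stated layer sizes; treating the previous and current feature vectors as a length-two sequence is our reading of the text, not a detail the paper confirms:

```python
import torch
import torch.nn as nn

class LongitudinalLSTMHead(nn.Module):
    """Stacked 256/128/64-unit LSTM layers over the two-visit feature sequence."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.lstm1 = nn.LSTM(feat_dim, 256, batch_first=True)
        self.lstm2 = nn.LSTM(256, 128, batch_first=True)
        self.lstm3 = nn.LSTM(128, 64, batch_first=True)
        self.out = nn.Linear(64, 1)

    def forward(self, f_p: torch.Tensor, f_c: torch.Tensor) -> torch.Tensor:
        seq = torch.stack([f_p, f_c], dim=1)        # (batch, 2 visits, feat_dim)
        seq, _ = self.lstm1(seq)
        seq, _ = self.lstm2(seq)
        seq, _ = self.lstm3(seq)
        return torch.sigmoid(self.out(seq[:, -1]))  # classify from the last step
```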
We used the vanilla Siamese network proposed by Koch et al. 38 as a baseline model to
compare its performance with the performance of our proposed models. The structure of
the parallel CNNs is the same as the structure of the parallel CNNs in our proposed model.
As Figure 2.d shows, the intra-image breast features of a current year image and a previous
year image are extracted from the shared weights twin networks. To learn the inter-image
breast tissue feature changes using the intra-image breast features, the vanilla Siamese model
employs the Euclidean distance function given in Equation 2. To predict the feature level
similarity of a pair of previous year and current year images, we used the contrastive loss
function given in Equation 8 for this model.
$$L = \frac{1}{2}(1 - y)\,(d_2)^2 + \frac{1}{2}\,y\,\left(\max(0,\, n - d_2)\right)^2, \quad (8)$$
where y is the ground truth label, d2 is the Euclidean distance given in Equation 2, and n is
a hyper-parameter, set to 1 in our experiments.
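Equation 8 translates directly into code; a minimal sketch with the margin n = 1 used in the experiments:

```python
import torch

def contrastive_loss(d2: torch.Tensor, y: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    """Eq. 8: y = 1 for cancer (dissimilar) pairs, y = 0 for normal (similar) pairs."""
    similar = (1 - y) * d2.pow(2)                       # pull normal pairs together
    dissimilar = y * torch.clamp(n - d2, min=0).pow(2)  # push cancer pairs past margin n
    return (0.5 * similar + 0.5 * dissimilar).mean()
```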
We used four data sets (three for pre-training and one for training and testing): i) the Digital Database for Screening Mammography (DDSM) 42,43 , ii) the Chinese Mammography
Database (CMMD) 42,44 , iii) Breast Cancer Screening-Digital Breast Tomosynthesis (BCS-
DBT) 42,45 , and iv) a private data set provided by the Radiology department at the University
of Connecticut Health Center (UCHC). The overall work flow for using the data sets in this
study is shown in Figure 3.
The DDSM, CMMD and BCS-DBT (Table 2) were used to pre-train the backbone model,
ResNet and VGG baseline models. Note that these data sets do not include history images.
The DDSM data set contains normal, benign and cancer cases determined by experts.
Since our study focused on classifying cancer and normal cases, we excluded benign cases
from DDSM. The average resolution of original DDSM mammogram images is 3000 × 4800
pixels. We used 2,055 cancer cases from this data set. The CMMD data set contains benign
cases and cancer cases. We excluded benign cases from CMMD. The average resolution of
original CMMD mammogram images is 1914 × 2294 pixels. We used 2,632 cancer cases from
this data set.
The BCS-DBT data set is a public Digital Breast Tomosynthesis (DBT) 3D data set
which contains normal, cancer, benign, and actionable FFDMs (cases that did not result in biopsy but required further imaging) 45 . To increase the number of pre-training images and to have a
balanced number of cancer and normal cases for training the backbone models, we generated
synthetic 2D mammogram (s2D) using the BCS-DBT 3D mammograms. We employed the
combination of Hologic c-view and re-project 2D mammogram algorithms 46,47 to generate
s2D mammograms (the s2D algorithm is described in the Supplementary Materials). Based
on the design of our study, we leveraged normal and cancer cases from BCS-DBT. We
generated 8,528 normal s2D and 75 cancer s2D from BCS-DBT normal and cancer cases in
this study.
The UCHC data set, including current and history mammograms, was used to train, test and
validate the proposed and baseline models. The UCHC data set consists of collected FFDMs
from patients who had mammogram exams at UCHC from October 31, 2006 to August 23,
2021. The FFDMs were acquired on a Hologic machine. The data collection was approved
by the UCHC Institutional Review Board. With assistance from the Diagnostic Imaging
Informatics Department at UCHC, the DICOMs were exported from Picture Archiving and
Communication Systems (PACS) at UCHC. Additionally, patient identifiers were removed
and replaced with a set naming convention. The mammograms in the data set were annotated
by radiologists.
The UCHC data set includes current year and prior year FFDMs of 289 patients (119
mass, 68 AD, 66 MCs and 36 normal patients), ranging from 28 to 95 years old. The
FFDMs of both breasts and two views per breast (LCC, RCC, LMLO, and RMLO) are included in the data set for the majority of patients (for a few patients, not all views of both breasts are available). In this collection (Table 3), 493 mammogram pairs
are labeled cancer (493 current cancer FFDMs paired with their corresponding prior normal
FFDMs), and 581 mammogram pairs are labeled normal (581 current normal FFDMs paired
with their corresponding prior normal FFDMs). The data labeling is shown in Figure 3.b.
The majority of patients, 83.4%, had an interval of 1-3 years between the two visits.
The cancer cases were defined as labeled breast CC and MLO views with biopsy-confirmed cancerous breast lesions. These cases had Breast Imaging Reporting and Data System (BI-RADS) scores of 4 or 5, indicating suspicious abnormality or highly suggestive of malignancy, respectively, and required further confirmation with biopsy. Normal cases were defined as
labeled breast CC views and MLO views with no abnormalities found on the breast. These
cases had BI-RADS scores of 1 or 2, indicating no malignancy and requiring no further action.
In order to increase the generalizability of the data set, we included a variety of tumor and breast density types. The mass shapes in the data set include round, oval, architectural distortion, irregular, and lobulated. The microcalcification types in the data set include amorphous, coarse, fine linear branching, pleomorphic, punctate, and round with regular shape.
The data set contains all types of breast density including fatty breast, fibroglandular dense
breast, heterogeneously dense breast and extremely dense breast. The fibroglandular dense
breast type and heterogeneously dense breast type cover a large portion of the data set.
Note that our proposed model is for classifying cancer and normal images; therefore,
we used the labeled data at the image level, not the patient level. Examples of two cancer
paired images and two normal paired images are shown in Figure 4.
We mixed DDSM, CMMD and s2D BCS-DBT mammogram images to build a training data set for pre-training the backbone model. We also applied data normalization to all the images.
In the data pre-processing step for the UCHC data set, annotations were removed when we converted DICOM files to images using the pydicom package. The metal marker pixels were removed manually from the mammograms. In order to reduce unnecessary computational cost, we cut out the black background from the images after removing annotations and metal marks. By examining all the mammogram images in the UCHC data set (including RCC, RMLO, LCC, and LMLO views), we computed the widest breast length, $\theta$. All mammograms $I \in \mathbb{R}^{N \times M}$, where $N$ is the height and $M$ is the width, are cropped such that the cropped mammograms satisfy $I_{cut} \in \mathbb{R}^{N \times (M - \theta + \epsilon)}$, where $\epsilon$ is a constant margin (we used $\epsilon = 20$ in this study). Then, we resized the mammograms to 1024 × 1024 by employing bilinear interpolation. To increase
the size of the training data set, we used rotation (90◦ , 180◦ and 270◦ ) and CLAHE 48 filter
for data augmentation.
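A sketch of this pre-processing pipeline, assuming OpenCV and NumPy; the crop follows the $N \times (M - \theta + \epsilon)$ formula above, and the CLAHE parameters are illustrative defaults, not values from the paper:

```python
import cv2
import numpy as np

def crop_and_resize(img: np.ndarray, theta: int, eps: int = 20) -> np.ndarray:
    """Cut the black background to width M - theta + eps, then resize bilinearly."""
    cropped = img[:, : img.shape[1] - theta + eps]
    return cv2.resize(cropped, (1024, 1024), interpolation=cv2.INTER_LINEAR)

def augment(img: np.ndarray) -> list:
    """Rotations (90/180/270 degrees) plus a CLAHE-filtered copy (img as uint8/uint16)."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return [np.rot90(img, k) for k in (1, 2, 3)] + [clahe.apply(img)]
```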
To avoid overfitting, we employed transfer learning to train the proposed and baseline models.
For transfer learning, we pre-trained the ResNet backbone networks and the VGG and
ResNet baseline models, as shown in Figure 3. In the first step, we used pre-trained ResNet
and VGG models, trained by the ImageNet data set, as initial models. Next, we pre-trained
the initial models using the combined training mammogram data set (DDSM, CMMD and
s2D) as explained in Section III. to fine-tune the initial models based on mammogram images.
The pre-trained networks, then, were used as the backbone models in the proposed and
baseline models and were trained using the UCHC data.
We used randomly selected 70% of the UCHC cancer patients (493 pairs of current cancer
and prior normal mammograms) and normal patients (581 pairs of current normal and prior
normal mammograms) for training the proposed models and other baseline models with twin
networks. The same selected current patients’ mammograms (70% of current cancer patients
and current normal patients) are used for training the ResNet and the VGG baseline models
that do not have twin networks (shown in Table 3).
To optimize our models and the baseline models, we examined different hyper-
parameters. To train all the models we used 25 epochs, and explored the starting learning rate in a range from 1e−2 to 1e−5. We used a cosine learning rate scheduler to adjust the learning rate during training. In our experiments we used input images of size 1024 × 1024.
Because of the limited computational resources for high resolution inputs, we used end-to-end mini-batch stochastic gradient descent with a batch size of 4 to optimize all the models
with parallel CNNs, and a batch size of 16 for the ResNet and VGG baseline models. The
hyper-parameters used to optimize the proposed models and the baseline models are shown
in Table 4. We used the learning curves of the training and validation data to monitor performance in terms of overfitting. To prevent overfitting, we used dropout in the FC layers (tested in a range from 0.2 to 0.6). The FC layer weights are initialized with the Xavier normal distribution, and the bias parameters are set to 0. In addition, we used the Lnorm1 regularizer (λ3 tested in a range from 1e−5 to 2) and the Lnorm2 regularizer (λ2 tested in a range from 1e−4 to 2) in the FC layers. We used the gradient descent with adaptive momentum (ADAM)
optimizer to optimize the accuracy of all the models. We used Tesla V100 GPUs with 32
GB memory to train and test all the models.
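Putting these training choices together, a hedged sketch of the optimization loop; `build_ffs_cnn` and `pair_loader` are hypothetical helpers standing in for the model and data pipeline, and `overall_loss` is the Equation 4 sketch shown earlier:

```python
import torch

model = build_ffs_cnn()        # hypothetical constructor for the FFS-CNN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr explored in [1e-5, 1e-2]
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

for epoch in range(25):                    # 25 epochs, batch size 4 for twin models
    for (img_c, img_p), y in pair_loader:  # hypothetical DataLoader of image pairs
        optimizer.zero_grad()
        y_hat = model(img_c, img_p)
        loss = overall_loss(y_hat, y, [model.head.fc.weight])  # FC weights regularized
        loss.backward()
        optimizer.step()
    scheduler.step()
```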
We used the UCHC data set to test and evaluate the performance of the proposed models
and compared it with those of the baseline models. We split the data set into training (70%), validation (10%) and testing (20%) data sets (note that data augmentation is only
applied to the training data set after splitting the data and the testing data set is only
used for testing the models). The 95% confidence intervals (CI) of all evaluation metrics are reported in this study.
We used Accuracy, Specificity, Sensitivity, Precision, F1 score, and Area Under the ROC Curve (AUC) as evaluation metrics. Accuracy is the percentage of correctly classified images. Specificity is the percentage of negative (normal) images classified correctly. Sensitivity, also called Recall, is the proportion of positive (cancer) images predicted correctly. Precision measures the rate of images classified as positive (cancer) that are truly positive. F1 measures accuracy as the harmonic mean of Recall (Sensitivity) and Precision. AUC measures the performance of a binary classifier over a range of thresholds used to compute sensitivities and false positive rates (1 − Specificity).
The equations for these metrics are shown as follows:
$$F1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)}, \quad (9)$$

$$Accuracy = \frac{TP + TN}{TP + FN + TN + FP}, \quad (10)$$

$$Sensitivity = \frac{TP}{TP + FN}, \quad (11)$$

$$Precision = \frac{TP}{TP + FP}, \quad (12)$$

$$Specificity = \frac{TN}{TN + FP}, \quad (13)$$
where TP denotes true positive, TN denotes true negative, FP denotes false positive,
and FN denotes false negative.
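These definitions reduce to a few lines of code; a self-contained sketch computing Equations 9-13 from raw confusion-matrix counts:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Equations 9-13 from true/false positive and negative counts."""
    return {
        "f1": tp / (tp + 0.5 * (fp + fn)),            # Eq. 9
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # Eq. 10
        "sensitivity": tp / (tp + fn),                # Eq. 11
        "precision": tp / (tp + fp),                  # Eq. 12
        "specificity": tn / (tn + fp),                # Eq. 13
    }
```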
V. Results
The performance of all the models in terms of accuracy, specificity, sensitivity, precision, F1
score and AUC are given in Table 5.
As shown in Table 5, the proposed model FFS-CNN outperforms all the baseline models in terms of all the performance metrics. The accuracies of FFS-CNN (0.92) and FFS-CNN-FC (0.91) are higher than those of VGG (0.82) and ResNet (0.86). The longitudinal LSTM and vanilla Siamese models showed accuracies comparable to each other (0.89 and 0.88), but lower than those of the proposed models.
The ROC for all the models is shown in Figure 5.a., where the average AUC of VGG is
0.86, the average AUC of ResNet is 0.90, the average AUC of the longitudinal LSTM model
is 0.93, the average AUC of the vanilla Siamese network is 0.92, the average AUC of FFS-
CNN-FC is 0.94, and the average AUC of FFS-CNN is 0.95. To test the significance of the
difference between the AUC of the proposed model, FFS-CNN, and those of the other models, we employed Hanley & McNeil's test 49 given as an online service (http://vassarstats.
net/roc_comp.html). The AUC improvement of FFS-CNN (0.95) is significant compared
with VGG (p = .01), and ResNet (p = .04), but is not significant compared with longitudinal
LSTM (p = .11), vanilla Siamese (p = .06) and FFS-CNN-FC (p = .33). The proposed
model, FFS-CNN, also performs better compared to the other models in terms of specificity
and precision with a specificity of 0.91 and a precision of 0.91. This shows that the proposed model produces fewer false positives than the baseline models.
In terms of sensitivity, FFS-CNN shows the best performance with average sensitivity
of 0.93 which is 0.08 and 0.14 higher than the sensitivity of ResNet and VGG, respectively.
The vanilla Siamese and longitudinal LSTM models show the average sensitivity of 0.86 and
0.89, respectively.
As can be seen from Table 5, in terms of all the evaluation metrics, all the models
that employ history of images outperform the ResNet and VGG models that use only the
current images without considering previous year images. This indicates the importance
of employing history of images. The observation that FFS-CNN outperforms all the other
feature fusion models, including FFS-CNN-FC, indicates that the distance learning functions
can impact the performance of the model.
We examined the discriminative performance of the proposed models for non-mass and small
size tumors. As shown in Table 6, we computed the false discovery rate (FDR) and false negative
rate (FNR) in classification where abnormalities are mass, microcalcification, and AD for
all the models. The FDRs and FNRs of VGG, ResNet, longitudinal LSTM, and vanilla
Siamese in classifying mammograms with masses are comparable and are higher than those
of the FFS-CNN-FC and FFS-CNN models. As can be seen in Table 6, all the models have
higher error rates in classifying cancer when tumor shapes are non-masses. However, the
proposed models’ FDRs and FNRs are considerably lower compared to the other models.
For microcalcification tumors, VGG, ResNet, and longitudinal LSTM perform similarly and
the vanilla Siamese network, FFS-CNN-FC, and FFS-CNN perform better in terms of FDR
and FNR. For AD cases, FFS-CNN-FC and FFS-CNN outperform all the baseline models in terms of FDR and FNR. Results show that the proposed FFS-CNN-FC and FFS-CNN models improve the detection rate of microcalcification and AD shaped tumors.
To evaluate the performance of the proposed model in detecting small tumors, we com-
puted the tumor area ratio, r, in mammograms as r = (t/Ia) · 100, where t is the tumor area in pixels, and Ia is the image area in pixels. The tumor ratios in mammograms that are accu-
rately classified as cancer (blue bars) and tumor ratios of ground truth cancer mammograms
(red bars) for all the models are shown in Figure 6.
Results show that all the models, except FFS-CNN, misclassified a few mammograms with larger tumors (r > 4). All models misclassified some mammograms with small tumors (r < 0.5). However, VGG and ResNet missed more small tumors (more than 10 misclassified images), while our proposed models missed fewer (fewer than 8 misclassified images). The FFS-CNN-FC and FFS-CNN models show superior performance in
classifying mammograms with smaller tumors.
To study the effect of adding more FC layers, we built the FFS-CNN-FC models using two,
three, and four FC layers (performance shown in Table 7). Results show that FFS-CNN-FC
models perform slightly worse compared to FFS-CNN, and adding more FC layers cannot improve the performance.
VI. Discussion
In this study, our proposed model, FFS-CNN, employs the Siamese network methodology,
which first extracts intra-image features of current and previous FFDMs and then extracts
inter-image features for classification. Our model’s success is due to two aspects: 1) using
prior mammogram screens as guidance to identify cancer based on not only current breast
features, but also prior breast features, and 2) employing a distance learning network to cap-
ture cancerous changes in the structure of breast tissues. To enhance the learning ability of
the distance learning network, we proposed to employ both feature map pixel-wise distances
and the Euclidean distance between the extracted features from the current year and previ-
ous year images. To examine the effectiveness of the distance learning model, we proposed
a variant model (FFS-CNN-FC) that concatenates intra-image features and lets FC layers
learn the difference between the features, without explicitly imposing any distance metrics
to the features. Our experiments demonstrate the superior performance of our proposed
FFS-CNN model over conventional deep learning models and current deep learning models
that employ history of images.
Well-known conventional deep learning models such as the ResNet and VGG models
have a strong ability to learn FFDM intra-image features. However, as shown in Table 6,
the VGG and ResNet models show limited ability to identify AD shaped tumors. This
can be because the ResNet and VGG models are not able to effectively learn the complex
characteristics of AD for such a low sample size data set. The generalization performance
of conventional deep learning models heavily depends on the size of the training data set.
In other words, having an optimal generalizable classification model for FFDMs requires
training in many different shapes of tumors. However, collecting all possible tumor shapes
to train a model is not practical. Hence, using a small size data set to train those models
can increase the risk of overfitting and lead the model to ignore unseen tumor shapes.
One-shot learning models such as Siamese networks, in contrast, learn to compare images, not to learn objects. As a result, one-shot learning models can be trained and perform well
with smaller sample size data sets 26,27 . As shown in Table 5, the Siamese-based models show
better results in sensitivity, indicating their strong ability to identify cancer cases, even when
trained with smaller data sets. A demo of tumors identified by the Siamese-based proposed
model, but missed by ResNet and VGG is illustrated in Figure 7. As the figure shows, the
proposed Siamese-based model is able to identify the AD shaped tumor, which is hard to
distinguish from breast tissue, but was not in the prior mammogram.
The proposed model outperforms the vanilla Siamese model, which also compares cur-
rent and previous FFDMs. In the vanilla Siamese network, the similarity between previous
year features and current year features is learned using the Euclidean distance. The Eu-
clidean distance is the most common distance which represents the overall dissimilarity and
has a better stability property than other distance functions. However, its effectiveness is
limited when the feature dimension increases, and the dissimilarity details are important. As
shown in Figure 6, the vanilla Siamese model failed to estimate the dissimilarity of even a few
large size tumors. To capture discriminative dissimilarity features of high resolution complex
images, nonlinear combination of differences between pixels of feature vectors can be more
effective than overall distance between the feature vectors. Therefore, to have overall and
detailed dissimilarity features, we concatenated the Euclidean distance with the pixel-wise
distance and applied an FC network as our distance learning network. It is reflected from
Figure 6 that FFS-CNN is able to identify the tumors missed by the vanilla Siamese network.
The LSTM-based model also does not perform as well as the proposed model. An LSTM-based model is often beneficial for learning time-lagged features in time series data. Its
learning mechanism is to predict the likelihood of features from current data based on the
prior data rather than capturing differences between the data. As a result, its performance
in comparing previous and current mammograms is not as good as the proposed model,
especially for more challenging shapes of tumors. As shown in Table 6, the LSTM-based
model has the lowest ability to identify AD tumors compared with the other twin models.
The proposed model has limitations in the structure of the backbone network due to
the computational complexity. We used smaller batch sizes and an input size of 1024 × 1024
to address these limitations. In this study, we built a classifier for cancer and normal cases.
We are continuing this work on classifying benign cases, and also classifying different types
of tumors. We will work on developing models to segment tumors using current year and
prior year images as our future work as well.
VII. Conclusion
We built a shared-weight twin network model — a Siamese model — that learns intra-image feature representations of a pair of current year and previous year screening images, and predicts the similarity of breast tissues using a distance learning network to extract inter-image features of the paired images. Moreover, because of the nature of Siamese networks — domain-specific feature representation, distance prediction, and one-shot learning — the model can be applied to small sample size data sets and perform better than well-known models. In
this study, we showed that employing a Siamese based model with a novel distance learning
network to compare previous and current year mammogram images can improve classifying
mammogram images.
Acknowledgment
This work is supported by a grant from the University of Connecticut Research Excellence
Program, PIs: Nabavi and Yang; and Jun Bai’s Cigna Graduate Fellowship from University
of Connecticut.
Data availability
The public DDSM data set analysed in this study is available at http://www.
eng.usf.edu/cvprg/mammography/database.html. The public CMMD data set anal-
ysed in this study is available at https://wiki.cancerimagingarchive.net/pages/
viewpage.action?pageId=70230508. The public BCS-DBT data set analysed in this study is available at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=64685580.
References

1. R. L. Siegel, K. D. Miller, and A. Jemal, Cancer statistics, 2020, 70, 7–30 (2020).
2. American Cancer Society, How Common Is Breast Cancer? Breast Cancer Statistics, 2020.
3. C. Coleman, Early Detection and Screening for Breast Cancer, Seminars in Oncology Nursing 33, 141–155 (2017).
4. L. Tabár, B. Vitak, H. H. Chen, M. F. Yen, S. W. Duffy, and R. A. Smith, Beyond randomized controlled trials: organized mammographic screening substantially reduces breast carcinoma mortality, Cancer 91, 1724–1731 (2001).
5. S. W. Duffy, L. Tabar, B. Vitak, N. E. Day, R. A. Smith, H. H. T. Chen, and M. F. A. Yen, The relative contributions of screen-detected in situ and invasive breast carcinomas in reducing mortality from the disease, European Journal of Cancer 39, 1755–1760 (2003).
6. R. A. Smith, D. Saslow, K. A. Sawyer, W. Burke, M. E. Costanza, W. P. Evans, R. S. Foster, E. Hendrick, H. J. Eyre, and S. Sener, American Cancer Society Guidelines for Breast Cancer Screening: Update 2003, 53, 141–169 (2003).
7. I. G. Murphy, M. F. Dillon, A. O. Doherty, E. W. McDermott, G. Kelly, N. O'higgins, and A. D. Hill, Analysis of patients with false negative mammography and symptomatic breast carcinoma, Journal of Surgical Oncology 96, 457–463 (2007).
8. P. T. Huynh, A. M. Jarolimek, and S. Daye, The false-negative mammogram, Radiographics 18, 1137–1154 (1998).
9. M.-S. Ong and K. D. Mandl, National Expenditure For False-Positive Mammograms And Breast Cancer Overdiagnoses Estimated At $4 Billion A Year, Health Affairs 34, 576–583 (2015).
10. H. D. Nelson, M. Pappas, A. Cantor, J. Griffin, M. Daeges, and L. Humphrey, Harms of Breast Cancer Screening: Systematic Review to Update the 2009 U.S. Preventive Services Task Force Recommendation, Annals of Internal Medicine 164, 256–267 (2016).
11. P. Teare, M. Fishman, O. Benzaquen, E. Toledano, and E. Elnekave, Malignancy Detection on Mammography Using Dual Deep Convolutional Neural Networks and Genetically Discovered False Color Input Enhancement, 30, 499 (2017).
12. W. Hang, Z. Liu, and A. Hannun, GlimpseNet: Attentional Methods for Full-Image Mammogram Diagnosis (2017).
13. W. Lotter, G. Sorensen, and D. Cox, A multi-scale CNN and curriculum learning strategy for mammogram classification, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 169–177, Springer, 2017.
14. D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai, Detecting and classifying lesions in mammograms with Deep Learning, 8, 1–7 (2018).
15. B. Sahiner, H.-P. Chan, N. Petrick, D. Wei, M. Helvie, D. Adler, and M. Goodsitt, Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images, IEEE Transactions on Medical Imaging 15, 598–610 (1996).
16. S.-C. B. Lo, H. Li, Y. Wang, L. Kinnard, and M. T. Freedman, A multiple circular path convolution neural network system for detection of mammographic masses, IEEE Transactions on Medical Imaging 21, 150–158 (2002).
17. T. Kooi, A. Gubern-Merida, J.-J. Mordang, R. Mann, R. Pijnappel, K. Schuur, A. den Heeten, and N. Karssemeijer, A comparison between a deep convolutional neural network and radiologists for classifying regions of interest in mammography, in International Workshop on Breast Imaging, pages 51–56, Springer, 2016.
18. T. Kooi, B. van Ginneken, N. Karssemeijer, and A. den Heeten, Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network, Medical Physics 44, 1017–1027 (2017).
19. Q. Abbas, DeepCAD: A Computer-Aided Diagnosis System for Mammographic Masses Using Deep Invariant Features, 5, 28 (2016).
20. B. Q. Huynh, H. Li, and M. L. Giger, Digital mammographic tumor classification using transfer learning from deep convolutional neural networks, 3 (2016).
21. N. Wu et al., Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening, IEEE Transactions on Medical Imaging 39, 1184–1194 (2020).
22. J. H. Hayward, K. M. Ray, D. J. Wisner, J. Kornak, W. Lin, E. A. Sickles, and B. N. Joe, Improving Screening Mammography Outcomes through Comparison with Multiple Prior Mammograms, 207, 918 (2016).
23. S. Perek, L. Ness, M. Amit, E. Barkan, and G. Amit, Learning from longitudinal mammography studies, in International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 712–720, Springer, 2019.
24. T. Kooi and N. Karssemeijer, Classifying symmetrical differences and temporal change for the detection of malignant masses in mammography using deep neural networks, 4, 044501 (2017).
25. S. K. Zhou, H. Greenspan, C. Davatzikos, J. S. Duncan, B. Van Ginneken, A. Madabhushi, J. L. Prince, D. Rueckert, and R. M. Summers, A Review of Deep Learning in Medical Imaging: Imaging Traits, Technology Trends, Case Studies With Progress Highlights, and Future Promises, Proceedings of the IEEE 109, 820–838 (2021).
26. M. Heidari and K. Fouladi-Ghaleh, Using Siamese Networks with Transfer Learning for Face Recognition on Small-Samples Datasets, in 2020 International Conference on Machine Vision and Image Processing (MVIP), pages 1–4, IEEE, 2020.
27. M. Dunnhofer, M. Antico, F. Sasazawa, Y. Takeda, S. Camps, N. Martinel, C. Micheloni, et al.
28. M. D. Li, N. T. Arun, M. Gidwani, K. Chang, F. Deng, B. P. Little, D. P. Mendoza, M. Lang, S. I. Lee, A. O'Shea, A. Parakh, P. Singh, and J. Kalpathy-Cramer, Automated Assessment and Tracking of COVID-19 Pulmonary Disease Severity on Chest Radiographs Using Convolutional Siamese Neural Networks, 2, e200079 (2020).
29. M. Shorfuzzaman and M. S. Hossain, MetaCOVID: A Siamese neural network framework with contrastive loss for n-shot diagnosis of COVID-19 patients, 113, 107700 (2021).
30. Z. He, Y. Wang, X. Qin, R. Yin, Y. Qiu, K. He, and Z. Zhu, Classification of neurofibromatosis-related dystrophic or nondystrophic scoliosis based on image features using Bilateral CNN, 48, 1571–1583 (2021).
31. Z. Ding, D. Zhou, H. Li, R. Hou, and Y. Liu, Siamese networks and multi-scale local extrema scheme for multimodal brain medical image fusion, 68, 102697 (2021).
32. Y.-A. Chung and W.-H. Weng, Learning deep representations of medical images using siamese CNNs with application to content-based image retrieval, arXiv preprint arXiv:1711.08490 (2017).
33. Y. Fu, P. Xue, H. Ji, W. Cui, and E. Dong, Deep model with Siamese network for viable and necrotic tumor regions assessment in osteosarcoma, 47, 4895–4905 (2020).
34. A. Mahajan, J. Dormer, Q. Li, D. Chen, Z. Zhang, and B. Fei, Siamese neural networks for the classification of high-dimensional radiomic features, 11314, 113143Q (2020).
35. M. D. Li, K. Chang, B. Bearce, C. Y. Chang, A. J. Huang, J. P. Campbell, J. M. Brown, P. Singh, K. V. Hoebel, D. Erdoğmuş, S. Ioannidis, W. E. Palmer, M. F. Chiang, and J. Kalpathy-Cramer, Siamese neural networks for continuous disease severity evaluation and change detection in medical imaging, 3, 1–9 (2020).
36. Y. Yan, P.-H. Conze, M. Lamard, G. Quellec, B. Cochener, and G. Coatrieux, Towards improved breast mass detection using dual-view mammogram matching, 71, 102083 (2021).
37. Y. Liu, F. Zhang, Q. Zhang, S. Wang, Y. Wang, and Y. Yu, Cross-view correspondence reasoning based on bipartite graph convolutional network for mammogram mass detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3812–3822, 2020.
38. G. Koch et al., Siamese neural networks for one-shot image recognition, in ICML Deep Learning Workshop, volume 2, Lille, 2015.
39. K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
40. K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv preprint arXiv:1409.1556 (2014).
41. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation 9, 1735–1780 (1997).
42. K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior, The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging 26, 1045–1057 (2013).
43. R. Sawyer-Lee, F. Gimenez, A. Hoogi, and D. Rubin, Curated Breast Imaging Subset of DDSM (data set).
44. C. Cui, L. Li, H. Cai, Z. Fan, L. Zhang, T. Dan, J. Li, and J. Wang, The Chinese Mammography Database (CMMD): An online mammography database with biopsy confirmed types for machine diagnosis of breast (data set).
45. M. Buda, A. Saha, R. Walsh, S. Ghate, N. Li, A. Swiecicki, J. Y. Lo, J. Yang, et al.
46. J. Wei, H.-P. Chan, M. A. Helvie, M. A. Roubidoux, C. H. Neal, Y. Lu, L. M. Hadjiiski, and C. Zhou, Synthesizing Mammogram from Digital Breast Tomosynthesis, 64, 045011 (2019).
47. A. Smith, Synthesized 2D Mammographic Imaging, Hologic, Inc., Marlborough, Massachusetts, U.S., 2016.
48. S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, Adaptive histogram equalization and its variations, 39, 355–368 (1987).
49. J. A. Hanley and B. J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143, 29–36 (1982).
Legends
Figure 1: FFS-CNN structure. a. and b. The structure employs two parallel CNNs with shared weights for domain-specific feature representation learning. The shared-weight networks employ ResNet as the backbone 39 . c. The feature representations fC and fP from a pair of a current year image and a previous year image are fed to the d1(·) and d2(·) functions to build the distance features for the distance learning network. The plus-circle sign denotes concatenation of the distance feature vectors. The output of the distance learning network is the similarity probability, where “similar” denotes normal and “dissimilar” denotes cancer.
Figure 2: Schematic diagrams of the baseline models. a.1. Overall view of the ResNet50 model. a.2. Structure of the ResNet model's building blocks for blocks 2 to 5. Each block consists of a convolutional block and an identity block, where each convolutional and identity block contains three convolutional layers (kernel sizes: 1 × 1, 3 × 3, and 1 × 1) with a batch normalization layer and ReLu activation function. Convolutional blocks also include a 1 × 1 convolutional layer and batch normalization layer at the shortcut path. Blocks 2 to 5 have 2, 3, 5, and 2 identity blocks (denoted by ×N ), respectively. b. VGG model. The first and second building blocks contain two convolutional layers and the third to fifth building blocks contain three convolutional layers (kernel size: 3 × 3). c. Longitudinal LSTM model. d. Vanilla Siamese model. e. FFS-CNN-FC model. f. FFS-CNN model. The detailed structure of the FFS-CNN model is shown in Figure 1.
Figure 3: The workflow for training the model. a. First, the ImageNet pre-trained backbone
model (ResNet or VGG) is used as an initial model (weights). Next, DDSM, CMMD and
BCS-DBT s2D mammograms are used to pre-train the backbone model. Then, the pre-
trained model’s weights are transferred to the proposed Siamese network (shared weight
twin model). Finally, the pretrained backbone model is fine tuned using UCHC pairs of
current and previous mammograms to generate the final model. b. Data collection process
for UCHC current year mammograms with their corresponding history mammograms.
Figure 4: Examples of pairs of current year and previous year mammogram images. Top
row shows current year FFDM images, and bottom row shows previous year FFDM images.
a. Two examples of cancer pair input. Cancer tumors are indicated by yellow circles on the
current year mammograms. b. Two examples of normal pair input.
Figure 5: AUC and Precision Recall (PR) plots of the proposed models and baseline models.
Figure 6: Cancer predictions of all the models for different tumor sizes (tumor ratios). The x-axis is the ratio of the tumor area in mammogram images. The y-axis is the number of mammogram images. Red indicates ground truth, and blue indicates the model prediction.
Figure 7: Illustration of a cancer case detected by the proposed model but not by the ResNet model. In each case, the top row contains a current year image with an enlarged abnormal tissue part indicated by a white square, and the bottom row contains the previous year image with the enlarged tissue part from the same location (corresponding to the abnormal tissue part of the current year image).
Figure S1: BCS-DBT s2D demo. The top two rows are normal s2D images. The third and
fourth rows are cancer s2D images.
Data set | Location | Modality | Resolution | History exam | Normal cases | Cancer cases | Link
DDSM | USA | 2D | 3000×4800 | No | - | 2,055 | http://www.eng.usf.edu/cvprg/mammography/database.html
CMMD | China | 2D | 1914×2294 | No | - | 2,632 | https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230508
BCS-DBT | USA | 3D | 1890×2457 | No | 8,528 | 75 | https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=64685580
UCHC | USA | 2D | 5928×4728 to 2294×1914 | Yes | 581 (pairs) | 493 (pairs) | -
Supplementary Materials
We used the combination of the Hologic s2D algorithm 47 and the method proposed by
Wei et al. 46 to synthesize 2D mammograms using DBT z-stacks. For a normal DBT z-
stack, we evenly divided the z-stack into four partitions. Each partition contains the same number of slices (for z-stacks with an odd number of slices, the last partition contains one more slice).
In each partition, $pSM(X \in \mathbb{R}^n) = U^n = f(X^{(n-1)}, X^n)$, where $U^{i+1} = f(X^{(i)}, X^{i+1})$, $i = 0, \ldots, n-1$, $X^{(0)} = f(\lambda \, \mathrm{GaussianBlur}(X^0) + X^1)$, and $\lambda$ is a weight parameter for the summation.
identified the position (slice number) of tumor center slice (tumor center slice number is
provided by the data). The pSM weight of the tumor center slice was increased at the final
weighted sum computation step. Examples of generated s2D mammograms are shown in
Figure S1.
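A heavily hedged sketch of this recursion: the fusion function f, the Gaussian blur width, λ, and the partition weights are not fully specified in the text, so all are parameters here, and pixel-wise maximum is only one plausible choice for f.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def psm(partition: np.ndarray, lam: float = 0.5, f=np.maximum) -> np.ndarray:
    """Run the pSM recursion over one (n_slices, H, W) sub-stack."""
    u = lam * gaussian_filter(partition[0], sigma=1.0) + partition[1]  # X^(0)
    for x in partition[2:]:
        u = f(u, x)   # U^(i+1) = f(X^(i), X^(i+1))
    return u

def synthesize_2d(zstack: np.ndarray, weights=(0.25, 0.25, 0.25, 0.25)) -> np.ndarray:
    """s2D as a weighted sum of the four partition pSMs; for a cancer z-stack
    the partition containing the tumor center slice would receive a larger weight."""
    parts = np.array_split(zstack, 4, axis=0)  # four (nearly) even partitions
    return sum(w * psm(p) for w, p in zip(weights, parts))
```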