
Q1 A Transfer Learning Approach To Breast Cancer Classification in A Federated Learning Framework

This document presents a study on a novel federated learning (FL) approach to breast cancer classification using transfer learning techniques. The authors developed a decentralized framework that enhances data privacy while improving classification performance through the use of synthetic minority oversampling and deep learning models. Experimental results demonstrate that their method significantly outperforms traditional centralized learning approaches in terms of accuracy and recall for breast cancer detection.

Received 20 February 2023, accepted 7 March 2023, date of publication 15 March 2023, date of current version 22 March 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3257562

A Transfer Learning Approach to Breast Cancer Classification in a Federated Learning Framework
Y. NGUYEN TAN1, VO PHUC TINH1, PHAM DUC LAM2, NGUYEN HOANG NAM3, AND TRAN ANH KHOA3
1 Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
2 Faculty of Engineering and Technology, Nguyen Tat Thanh University, Ho Chi Minh City 700000, Vietnam
3 Modeling Evolutionary Algorithms Simulation and Artificial Intelligence, Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
Corresponding author: Tran Anh Khoa ([email protected])
This work was supported by Funding no 37/qd-khcn/2023 of Nghe An Department of Science and Technology.

ABSTRACT Artificial intelligence (AI) technologies have seen strong development. Many applications
now use AI to diagnose breast cancer. However, most new research has only been conducted in centralized
learning (CL) environments, which entails the risk of privacy breaches. Moreover, the accurate identification
and localization of lesions and tumor prediction using AI technologies is expected to increase patients’
likelihood of survival. To address these difficulties, we developed a federated learning (FL) facility that
extracts features from participating environments rather than a CL facility. This study’s novel contributions
include (i) the application of transfer learning to extract data features from the region of interest (ROI) in
an image, which aims to enable careful pre-processing and data enhancement for data training purposes;
(ii) the use of synthetic minority oversampling technique (SMOTE) to process data, which aims to more
uniformly classify data and improve diagnostic prediction performance for diseases; (iii) the application
of FedAvg-CNN + MobileNet in an FL framework to ensure customer privacy and personal security; and
(iv) the presentation of experimental results from different deep learning, transfer learning and FL models
with balanced and imbalanced mammography datasets, which demonstrate that our solution leads to much
higher classification performance than other approaches and is viable for use in AI healthcare applications.

INDEX TERMS Artificial intelligence, synthetic minority oversampling, federated learning, transfer
learning, breast cancer.

The associate editor coordinating the review of this manuscript and approving it for publication was Turgay Celik.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023
Y. Nguyen Tan et al.: Transfer Learning Approach to Breast Cancer Classification in a FL Framework

I. INTRODUCTION
According to statistics published by the International Agency for Research on Cancer in December 2020, breast cancer has overtaken lung cancer as the most diagnosed cancer worldwide [1]. Over the past two decades, the total number of people diagnosed with cancer has nearly doubled, from an estimated 10 million in 2000 to 19.3 million in 2020. Today, one in five people worldwide will develop cancer in their lifetime. It is estimated that the number of people diagnosed with cancer will further increase in the future: nearly 50% by 2040 compared to 2020. The number of people who die from cancer has also increased, from 6.2 million in 2000 to 10 million in 2020. Late diagnosis and a lack of access to treatment have become increasingly prevalent issues that require more attention and follow-up. Breast cancer is a malignant tumor of the breast. A tumor can be benign (non-cancerous) or malignant (cancerous). Most breast cancers begin in the milk ducts, with a small percentage of cases developing in the milk sacs or lobules. If detected and treated late, breast cancer may metastasize to the bones and other organs and the pain will multiply.
Therefore, early detection of breast cancer is critical for treating and saving patients. When the disease is in its early stages, its manifestations may not be clear or distinct; as a result, many abnormalities may be overlooked [2]. Currently, many studies apply machine learning to improve early detection, reduce the risk of death, and prolong the patient's life. However, sharing patient data is not widely considered at present due to privacy, technical, and legal issues. Security and privacy techniques enable stricter protection of patient data and the use of data for research and routine clinical purposes [3], [4].
A study on breast mass classification from mammograms using convolutional neural networks (CNN) was published in 2016 [5]; the authors reported recall for identifying lesions of between 0.75 and 0.92, which means that up to 25% of abnormalities could remain undetected. Therefore, the ability to automatically detect lesions and predict their likelihood of malignancy would be valuable for doctors and could dramatically improve survival rates. Thus, we developed an FL base to extract features from multiple participating environments rather than a centralized learning environment. To investigate the real-world performance of FL, we conducted a study for the applied development of numerous breast cancer classification models using mammography data. An international group of hospitals and medical imaging centers joined this collaborative effort to train models in a completely decentralized fashion, without any data sharing between hospitals. This placed higher requirements on the robustness of algorithms and the selection of hyperparameters. In our study, we considered recall more important than accuracy because false negatives can be life-threatening, whereas false positives are likely to be reviewed by humans when diagnosing breast cancer; improving recall is therefore the main objective of this study.
Recently, FL has become a novel research trend in AI applications. It aims to train a machine learning (ML) algorithm across multiple decentralized nodes while holding the data samples locally (i.e., without exchanging them) [6]. Training such a decentralized model in an FL setup presented four main challenges: (i) system and data heterogeneity, (ii) pre-trained data processing, (iii) data protection and privacy, and (iv) efficient selection of distributed ML algorithms. We addressed these challenges for breast cancer classification in the context of FL.
The first challenge was system and data heterogeneity. Different system vendors produce images with considerably different intensity profiles for the same imaging modality. To address this diversity, many recent studies have found that a data-balancing solution such as the unsupervised domain adaptation method forces the model to learn domain-agnostic features through adversarial learning [7] or a specific type of batch normalization [8]. However, more straightforward methods were used in the current study to address this challenge; we present a solution that balances data more efficiently.
The second challenge was the need to process the data before training because of its heterogeneity. There are many data processing methods [9], [10]. We chose transfer learning due to its many benefits, such as saving training time, better neural network performance (in most cases), and the fact that large amounts of data are not needed [11].
To address the third challenge, data protection and privacy [12], [13], many studies have incorporated more security and privacy solutions. Our solution assumes that an international group of hospitals and medical imaging centers have joined this collaborative effort to train the model in a completely decentralized manner, with no data sharing between hospitals. This places higher requirements on the robustness of algorithms and the selection of hyperparameters.
The fourth challenge concerned the distributed learning ability of the FL models [14], [15] employed. Many distributed learning models are used in FL for different applications. However, most studies focus on hypothetical data, and each model is only suitable for one dataset, which makes it difficult for researchers working on practical applications such as breast cancer. To evaluate the effectiveness of these models, we compared them against other methods.
The contributions of this paper are as follows:
• Design of an FL framework for breast cancer classification that includes a global server, which acts as a weight aggregator, and edge clients that stand in for mobile devices in training deep learning (DL) models. This solution is useful for AI healthcare applications and can be widely deployed in different hospitals or clinics.
• Pioneering use of a transfer learning pre-training dataset in FL for breast cancer classification. Various models in transfer learning were selected for performance evaluation, including k-nearest neighbors (kNN), AdaBoost, and eXtreme Gradient Boosting (XGB). First, the image's features are extracted using the Convolutional Neural Network (ConvNet) of the pre-trained model, and a linear classifier is used to classify the images. Next, we used data equalization techniques such as SMOTE and data augmentation in combination with ImageNet to enrich and further optimize the training data.
• With both balanced and imbalanced datasets, experimental results from the Digital Database for Screening Mammography (DDSM) dataset demonstrate that our solution's FedAvg-CNN + MobileNet is much better than centralized learning, improving recall by more than 5% compared with [5]. Moreover, the accuracy of our research results reached nearly 98%; by comparison, the maximum results were only 88.67% for the two-class cases (calcifications and masses) and 94.92% (benign mass vs. malignant mass and benign calcification vs. malignant calcification) in the study [6].

The rest of this study is organized as follows. Section II discusses related works; Section III describes the background and the proposed method, together with some challenges and underlying ideas. Section IV presents an experimental evaluation of the deployed components. Finally, Section V presents a discussion, and Section VI concludes with future work.

II. RELATED WORKS
A. DEEP LEARNING
Today, breast cancer research mainly focuses on detecting and diagnosing breast tumors using deep learning algorithms [3],
[4], [16], [17], [18], [19]. However, most studies still focus on data processing, tumor prediction, and diagnosis using distributed learning models on a server. In such an environment, patient data and information must be shared for centralized processing on the server, which negatively affects privacy. Moreover, centralized processing entails more work when not all parties have access to a highly configured server.
A study [16] proposed a novel and efficient DL model based on transfer learning to automatically detect and diagnose breast cancer. The specific aim of this study is to use the knowledge gained while solving one problem in another relevant problem. Furthermore, in the proposed model, the features are extracted from a mammographic image analysis dataset (MIAS) using a pre-trained CNN such as InceptionV3, ResNet50, Visual Geometry Group Networks (VGG)-10, VGG-16, and Inception-V2 ResNet. To evaluate the experiments, six metrics were used and demonstrated that the transfer learning of the VGG16 model was powerful in classifying mammogram images with accuracy, sensitivity, specificity, and so on for breast cancer diagnosis. One study used a breast cancer detection system that included principal component analysis (PCA), a multilayer perceptron (MLP), transfer learning, and a support vector machine (SVM) [17]. The authors proposed a new processing method for predicting breast cancer based on nine individual attributes and four basic machine learning methods; the final accuracy of the results was 86.97% on the Breast Cancer Coimbra Dataset (BCCD). Research becomes complex when too many approaches are combined, and the results remain scarce.
Another study used CNN and US-ELM for feature extraction and clustering [3], with each mammogram segmented into several sub-regions. Then, CNN was used to extract features based on each sub-region, and unsupervised extreme learning machine (US-ELM) was employed to cluster features of sub-regions, which eventually located the region of the breast tumor. Next, the authors designed a CNN network with 20 in-depth features and other features to determine tumor density. However, the mammogram dataset only included approximately 400 women and yielded moderate accuracy.
By using DL to support the AdaBoost algorithm, paper [4] introduced an advanced technique for identifying and diagnosing breast cancer regions. Moreover, the study used the CNN network and LSTM algorithms to identify the characteristics of tumors for diagnostic tasks. The results demonstrated that the use of magnetic resonance imaging (MRI), ultrasound (US), digital breast tomosynthesis, and mammography yielded very high accuracy: up to 97.2%. The previous section introduced challenges of data imbalance in breast cancer detection. A research article in [18] used a transfer learning solution to solve this issue. The primary model for breast cancer image classification is VGG-19. The results demonstrated that the accuracy was approximately 90%. The paper [19] introduced a framework for automatically evaluating areas of doubt detected in mammography screening without additional tests, in particular to avoid unnecessary biopsies when the suspected site is a benign tumor. It mainly focused on the identification of segmented regions of interest (ROIs) using a modified K-means algorithm. Next, a bidimensional empirical mode decomposition (BEMD) algorithm was applied to extract multiple layers from the ROI. The results demonstrate that accuracy reached 98.6% when the digital mammography dataset was used.

FIGURE 1. Architecture of the proposed approach in federated learning settings.
FIGURE 2. Flowchart of data processing for training and evaluation.
FIGURE 3. Diagram of DDSM data processing and conversion to images by ROI extraction.

B. FEDERATED LEARNING
Recently, FL arose from the need to share sensitive data between service providers in various fields, such as healthcare and smart cities [20], [21], [22], [23], [24]. The results of some FL studies have been confirmed and applied to medical imaging, such as brain tumor segmentation, prediction of disease incidence, patients' responses to treatment and
other healthcare services, and late classification [25], [26], [27], [28], [29], [30].
Regarding breast imaging, only two papers [31], [32] have evaluated breast density classification. The authors employed a client-server-based FL method with federated averaging (FedAvg) [7], which combines local stochastic gradient descent (SGD) on each site with a server that performs model averaging. However, [31] significantly downsampled the input mammograms. Although low resolutions are acceptable for density classification, the loss of detail negatively affects malignancy classification. Moreover, this study did not apply any domain adaptation techniques to compensate for the domain shift of different pixel intensity distributions. The authors of [32] opted for a different approach by working on high-resolution mammograms with federated domain adversarial learning. In addition, they applied curriculum learning in FL to boost classification performance while improving domain alignment and explicitly handling domain shift with federated adversarial domain adaptation. The paper employs three datasets of Full Field Digital Mammography (FFDM), and the experimental results showed that the proposed memory-aware curriculum method is beneficial to further improve classification performance.
Based on the FL framework combined with CNN, the paper in [33] proposed a federated CNN prediction model based on improvements in general modeling and simulation conditions for five types of cancer. Its accuracy on the cancer data exceeds 90%, which is better than single-machine tree models, linear models, and neural networks. However, this study still lacked comparisons with models other than MLP and did not address the issue of data imbalance and treatment.
In 2022, there was a growing trend toward using FL to predict breast cancer. Another study [34] used the Breast Cancer Histopathological Image Classification dataset (BHI) for detection. The authors first used residual neural networks for automatic feature extraction, then employed the network of Gabor kernels to extract another set of features from the dataset. They extracted two sets of features and passed the output through a custom classifier. The results showed that this method achieved more than 80% accuracy.
Unlike previous authors [31], [32], [33], [34], who proposed an FL framework for breast density classification based on deep learning models, we targeted the more complex task of breast cancer prediction based on the mammography dataset. The innovation of the current research is that we focused on accurately processing data using transfer learning. Because the data was robust, the classification results were expected to improve the treatment rate for patients. Experimental results were evaluated with many different models, and the most suitable model for breast cancer was selected based on the DDSM dataset. In addition to reporting higher accuracy, recall, and F1-score than previous methods [5], we also analyzed diagnostic images based on histograms to enable doctors and medical staff to have a clearer view based on the displayed images. Finally, we summarize and compare our proposed method with the existing literature in Table 1.

III. THE PROPOSED METHOD
This section highlights our proposed approach in the FL framework. First, its overall structure is presented in Sub-section III-A. Next, Sub-section III-B describes the use of transfer learning for feature extraction. Next, in Sub-section III-C, we introduce the two mammography datasets used in this study and how they were processed to improve classification quality. The FedAvg algorithm is introduced in Sub-section III-D, which we summarize in pseudocode to explain the implementation of the FL framework. Then, in Section IV, we evaluate the results and provide explanations.

A. AN OVERALL ARCHITECTURE OF THE PROPOSED METHOD
The current Sub-section III-A presents a complete system model, including an overview diagram of how DL models work in FL and simulation designs. Fig. 1 depicts how the general behavior of the federated model was tested. The model structure includes a global server that acts as a weight aggregator and edge stations that replace mobile devices in training the deep learning model. The FL process occurs in three stages: stage (1) priming the initial model in the first round of FL or updating the new model after aggregating weights after the Nth round of learning, stage (2) local training with terminal data at the edge stations, and stage (3) aggregating the weights to the server and updating the global model. Transfer learning is applied in the local environment of the edge stations, which here are hospitals that connect locally to machine-learning-enabled devices using models that optimize performance, traffic conditions, etc. Homogeneous communication does not impose data labels on devices for prediction; it uses only personalized data features to learn, which limits the weights transmitted and the computation on the edges and helps local training at the edge take place rapidly (in reinforcement learning, back-distribution, and cross-linking) as the data sent to the edge device increases over time.
The objective of the current study was to test the FL models' distributed learning ability. Thus, the server and edge devices were simulated by initializing a similar DL model in both the global and local phases. Therefore, the weight update between the host and the edge device was performed directly, ignoring transmission time in the network.
At the beginning of the FL process, the server initialized a DL model. It sent the newly initialized set of weights to participating stations to create the first round of FL (cloud server). The local model updated this set of weights and began the training process. The input of the local model is the data representation learned from models pre-trained on ImageNet [35] on the incoming edge devices (phones, computers, doctors, medical devices, etc.) (client feature extraction). Due to the limitation in simulation, local learning occurred by training each edge with

TABLE 1. Summary of literature paper used for breast cancer classification.
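
The weighted aggregation described in this section, in which the server combines each station's weights in proportion to the number of data points that station trained on, can be illustrated with a minimal NumPy sketch. This is a sketch under stated assumptions, not the authors' implementation: the function name `fedavg_aggregate`, the list-of-arrays weight layout, and the toy values are all hypothetical.

```python
import numpy as np


def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of per-client layer weights (FedAvg-style).

    client_weights: one entry per edge station; each entry is a list of
                    np.ndarray, one array per model layer (hypothetical layout).
    client_sizes:   number of local data points each station trained on.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total number of data points must be positive")

    num_layers = len(client_weights[0])
    aggregated = []
    for layer in range(num_layers):
        # Each station contributes in proportion to its share of the data.
        layer_avg = sum(
            (n / total) * np.asarray(w[layer], dtype=float)
            for w, n in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_avg)
    return aggregated


# Toy example with two stations and a single "layer" (hypothetical values).
station_a = [np.array([1.0, 1.0])]
station_b = [np.array([3.0, 5.0])]
global_weights = fedavg_aggregate([station_a, station_b], client_sizes=[100, 300])
print(global_weights[0])  # values 2.5 and 4.0
```

Because station B holds three quarters of the data in this toy case, the aggregated weights sit three quarters of the way from A's values to B's. This is why each station must report its data-point count along with its weights.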

a piece of data from the processed dataset and grouping it to mark the order of the participating stations' FL network (edge sharing). Therefore, updating and training for all edge stations were performed sequentially until the number of data groups in the divided dataset was exhausted instead of simultaneous parallel learning on all devices, as in real situations. In addition, the decentralized exchange sequential sharing system mechanism has strict privacy conditions.
During the local training phase, the pre-split data will be trained with the local model. The learning process is akin to a regular DL network training process, which includes the following steps: fitting, forward propagation, and backward propagation. Local training is completed only after all stations have concluded training with their data. The weights of all stations will be aggregated for the global model update according to the expression of the FedAvg algorithm. In addition to local weights, the number of data points in the data group used for training at each station must be collected to perform the aggregation. Therefore, in addition to storing weights, each edge station also calculates and retains the number of data points that it has trained to prepare for the synthesis process at the server. The entire local learning process at the edge stations is performed with DL models. Specifically, the network models FedAvg-ANN (MLP) and FedAvg-CNN and the machine learning models kNN, AdaBoost, and XGB were used in this study.
After the local training is complete, the set of weights and data points for all stations is updated on the server. At this point, the server plays the role of aggregator and runs the algorithm with the set of weights and number of data points from the stations to find a new set of weights for the global model. Once the global model has been updated, the server checks the results of the FL round by running the classification problem on the test data, saves the test results, and moves on to the next learning round. The entire FL process is run with N given learning rounds. At each edge station, the number of epochs (DL cycles) is also performed with n pre-selected cycles. The number of FL rounds selected for the simulation is 200. The timing and predictive power of the model both depend on the number of epochs performed at each station. To examine this dependence, the model was sequentially tested with one, two, and five epochs in the first 200 FL rounds. These cases were tested with only a DL model applied in FL.
In this study, the models selected for FL were all DL networks with shared parameters in the training process. For backpropagation, the categorical cross-entropy loss function and the SGD optimizer were used to improve the training of all models. During forward propagation, the ReLU function was used to activate hidden layer neurons, while the Softmax function was used for predictive decision-making at the output layer.
FL algorithms are fundamentally different and mainly solve the problem of data security (i.e., secure connections between hospitals). In a traditional data science workflow, data is collected on a single server and used to build and train a centralized model. FL has a centralized model that is trained in a decentralized manner. Once the model is independently trained, each of the updated model weights is sent back to a central server, which combines them to create a highly efficient model. This also ensures that the data in each node complies with data security policies and protects against any data leaks or breaches. Quality data exists locally on edge devices in hospital centers around the globe and is protected by strict privacy laws. FL provides an intelligent means of connecting machine learning models to this discrete data regardless of location and, more importantly, without breaking privacy laws. Instead of taking the data to the model for general rule training, FL feeds the model to the data instead. All that is needed is the flexibility of the data storage device to commit itself to the binding process.

B. TRANSFER LEARNING FOR FEATURE EXTRACTION
In a hypothetical problem, 1,000 patients must be identified, but the data for the training consists of only approximately four images per person. Thus, there is insufficient data to train a complete machine learning model. In such cases, the models are usually pre-trained with extensive data from sources such as ImageNet, which contains 1.2 million images and 1,000 different categories. In this study, the feature-extractor approach was used. After extracting the features of the images using the ConvNet of the pre-trained model, we used a linear classifier to classify the images. In short, the image features (e.g., calcifications, tumors, etc.) provided input for linear or logistic regressions. Three feature extraction methods were considered, which yield three different results for each classifier [36]:

• Histograms of oriented gradients (HOGs)
• Features extracted from the discrete cosine transform (DCT) domain
• Features extracted from a pre-trained CNN.

Different feature extraction methods have different advantages. HOGs are commonly used for object detection and employ gradients to provide information about edges, corners, and contents in an image. The decision was made to extract features from the DCT domain because the features were created to describe quality parameters in an image. The last method uses features extracted from a CNN; the network is trained on a large set of images in the object recognition task to enable it to be generalized to tasks and datasets for which the network has not been trained. This method was chosen because of its high performance on common tasks in organized learning. The task in Sub-section III-B requires a more detailed representation of the compact and efficient local communication of input values through an unsupervised learning scheme with transfer from the large ImageNet (transfer) by mapping the input to a selective latent space through a ConvNet network of pre-trained models whose predictive output we need. The representative and symbolic learned output feature is usually low-dimensional (called FEx) and contains all of the necessary information at the calcification region, as indicated by the heatmap, and can
therefore be used as an input feature vector for a supervised learning model (e.g., MLP or CNN) to perform FL at the edge server. It is noteworthy that the features are learned by compressing useful information of local input data into the low-dimensional ConvNet network (MobileNet, ResNet50, Xception, etc.), whose network structures are openly provided by developers, and are always learned and maintained by local users without being sent to the cloud. Moreover, the cloud server only aggregates the updated model parameters obtained by performing calculations at the edge server based on the global model and local data from users, patients, medical devices, etc., without access to the user's sensitive data (e.g., data samples, representative low-dimensional features), which protects user privacy in federated communication and, optimally, in local sampling dispersion.
The left portion of Fig. 2 shows how the system is trained on the input set, which results in two classification models: one for quality and one for content. The reviews are sorted by rank, and images that are classified as outstanding are sent to the retrieval section. In partial retrieval, selection is made from sets of similar images, such that only one is retrieved. The resulting images are good, outstanding, and unique; we perform further analysis by retraining the classification model to provide the best result (e.g., the right image is the result of the whole set of images' progress).

C. DATASETS
DL heavily relies on datasets to automatically extract features that uniquely characterize the various target classes. In our study, we used the digital database of mammography screenings from the University of South Florida. These datasets and the processing steps are described in the remainder of Sub-section III-C.

1) DIGITAL DATABASE FOR SCREENING MAMMOGRAPHY (DDSM) DATASETS
DDSM is a database of 2,620 scanned film mammography studies [37], [38], [39], [41]. It includes normal, benign, and malignant cases with verified pathology information. The Curated Breast Imaging Subset of DDSM (CBIS-DDSM) collection includes a subset of DDSM data selected and curated by a trained mammologist. Since CBIS-DDSM only contains abnormal images, conventional scans were obtained

FIGURE 4. Illustrations of the calcification status of layers.

abnormal and whether the abnormalities were calcifications or masses and benign or malignant. DDSM provides metadata in three files that include the patient's age, the study date, the date of digitization, the type of dense tissue, the scanner used to digitize, and the resolution of each image. In addition, cases with anomalies include an OVERLAY file that contains information about each abnormality, including the type of abnormality (mass or calcification). The entire data processing for training and evaluation is described in detail in Fig. 3 and the subsequent Sub-section III-C2.

2) DATA PRE-PROCESSING
The dataset used in the study included images from the DDSM and CBIS-DDSM datasets. Images were pre-processed and converted to 299 × 299 images by extracting the ROIs. The data was stored as a tfrecords file for TensorFlow. The dataset contained 55,890 training examples, of which 86% were negative, and the remaining 14% were positive; they were divided into five tfrecords [42]. The data was also divided into training and testing as in the CBIS-DDSM dataset. The test files were equally divided into test and validation data. However, the separation between the test and validation data was incorrectly performed, which resulted in the test files containing only masses and the validation files containing only calcifications. The dataset consists of negative images from the DDSM dataset and positive images from the CBIS-DDSM dataset. The data was pre-processed to convert it into 299 × 299 images. The negative (DDSM) images were tiled into 598 × 598 tiles, which were then resized to 299 × 299. The positive (CBIS-DDSM) images had their ROIs extracted using the masks with a small amount of padding to provide context. Each ROI was then randomly cropped three times into 598 × 598 images, with random flips and rotations, and then the
from DDSM and combined with CBIS-DDSM scans. How- images were resized down to 299 × 299. These files should
ever, the size image is relatively small. To increase the size be combined for complete and balanced test data. The images
of the dataset, we extracted ROIs from each image, the algo- were labelled in two ways: i) normal label: 0 for negative
rithm is proposed in Algorithm 1. We aimed to classify the and 1 for positive; ii) full multi-class labels: 0 for normal,
calcifications in which stage of the disease, so we extracted 1 for benign calcification, 2 for benign mass, 3 for malignant
information from several available sources and introduced calcification, and 4 for malignant mass.
previous work. Previous studies in [42] did a great job extract- As previous work addressed the classification of prede-
ing ROI features from a combined DDSM and CBIS-DDSM fined abnormalities, we focused on classifying images as
data source. However, our study is only selective regarding normal or abnormal. We expected to retrain the model to
the number and size of images to be suitable for transfer classify for the whole five labels after achieving satisfactory
tasks and federated for edge device environments. The Con- performance illustrated through each stage. Fig. 4 and Table 2
vNets were trained to predict whether the scan was normal or summarize breast cancer classification stages in the DDSM

27468 VOLUME 11, 2023


Y. Nguyen Tan et al.: Transfer Learning Approach to Breast Cancer Classification in a FL Framework

dataset. Based on the analysis results from Fig. 4 and Table 2, doctors can diagnose which stage a patient is in and can provide them with helpful advice.

TABLE 2. Breast cancer stages in the DDSM dataset.

3) DATA AUGMENTATION
CBIS-DDSM scans are relatively large, with an average height of 5,295 pixels and an average width of 3,131 pixels. To create usable images from full-sized scans, ROIs were extracted using a mask and resized to 299 × 299 pixels. Each ROI was extracted in several ways:
• The ROI was extracted at the initial size of 598 × 598
• The ROI was zoomed to 598 × 598, with borders to provide context
• If the ROI was too large to fit into a 598 × 598 image, it was extracted in 598 × 598 cells with a spacing of 299
The 598 × 598 images were then resized to 299 × 299.
To increase the number of data samples, data augmentation was used, including random positioning of the ROI in the image, random horizontal flip, random vertical flip, and random rotation. The extraction of the ROI is detailed in Algorithm 1. Since the CBIS-DDSM dataset only contained anomalous scans, normal scans were obtained from the DDSM dataset. While the CBIS-DDSM images had been reviewed and edited to remove artifacts such as white borders and overlay text, the DDSM images had not. They contained borders and white patches of varying size that had been used to obscure patients' personal information. To remove the borders, DDSM images were cropped by 7% on each side. Since the extracted CBIS-DDSM ROIs were proportional to their size instead of at a fixed zoom, the DDSM images were scaled down by a random factor of between 1.8 and 3.2, then segmented into 299 × 299 images with a spacing of between 150 and 200 pixels.

Algorithm 1 ROI Extraction Algorithm
Input: Slide_size = 299; Full_slide_size = 598; offset = 60.
Get the Base File Name and the Mask Name.
Add the ROI ← Preprocessing
if mask_size ≤ (full_slice_offset) then
    image_slice = image[ROI_edges]
end if
if mask_size ≤ (full_slice) / 1.5 then
    roi_size = mask_size + 20%
    image_slice = Random_flip and rotate(image[ROI_edges])
end if
if mask_size > (full_slice) then
    roi_size = mask_size + 5%
    image_slice = Random_flip and rotate(image[ROI_edges])
end if
return image_slice[slice_size]
Preprocessing(image, mask)
    image ← resize(image) = image2
    mask ← resize(mask) = image.size
    if image > 50,000 white pixels then
        image ← image − trimming edges (image) = 20 pixels
    end if
    (center_row, center_col) ← corners(mask)
    return center_row, center_col, mask
    mask_size = int(max(mask.shape[0], mask.shape[1]))
    return center_row, center_col, mask_size
ROI_edges(center_col, center_row, image.shape, roi_size)
    return (start_row, end_row, start_col, end_col)

4) HISTOGRAM
It was necessary to collect different data points from the DDSM dataset to evaluate quality indicators because the data constantly fluctuated. With randomly obtained data, it is challenging to appreciate the full meaning of the information they carry and difficult to identify their fluctuations. To analyze and evaluate the quality of the collected data and draw accurate conclusions, the data are gathered, classified, and rearranged to represent their distribution in the form of charts, here as pixel-density histograms that reflect the characteristics of the data. From the shape of the frequency distribution in the graph, one can draw accurate conclusions about whether the quality criteria of the process are normal or abnormal. From there, appropriate decisions can be made to improve the quality of the data input. Fig. 5 shows the density and pixel intensity distribution for different stages of breast cancer in the DDSM dataset used in the study. The vertical axis represents the number of pixels; the higher the peaks (e.g., for the label "Malignant Calcification"), the more pixels there are in that area and the greater the detail. The horizontal axis represents the average brightness of each area, where the mid-range corresponds to a tone similar to 18% gray. The image labeled "Normal" occupies the dark region because it contains no more calcified areas (white pixels) than the classes labeled positive for breast cancer. The origin, 0, is the darkest value (akin to black); values increase to the right, and the lightest value is 255. The area between these two values represents medium brightness. Thus, the closer pixels are to a value of 0, the darker the image; conversely, the closer they are to a value of 255, the brighter the image. Pixels piled up at either extreme lose detail (either too dark or too light). A bright and clear image has a bell-shaped


histogram that peaks in the region of medium brightness and tapers off toward the left and right regions of the graph.

FIGURE 5. Density and pixel intensity distribution for different stages of breast cancer.

FIGURE 6. Class distribution of the predictive breast cancer data before SMOTE use.
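The intensity histogram described in this subsection can be sketched in a few lines. The following is a minimal illustration, assuming an 8-bit grayscale image given as a nested list of integers in 0..255; the function names `intensity_histogram` and `clipped_fraction` and the toy image are hypothetical and are not taken from the study.

```python
def intensity_histogram(image, levels=256):
    """Count how many pixels fall on each intensity level (0 = black, 255 = white)."""
    counts = [0] * levels
    for row in image:
        for value in row:
            if not 0 <= value < levels:
                raise ValueError(f"pixel value {value} outside 0..{levels - 1}")
            counts[value] += 1
    return counts


def clipped_fraction(counts):
    """Fraction of pixels sitting on the two extreme levels, where detail is lost."""
    total = sum(counts)
    return (counts[0] + counts[-1]) / total if total else 0.0


# Hypothetical 3 x 4 toy "scan": mostly dark background with a few bright pixels.
toy_image = [
    [0, 0, 12, 12],
    [0, 128, 128, 255],
    [0, 12, 255, 255],
]

hist = intensity_histogram(toy_image)
print(hist[0], hist[12], hist[128], hist[255])
print(round(clipped_fraction(hist), 3))
```

A large clipped fraction corresponds to the case noted above, in which pixels at either end of the horizontal axis are too dark or too light to carry detail.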

5) SMOTE
From the fully labeled dataset, data balancing was performed using the SMOTE method [40]. In an unbalanced dataset, each output class (or target class) is represented by a different number of input samples. The resulting classifier loss is often unsatisfactory, not least because the data is always non-IID. This problem is quite common nowadays, and several current methods pre-process unbalanced data:
• Oversampling: Oversampling involves increasing the number of samples of the smallest class up to the number of samples of the largest class by creating synthetic samples
• Undersampling: Undersampling involves reducing the number of samples of the largest class to the smallest class size by removing some samples from the largest class
• Class weight: This method involves specifying a weight for each class. The weight of the largest class is equal to 1, while the weight of the smallest class is equal to the largest class's sample count divided by the smallest class's sample count
• Decision threshold: If the predicted value is greater than the threshold, it is set to 1; otherwise, it is set to 0.
One way to address unbalanced datasets is to oversample the minority class. The most straightforward approach is to duplicate examples in the minority class. However, these examples do not add any new information to the model. Instead, new examples can be synthesized from existing examples. This type of data augmentation for the minority class is called the synthetic minority oversampling technique, abbreviated as SMOTE. Fig. 6 and 7 show the class distribution of the predictive breast cancer data before and after SMOTE use.

FIGURE 7. Class distribution of the predictive breast cancer data after SMOTE use.

FIGURE 8. Data division process with the DDSM dataset.

6) DATA SPLITTING
Only around 10% of mammogram images are abnormal. To maximize the likelihood of detecting abnormal scans, we weighted our training data more heavily towards abnormal scans (balancing the abnormal minority class against the normal class), with a goal of 85% normal. The data was separated randomly into training and test data using the existing divisions of the CBIS-DDSM dataset to avoid overlap. The split was performed with the train_test_split() function of the scikit-learn machine learning library. The overall data was divided randomly into training, validation, and testing data according to the following percentages: 80%, 10%, and 10%, respectively.

D. FEDAVG ALGORITHMS
The primary algorithms, Algorithms 2, 3, and 4, which were used to simulate the learning process in the FL model, are introduced in this sub-section (III-D). The outermost for loop iterates over a given number of learning rounds, each being a global learning round. In each round, the program first initializes the global parameters. On the client side, the feature extraction task performs the following steps: loading the model with the optimal ImageNet weights for the breast cancer data, randomly partitioning the data into batches for direct prediction (i.e., taking advantage of the


model's convolutional layers to extract features), packing the data into FE_x, and sending it to the edge. Two processes run at the edge server: the first is training for global communication between edges, while the other concatenates the features extracted locally from the clients' data. The edge model receives its weights from the general model (ANN-FedAvg or CNN-FedAvg). Clients process their data with the ImageNet-pretrained model, and the extracted features are concatenated at the edge. The edge server then trains on the total data obtained to generate its local weights. In the global update algorithm, the cloud server initializes the weights w0 and then sends them to the edge servers according to the preset round-robin schedule. Each edge server updates the weights during training in the communication rounds. After receiving the weights of all participating edges, the cloud model aggregates and averages them, and then updates the weights again for the next round.

Algorithm 2 Local Pretrained-Model Update Feature Extracted Low-Dimensional Dataset.
$M_{sub}$ Are Selected Model From Edge Server;
C Client With Index k in Edge Computing Area;
B Is the Edge Batch Size From Edge Server Request;
$P_{sub}$ Data Point Each Node Be Divided From Edge
ClientUpdate(k, FE_x)
    $M_{sub}(w_{optimized})$ ← initialize(net) from IMAGENET    /* feature extractor */
    $B_k$ ← (split $P_{sub}$ into batches of size $B_k$)
    for batch b in $B_k$ do
        images ← uniformly random sample b images
        X, y ← preprocess(images)
        z ← predict(net, X) ← $M_{sub}(w_{optimized})$
        l ← loss(z, y)
        FE_x ← update[flatten(z), backward(l)]
    end for
    return FE_x to edge server

Algorithm 3 Collaborative Local Update in Client-Edge.
There Are M Edge Server Models Indexed by m;
B Is the Edge Batch Size From Cloud Server Request;
E Is the Number of Client Epochs;
P Is the 1-Dimensional Feature-Extracted Dataset From the Edge and η Is the Learning Rate in Each Edge Round
EdgeServerUpdate(m, w, P)
    receive $M(W_{round})$ from the global model of the cloud server and initialize the edge model
    $M(W_{round})$ ← initialize(net)
    for each edge model m ∈ M in parallel do
        $P^{m}_{t+1}$ ← EdgeConcatenate(m, FE_x, $\sum_{k}^{c}$) ← ClientUpdate(k, FE_x)    /* Pretrained & arrange global model */
        for each edge epoch from 1 to E do
            for batch b ∈ B do
                $w^{m}_{t} \leftarrow \eta\, \ell(M(w_{round}; b))$
            end for
        end for
    end for
    return $w^{m}$ and $P^{m}$ to cloud server

Algorithm 4 Global Aggregation.
There Are N Edge Servers Indexed by i;
B Is the Edge Batch Size;
E Is the Number of Edge Epochs;
P Data Point Each Edge Held
for each round t = 1, 2, ... do
    updated $w_t$ is sent to all N edge servers each round
end for
for each edge i ∈ N in parallel do
    $w^{i}_{t+1}$ ← EdgeServerUpdate(i, $w_t$, $P^{i}_{t}$)
    $w_{t+1} \leftarrow \sum_{i=1}^{N} \tilde{w}^{i} / N$    /* global aggregation based on parameters $w_{t+1}$ */
end for
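The aggregation step of Algorithm 4 can be illustrated with a short sketch. It assumes that each edge server returns its weights as a flat list of floats and that all edges are weighted equally (a plain average over N edges). The names `fedavg_aggregate`, `run_rounds`, and `toy_edge_update` are hypothetical and serve only as illustration; the toy update function stands in for real local training and does not reproduce the authors' implementation.

```python
def fedavg_aggregate(edge_weights):
    """Average the weight vectors returned by N edge servers, element by element."""
    if not edge_weights:
        raise ValueError("need at least one edge weight vector")
    n_edges = len(edge_weights)
    length = len(edge_weights[0])
    if any(len(w) != length for w in edge_weights):
        raise ValueError("all edge weight vectors must have the same length")
    return [sum(w[j] for w in edge_weights) / n_edges for j in range(length)]


def run_rounds(global_w, edge_update, n_edges, n_rounds):
    """Cloud loop: broadcast global weights, collect edge updates, average them."""
    for _ in range(n_rounds):
        updates = [edge_update(i, list(global_w)) for i in range(n_edges)]
        global_w = fedavg_aggregate(updates)
    return global_w


def toy_edge_update(edge_index, weights):
    """Stand-in for local training: move every weight halfway to an edge-specific target."""
    target = float(edge_index)  # hypothetical local optimum of edge i
    return [w + 0.5 * (target - w) for w in weights]


averaged = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
final_w = run_rounds([0.0, 0.0], toy_edge_update, n_edges=3, n_rounds=60)
print(averaged)
print(final_w)
```

With three edges whose toy optima are 0, 1, and 2, the averaged global weights settle near the mean of the local optima, which is the behaviour an unweighted average is expected to show.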
IV. EXPERIMENTAL VALIDATION
This section evaluates the effectiveness of transfer learning using popular models such as MobileNet, Densenet121, Xception, and Resnet50 with two DDSM datasets on an FL framework and compares it to previous methods of breast cancer identification, including FL-CNN and FL-MLP. For a fair comparison, all models are implemented in Python 3.8 with TensorFlow 2.9 and Anaconda3. The experiments were run on a machine with the following specifications: NVIDIA Tesla P100 GPU, 2 vCPUs, and 256 GB of RAM. Furthermore, in all simulations, we used the same settings: rounds = 60, edges = 3, clients = 4, frequency = 5, epochs = 3, batches = 32, length = 27,940.

A. PERFORMANCE TO COMPARE THE QUALITY CLASSIFICATION FOR FL AND CENTRALIZED LEARNING
Since training on the dataset takes a long time, the models are trained for three epochs with binary labels. Accuracy, precision, recall, F1-score, and the area under the curve (AUC) are used as metrics to evaluate the models. After a well-performing model is obtained on the shared balanced dataset, the global model is retrained for three epochs, with the previous weights reused to update the training. If the model works well, it is used to classify all five classes; it is assumed that this will allow the convolutional layers to extract the essential features that provide optimal input from the hospital devices' updates.
We considered using transfer learning from MobileNet, Densenet121, Xception, Resnet50, or randomly initialized models for each suitable model configuration, but we judged that the features of the ImageNet data were sufficiently different from those of X-ray scans. Therefore, it made more sense to learn the features from


scratch on this dataset. However, using transfer learning between models accelerated the training process to around 60 communication rounds, which saved weeks of training time in reaching the global minimum loss.
In the first comparison, in Table 3, we compare accuracy and AUC between CL models, including KNN, AdaBoost, and XGB, and FL models such as FedAvg-MLP and FedAvg-CNN for two datasets, balanced and imbalanced. In addition, for transfer learning, we keep the random model in the data processing so that each different model can be compared against it.
The AUC results with the balanced dataset show that XGB's centralized learning yielded the best results, followed by FedAvg-CNN, FedAvg-MLP, KNN, and AdaBoost, which exhibited similar accuracy. With the imbalanced dataset, XGB still demonstrated the best results, followed by FedAvg-MLP, KNN, FedAvg, and AdaBoost in terms of AUC, with similar results for accuracy. Based on the results, the application of FL in breast cancer classification is a reasonable choice, given that it offers more advantages than traditional learning methods. Moreover, the results confirmed that AUC should be chosen over accuracy in breast cancer classification.
In the evaluations in Tables 4, 5, 6, and 7, as in Table 3, we change the transfer learning model used in data processing at the client. The analysis showed lower results for FL than for XGB (100% for both balanced and unbalanced data). Even so, compared with the other classical methods, the FedAvg-MLP and FedAvg-CNN results combined with MobileNet transfer learning are the best.
In terms of other advantages, FedAvg-CNN with MobileNet transfer learning gives better results than other learning methods, in both accuracy and AUC, for both data types: balanced (97.106%/99.743%) and unbalanced (85.741%/95.999%). In addition to checking the quality classification performance, we must pay attention to the selection of learning models for clients. In FL, low-parameter models are necessary because models with a large number of parameters cannot run on low-profile clients. Based on the results in Tables 3 to 7, we choose MobileNet, with a capacity of 4.3M parameters, as the smallest and most reasonable model. Again, we recommend selecting learning models that suit the research requirements; otherwise, the choice will affect the accuracy.

B. PERFORMANCE TO COMPARE THE AVERAGE ACCURACY, RECALL, AND F1-SCORE RESULTS ACROSS BREAST CANCER STAGE
Considering the results for the breast cancer stage prediction classes in Table 8, recall, which guards against false negatives, almost reached 100% in the "IID" scenario, in which about 50% of the DDSM data was used and the samples were oversampled with SMOTE on the three edge servers and then averaged, for a total of about 600 samples per class. Meanwhile, the "non-IID" scenario keeps the original data entirely unbalanced for the minority classes. That is, the models almost never predict these classes correctly above the threshold of 0.5.

C. PERFORMANCE TO COMPARE THE AVERAGE ACCURACY, AUC FOR K-FOLD CROSS-VALIDATION
In sub-section III-C6, we randomly split the dataset into training, validation, and testing sub-datasets in the following percentages: 80%, 10%, and 10%. This division is perfectly reasonable if we have a large amount of data. However, when there is too little data, this division leads to poor performance of the deep learning model. The reason is that some data points useful for training are assigned to validation and testing, so the model never learns from them. Sometimes, the small amount of data also leads to erroneous results when validating and testing because some classes appear only in validation and testing and not in training (because the division of training and validation is entirely random). Evaluating the model on that result alone would be inaccurate. In this study, to assess whether the MobileNet model we have chosen is suitable for the DDSM dataset, we use K-fold cross-validation (K = 4). In the non-fold case, without cross-validation, we use 90% of the dataset (80% training set, 10% validation set) for training and the distinct 10% test set for testing. In the case of K = 4 with cross-validation, we divide the 90% (training and validation) into 4 folds; for each fold, 75% (K−1 parts) forms the training set and 25% (1 part) forms the validation set. The remaining 10% test set is the same as in the non-fold case (for fairness), and we use it for testing after finishing training with cross-validation. We choose K = 4 because the data shared by the server is already relatively small, so that each validation fold still has enough data to test. We then train the model K times, where one part is the validation data and the remaining (K−1) parts are the training data. The final model evaluation result is the average of the evaluation results of the K training runs. In Table 9, we compare non-fold and 4-fold cross-validation on two datasets, balanced and unbalanced, with different models.
Evaluation results based on accuracy and AUC show that choosing the FedAvg-CNN and MobileNet models is correct, with advantages when using the FL framework. The accuracy and AUC on the balanced dataset (97.91%/99.80% in the non-fold case and 92.79%/99.04% in the 4-fold case) and on the unbalanced dataset (84.93%/95.93% in the non-fold case and 89.33%/96.99% in the 4-fold case) are the highest and most stable compared to the other methods.

D. PERFORMANCE TO COMPARE THE ACCURACY ON LOCAL MODELS AT EDGE SERVERS
We performed a local comparison of training, testing, and validation data on three edge servers for 60 rounds without overfitting (see Fig. 9). For this test, we evenly split the data across edge servers, assuming that each server would have 33.3% of the original data. This data is distributed arbitrarily and has no pre-assigned label order. The data


TABLE 3. Quality classification accuracy and AUC performance assessment results for balanced and imbalanced datasets for CL at edge servers, FL models at cloud servers, and random model in transfer learning at clients.

TABLE 4. Quality classification accuracy and AUC performance assessment results for balanced and imbalanced datasets for CL at edge servers, FL models at cloud servers, and MobileNet model in transfer learning at clients.

TABLE 5. Quality classification accuracy and AUC performance assessment results for balanced and imbalanced datasets for CL at edge servers, FL models at cloud servers, and Densenet121 model in transfer learning at clients.

TABLE 6. Quality classification accuracy and AUC performance assessment results for balanced and imbalanced datasets for CL at edge servers, FL models at cloud servers, and Xception model in transfer learning at clients.

TABLE 7. Quality classification accuracy and AUC performance assessment results for balanced and imbalanced datasets for CL at edge servers, FL models at cloud servers, and Resnet50 model in transfer learning at clients.
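Tables 3 to 7 report accuracy and AUC. As a reminder of how the two metrics differ, the sketch below computes both from binary labels and predicted scores. It is a minimal illustration, assuming labels in {0, 1} and the rank-based (Mann-Whitney) formulation of AUC; the function names and the toy scores are hypothetical and are not values from the study.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    hits = sum(int(s > threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.2, 0.3, 0.4, 0.45, 0.35, 0.9]

acc_value = accuracy(y_true, y_score)
auc_value = auc(y_true, y_score)
print(acc_value, auc_value)
```

On this toy data, labelling every sample as negative would already reach 0.75 accuracy, which is one reason a threshold-free metric such as AUC is informative when the classes are imbalanced.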

extraction process was performed after the clients extracted their features. The results of the edge server training demonstrate that, with the CNN model, the distributions differed only slightly, and the test results for edge 1 were not significantly lower than those of the other two edges. Thus, the proposed plan is suitable only when accuracy increases sharply in the first communication rounds. Federated transfer learning uses models pre-trained on ImageNet, which supports optimizing the training phase.

E. PERFORMANCE TO COMPARE THE GLOBAL AUC AND RECALL POINT CHART BOX PLOT WITH DIFFERENT MODELS
We compared the AUC score and recall point of different models in two cases: IID and non-IID. Fig. 10 shows the AUC scores of Xception-IID, DenseNet121-non-IID, and ResNet50-non-IID; the AUC scores of ResNet50-non-IID and Xception-IID show that the process is underperforming because the data tends to concentrate (median) at a


TABLE 8. Evaluation of average precision, recall, and F1-score results across classes in the case of FedAvg-CNN + MobileNet, IID and non-IID datasets.

TABLE 9. Average accuracy and AUC of five classes using non-Fold and 4-Fold validation in the case of balanced and imbalanced datasets.

FIGURE 9. Comparison of training, testing, and validation accuracy per communication round for three edge servers with the same settings.

FIGURE 10. Comparing the AUC score and recall point on eight model learnings as MobileNet, DenseNet121, Xception, ResNet50 for both case IID and non-IID data distributions.

FIGURE 11. Heat map distribution for breast cancer classification stages (a to e) in the DDSM dataset after classification.
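The processing order reported for Fig. 11 (input image, threshold, ROI mask, heat map, overlay) can be sketched as follows. This is only a toy illustration on a nested-list grayscale image: the threshold value, the min-max normalization used as the heat value, and the blending factor `alpha` are assumptions, and all function names are hypothetical. The text does not specify these details, and the heat map in the study is produced after classification rather than from raw intensities.

```python
def threshold_mask(image, thresh):
    """Binary ROI mask: 1 where the pixel is brighter than `thresh`, else 0."""
    return [[1 if v > thresh else 0 for v in row] for row in image]


def heat_map(image, mask):
    """Min-max normalize intensities inside the mask to 0..1; heat is 0 outside it."""
    inside = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
    if not inside:
        return [[0.0] * len(row) for row in image]
    lo, hi = min(inside), max(inside)
    span = (hi - lo) or 1  # avoid division by zero when the ROI is uniform
    return [[(v - lo) / span if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def overlay(image, heat, alpha=0.4):
    """Blend the heat map (rescaled to 0..255) onto the original image."""
    return [[round((1 - alpha) * v + alpha * 255 * h) for v, h in zip(row, hrow)]
            for row, hrow in zip(image, heat)]


# Hypothetical 3 x 3 patch with one bright region in the lower-right corner.
patch = [
    [10, 20, 30],
    [20, 200, 220],
    [30, 220, 250],
]

roi_mask = threshold_mask(patch, thresh=128)
heat = heat_map(patch, roi_mask)
blended = overlay(patch, heat)
print(roi_mask)
print(blended)
```

Keeping the three stages as separate functions mirrors the order of the pipeline and makes it easy to swap the intensity-based heat value for a classifier-derived one.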

high level with large variability. Meanwhile, the quality control of MobileNet and DenseNet121 was the best because the dispersion of their AUC scores was low and the scores fluctuated within a narrow range. In addition, the degree of dispersion of the recall points over the communication rounds rapidly met the requirements in the case of MobileNet and DenseNet121, whereas the remaining scenarios did not converge. Once again, the effectiveness of the transfer learning method in accurately classifying the area of breast cancer can be seen.

F. PERFORMANCE OF A HEAT MAP BASED ON BREAST CANCER STAGES
We next look directly at the breast cancer classification stages in the DDSM data. We performed a heat map review by applying FedAvg-CNN + MobileNet (see Fig. 11). The results, in the order img input > thresh > ROI mask > heatmap > overlay, show an overview of the breast cancer recognition partition for any input image, using a binary threshold to pick out the region (high, low). Next, we initialized roi_mask to define the area of interest. Finally, a heat map and an overlay are produced after classification. The heat map provided a full heat index (MAM, MAC, BEM, BEC, NO) and a live image of the central areas with prominent pixels in the image. A heatmap-based assessment can highlight the type of partition to be diagnosed and indicate that it is worthy of investigation. In addition to staging breast cancer based on the assessments described in the previous subsection, a heatmap-based assessment offers a different perspective on breast cancer and provides doctors and medical staff with another assessment method.

V. DISCUSSION
We demonstrated that FL with CNNs can be trained to determine whether part of a mammogram is abnormal, improving


multiclass classification through each stage, with positive results: 100% recall and 99.804% AUC when using a pre-trained MobileNet model to extract features. Adjusting the decision threshold and communication parameters further improved the use of SGD as the global weight update function. These methods can be used in pre-mammogram screenings to allow radiologists to focus on scans that are likely to contain abnormalities, across multiple hospitals and countries.
Although it trains on many units, each with a small dataset, FL yielded the same classification accuracy as the centralized learning method. CNN is one of two topology models that achieved the best and most accessible recall performance for distributed modeling. A key advantage of combining FedAvg-CNN and MobileNet feature extraction was the ability to customize the ConvNet layer volume to quickly obtain results that were equivalent to those of larger networks, although long-distance connectivity was not guaranteed. Therefore, FedAvg-CNN had the advantage of adapting well to mobile devices with hardware that allows the rapid processing of medium tasks. Finally, performing simulations with a random number of stations for each round of FL demonstrated that CNN can function even in unstable network situations.
However, FL generally requires longer training time in simulations than centralized learning. The CNN model still requires many data processing and feature extraction steps to achieve high accuracy and the recall needed to avoid false negative detections. Moreover, although it works in situations with changing station numbers, the performance of the learning model associated with CNN in the report is still significantly degraded (98% down) in a real scenario, where the data is more individualized and grows over time and where time is needed to maintain the frequency of updates.

VI. CONCLUSION AND FUTURE RESEARCH
In this study, we presented a solution for classifying breast cancer images using feature extraction from multiple participating environments instead of a centralized learning facility. The participating environment consisted of an international group of hospitals and medical imaging centers that joined collaborative efforts to train the model in a completely data-decentralized manner, without sharing any data between hospitals. Moreover, we focused on analyzing recall performance more than accuracy because false negatives can be life-threatening. By contrast, as in similar studies, false positives can be reviewed by humans. The results demonstrate that the accuracy was higher than that of other models.

REFERENCES
[1] (2021). Breast Cancer Now Most Common Form of Cancer: WHO Taking Action. Accessed: Feb. 3, 2021. [Online]. Available: https://www.who.int
[2] (2021). Breast Cancer Overtakes Lung Cancer in Terms of Number of New Cancer Cases Worldwide. Accessed: Feb. 4, 2021. [Online]. Available: https://www.who.int
[3] Z. Wang, M. Li, H. Wang, H. Jiang, Y. Yao, H. Zhang, and J. Xin, "Breast cancer detection using extreme learning machine based on feature fusion with CNN deep features," IEEE Access, vol. 7, pp. 105146–105158, 2019, doi: 10.1109/ACCESS.2019.2892795.
[4] J. Zheng, D. Lin, Z. Gao, S. Wang, M. He, and J. Fan, "Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis," IEEE Access, vol. 8, pp. 96946–96954, 2020, doi: 10.1109/ACCESS.2020.2993536.
[5] D. Lévy and A. Jain, "Breast mass classification from mammograms using deep convolutional neural networks," 2016, arXiv:1612.00542.
[6] N. Mobark, S. Hamad, and S. Z. Rida, "CoroNet: Deep neural network-based end-to-end training for breast cancer diagnosis," Appl. Sci., vol. 12, no. 14, p. 7080, Jul. 2022, doi: 10.3390/app12147080.
[7] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. Artif. Intell. Statist., vol. 54, 2017, pp. 1273–1282. [Online]. Available: https://proceedings.mlr.press/v54/mcmahan17a.html
[8] X. Peng, Z. Huang, Y. Zhu, and K. Saenko, "Federated adversarial domain adaptation," 2019, arXiv:1911.02054.
[9] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, "FedBN: Federated learning on non-IID features via local batch normalization," 2021, arXiv:2102.07623.
[10] Q. Wu, X. Chen, Z. Zhou, and J. Zhang, "FedHome: Cloud-edge based personalized federated learning for in-home health monitoring," IEEE Trans. Mobile Comput., vol. 21, no. 8, pp. 2818–2832, Aug. 2022, doi: 10.1109/TMC.2020.3045266.
[11] S. Bozinovski, "Reminder of the first paper on transfer learning in neural networks, 1976," Informatica, vol. 44, no. 3, pp. 293–295, Sep. 2020.
[12] Q. Li et al., "A survey on federated learning systems: Vision, hype and reality for data privacy and protection," IEEE Trans. Knowl. Data Eng., vol. 35, no. 4, pp. 3347–3366, Apr. 2023, doi: 10.1109/TKDE.2021.3124599.
[13] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. S. Quek, and H. V. Poor, "Federated learning with differential privacy: Algorithms and performance analysis," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3454–3469, 2020.
[14] X. Lu, Y. Liao, P. Lio, and P. Hui, "Privacy-preserving asynchronous federated learning mechanism for edge network computing," IEEE Access, vol. 8, pp. 48970–48981, 2020, doi: 10.1109/ACCESS.2020.2978082.
[15] C. He, S. Li, J. So, M. Zhang, H. Wang, X. Wang, P. Vepakomma, A. Singh, H. Qiu, L. Shen, P. Zhao, Y. Kang, Y. Liu, R. Raskar, Q. Yang, M. Annavaram, and S. Avestimehr, "FedML: A research library and benchmark for federated machine learning," arXiv, vol. abs/2007.13518, 2020.
[16] A. Saber, M. Sakr, O. M. Abo-Seida, A. Keshk, and H. Chen, "A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique," IEEE Access, vol. 9, pp. 71194–71209, 2021, doi: 10.1109/ACCESS.2021.3079204.
[17] H.-J. Chiu, T.-H.-S. Li, and P.-H. Kuo, "Breast cancer-detection system using PCA, multilayer perceptron, transfer learning, and support vector machine," IEEE Access, vol. 8, pp. 204309–204324, 2020, doi: 10.1109/ACCESS.2020.3036912.
[18] R. Singh, T. Ahmed, A. Kumar, A. K. Singh, A. K. Pandey, and S. K. Singh, "Imbalanced breast cancer classification using transfer learning," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 18, no. 1, pp. 83–93, Jan. 2021, doi: 10.1109/TCBB.2020.2980831.
[19] A. Elmoufidi, "Deep multiple instance learning for automatic breast cancer
assessment using digital mammography,’’ IEEE Trans. Instrum. Meas.,
In the future, we plan to create a system that will scan vol. 71, pp. 1–13, 2022, doi: 10.1109/TIM.2022.3177141.
the entire mammogram as input, segment it, and analyze [20] T. A. Khoa, D.-V. Nguyen, M.-S. Dao, and K. Zettsu, ‘‘Fed xData:
each segment to yield results for the entire mammogram to A federated learning framework for enabling contextual health monitoring
in a cloud-edge network,’’ in Proc. IEEE Int. Conf. Big Data (Big Data),
make an end- the complete to-end for mammogram analysis. Dec. 2021, pp. 4979–4988, doi: 10.1109/BigData52589.2021.9671536.
In addition to improved data processing, simulations can be [21] T. A. Khoa, D.-V. Nguyen, P. V. Nguyen Thi, and K. Zettsu, ‘‘FedMCRNN:
extended with multiple clients or groups of clients on separate Federated learning using multiple convolutional recurrent neural net-
works for sleep quality prediction,’’ in Proc. 3rd ACM Workshop Intell.
devices, and individual patient interventions share privacy Cross-Data Anal. Retr., New York, NY, USA, Jun. 2022, pp. 63–69, doi:
security. 10.1145/3512731.3534207.


[22] T. A. Khoa, D.-V. Nguyen, M.-S. Dao, and K. Zettsu, ‘‘SplitDyn: Federated split neural network for distributed edge AI applications,’’ in Proc. IEEE Int. Conf. Big Data (Big Data), Osaka, Japan, Dec. 2022, pp. 6066–6073, doi: 10.1109/BigData55660.2022.10020803.
[23] D.-D. Le, A.-K. Tran, M.-S. Dao, K.-C. Nguyen-Ly, H.-S. Le, X.-D. Nguyen-Thi, T.-Q. Pham, V.-L. Nguyen, and B.-Y. Nguyen-Thi, ‘‘Insights into multi-model federated learning: An advanced approach for air quality index forecasting,’’ Algorithms, vol. 15, no. 11, p. 434, Nov. 2022, doi: 10.3390/a15110434.
[24] D.-V. Nguyen and K. Zettsu, ‘‘Spatially-distributed federated learning of convolutional recurrent neural networks for air pollution prediction,’’ in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2021, pp. 3601–3608, doi: 10.1109/BigData52589.2021.9671336.
[25] C. I. Bercea, B. Wiestler, D. Rueckert, and S. Albarqouni, ‘‘FedDis: Disentangled federated learning for unsupervised brain pathology segmentation,’’ 2021, arXiv:2103.03705.
[26] W. Li, F. Milletarì, D. Xu, N. Rieke, J. Hancox, W. Zhu, and M. Baust, ‘‘Privacy-preserving federated brain tumour segmentation,’’ in Proc. Int. Workshop Mach. Learn. Med. Imag. Cham, Switzerland: Springer, 2019, pp. 133–141.
[27] L. Huang, A. L. Shea, H. Qian, A. Masurkar, H. Deng, and D. Liu, ‘‘Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records,’’ J. Biomed. Informat., vol. 99, Nov. 2019, Art. no. 103291.
[28] X. Li, Y. Gu, N. Dvornek, L. H. Staib, P. Ventola, and J. S. Duncan, ‘‘Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results,’’ Med. Image Anal., vol. 65, Oct. 2020, Art. no. 101765.
[29] Y. Yeganeh, A. Farshad, N. Navab, and S. Albarqouni, ‘‘Inverse distance aggregation for federated learning with non-IID data,’’ in Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. Cham, Switzerland: Springer, 2020, pp. 150–159.
[30] Z. Wang, Q. Liu, and Q. Dou, ‘‘Contrastive cross-site learning with redesigned net for COVID-19 CT classification,’’ IEEE J. Biomed. Health Informat., vol. 24, no. 10, pp. 2806–2813, Oct. 2020.
[31] H. R. Roth et al., ‘‘Federated learning for breast density classification: A real-world implementation,’’ in Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. Cham, Switzerland: Springer, 2020, pp. 181–191.
[32] A. Jiménez-Sánchez, M. Tardy, M. A. González Ballester, D. Mateus, and G. Piella, ‘‘Memory-aware curriculum federated learning for breast cancer classification,’’ 2021, arXiv:2107.02504.
[33] Z. Ma, M. Zhang, J. Liu, A. Yang, H. Li, J. Wang, D. Hua, and M. Li, ‘‘An assisted diagnosis model for cancer patients based on federated learning,’’ Frontiers Oncol., vol. 12, Mar. 2022, Art. no. 860532, doi: 10.3389/fonc.2022.860532.
[34] B. L. Y. Agbley, J. Li, M. A. Hossin, G. U. Nneji, J. Jackson, H. N. Monday, and E. C. James, ‘‘Federated learning-based detection of invasive carcinoma of no special type with histopathological images,’’ Diagnostics, vol. 12, no. 7, p. 1669, Jul. 2022, doi: 10.3390/diagnostics12071669.
[35] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: A large-scale hierarchical image database,’’ in Proc. IEEE Comput. Vis. Pattern Recognit. (CVPR), 2009, pp. 248–252.
[36] M. Lorentzon, ‘‘Feature extraction for image selection using machine learning,’’ M.S. thesis, Dept. Elect. Eng., Comput. Vis. Lab., Linköping Univ., Linköping, Sweden, 2017, pp. 17–18.
[37] M. Heath, K. Bowyer, D. Kopans, P. Kegelmeyer Jr., R. Moore, K. Chang, and S. Munishkumaran, ‘‘Current status of the digital database for screening mammography,’’ in Digital Mammography. Berlin, Germany: Springer, 1998, pp. 457–460.
[38] R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, and D. L. Rubin, ‘‘A curated mammography data set for use in computer-aided detection and diagnosis research,’’ Sci Data, vol. 4, p. 170177, Dec. 2017, doi: 10.1038/sdata.2017.177.
[39] M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer, ‘‘The digital database for screening mammography,’’ in Proc. 5th Int. Workshop Digit. Mammogr., M. J. Yaffe, Ed., 2001, pp. 212–218.
[40] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, ‘‘SMOTE: Synthetic minority over-sampling technique,’’ J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[41] R. S. Lee, F. Gimenez, and A. Hoogi, ‘‘Curated breast imaging subset of DDSM,’’ Tech. Rep., 2016.
[42] E. A. Scuccimarra. DDSM Mammography. Accessed: Feb. 3, 2022. [Online]. Available: https://www.kaggle.com/datasets/skooch/ddsm-mammography

Y. NGUYEN TAN is currently pursuing the Ph.D. degree with the Faculty of Electrical and Electronic Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam. He is a Lecturer with Phan Thiet University. His research interests include machine learning, image processing, and computer vision in the medical field.

VO PHUC TINH is currently pursuing the bachelor’s degree with the Faculty of Electrical and Electronic Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam. His research interests include the Internet of Things, embedded systems, and artificial intelligence.

PHAM DUC LAM is currently a Lecturer with Nguyen Tat Thanh University, Vietnam. His research interests include health applications that apply technologies, such as image processing, embedded systems, and the Internet of Things.

NGUYEN HOANG NAM received the Ph.D. degree from the National Chiao Tung University, Taiwan, in 2017. Currently, he is a Researcher with the MERLIN Research Group and a Lecturer with the Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam. His research interests include health applications using artificial intelligence, image processing, and computer vision.

TRAN ANH KHOA received the Ph.D. degree from the University of Siena, Siena, Italy, in 2017. Currently, he is a Researcher with the MERLIN Research Group, Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam. His research interests include health applications using artificial intelligence and the Internet of Things.
