A Systematic Review On Federated Learning in Medical Image Analysis
ABSTRACT Federated Learning (FL) has attracted a lot of attention from academic and industrial stakeholders since its invention. The eye-catching feature of FL is that it handles data in a decentralized manner, which creates a privacy-preserving environment for Artificial Intelligence (AI) applications. Medical data includes private information of patients, which demands strong protection from disclosure to unintended destinations. In this paper, we performed a Systematic Literature Review (SLR) of published research articles on FL-based medical image analysis. First, we collected articles from different databases following the PRISMA guidelines, then synthesized data from the selected articles, and finally provided a comprehensive overview of the topic. To do so, we extracted core information associated with the implementation of FL in medical imaging from the articles. In our findings we briefly present the characteristics of federated data and models, the performance achieved by the models, and a comparison of results with traditional ML models. In addition, we discuss the open issues and challenges of implementing FL and give our recommendations for the future direction of this research field. We believe this SLR summarizes the state-of-the-art FL methods for medical image analysis using deep learning.
INDEX TERMS Federated learning, machine learning, medical image analysis, data privacy, systematic
literature review.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
28628 VOLUME 11, 2023
M. F. Sohan, A. Basalamah: Systematic Review on FL in Medical Image Analysis
TABLE 1. Contribution of available SLRs on FL-driven medical data analysis and our study.
can be very efficient in terms of building robust AI models. Since models are trained in centralized individual locations in traditional ML, collaboration between models is quite difficult. Moreover, X-ray, CT, and MRI images are personal data pertaining to individual patients, which need to be protected from the risk of being disclosed or revealed to any unauthorized third party. In addition, even where data sharing is possible, data storage, processing, and analysis are still difficult tasks in a centralized manner. For such a scenario, data encryption and decryption could be a potential solution for exchanging information between participants; however, the process could be complex, time-consuming, and not sustainable [3]. So, instead of bringing the data to the location where the model is trained, why not bring the model to the data (the institutions and hospitals) and train it directly in-house? This allows collaborative learning without centralizing the dataset itself, and is called FL. It was first introduced in 2016 [4] and has gained a lot of traction within the last couple of years in the healthcare domain. It addresses the privacy and data protection concern, which is currently an important problem in developing medical AI. In FL, the participants train models locally and estimate the parameters of their respective models, then share the parameters with a centralized server that aggregates them. Therefore, the focus is not on which data is used or which algorithms are trained; the concept is to manage the data in a different way so that data privacy is preserved.
A. OBJECTIVE AND CONTRIBUTION
Since medical images are sensitive data, they need to be protected so as to preserve the rights to users’ personal information. As already discussed, FL arrived to solve the data privacy issue in collaborative ML, and within a short time the concept has been applied in different fields, including medical imaging. Many articles have already been published on FL-oriented medical image analysis, and they successfully applied this unique data management technique in their research. At this stage, it is time to look back and to review and assess what has been done so far and what the impact of FL on medical imaging has been. Meanwhile, some SLRs have been published on the topic; however, they covered overall healthcare applications and not specifically the medical image analysis context. An SLR was presented in [5], in which the authors considered all articles that used any form of medical data to train their FL models. Similarly, in [3], [9], and [6] the authors surveyed and reviewed papers from the whole healthcare area. Some review articles covered specific medical domains; for example, Naeem et al. [10] worked specifically on brain tumor diagnosis using MRI images. Since FL is a comparatively new concept, most of the review articles emphasized design and implementation. Secondly, they discussed the privacy or security opportunity, which is the fundamental characteristic of FL. Some of them [5], [7], and [10] were formulated around different research questions; a common question concerned the state-of-the-art FL methods, and data properties, impact, gaps, and future research were also investigated. Alongside these, several survey articles have been published on FL for healthcare informatics. Xu et al. [11] surveyed papers that focus on FL in the biomedical area; their effort was to summarize the privacy, statistical, and system challenges that exist in this specific domain. In a well-known article in this field [12], the authors discussed the prime factors related to FL in digital health, with challenges and solutions.
This study is an SLR in which we exclusively investigated FL in medical image analysis and extensively covered every component of the considered articles, especially the performance analysis and comparison with conventional ML, which is the main distinction of our study from the previously published review papers. Our study consisted of several research questions, and by answering them we illustrated the current research landscape in the field of medical image processing using FL. In addition, several observations were discussed according to the findings extracted from the literature. Table 1
FIGURE 1. Two basic frameworks: working and communication flow of decentralised federated learning (left) and traditional machine learning (right) for a hospital environment.
shows a comparative analysis of our contribution and related review articles; our study explored the demographic data, FL architecture, privacy-preserving concerns, federated data management, and performance of FL models. We did not find any review article that has worked specifically on medical images. Consequently, this study can serve as an outline for future research on FL applications in medical imaging. The following are the key contributions of our paper:
• We surveyed the insights of FL solely in medical image research in a systematic way.
• We provided the latest implementations, advancements, and tendencies in medical image analysis research using FL in different aspects.
• We presented and compared the performance of the different FL architectures used in the reviewed articles with traditional ML models, which is the first comparison of its kind.
• For incoming contributors, we discussed open issues, challenges, and future directions of the research field.
The rest of the article is structured in six sections. The basic FL concept is introduced in Section II. Section III describes the procedures of this review. The results of this investigation are presented through different research questions in Section IV. Open issues and challenges are discussed in Section V, and Section VI covers the limitations of this study. Lastly, the conclusion and future directions are provided in Section VII.
II. FEDERATED LEARNING
In this section we give an overview of the FL architecture. The concept of FL is not directly related to the ML components; it is all about a data management process to share data between multiple clients in a privacy-preserving manner. For a practical example, suppose a hospital produces some data, has a model and some computing resources, and would like to tackle a specific problem with an AI system. However, the dataset in the institution is not sufficient to train a model that is able to address this problem. Another hospital dealing with a similar difficulty wants to work together on this premise, since they have a common goal and can solve a common task. However, the two hospitals hold different data locally, and they need to use each other’s data without sharing it directly. This collaborative model training without sharing the data is exactly the purpose of FL.
In Fig. 1, we present FL on the left and the traditional ML framework on the right to illustrate the fundamentals of both for a hospital environment. As supplementary information, we list the necessary keywords and their explanations related to decentralized FL implementation in Table 2. Since FL involves multiple sources of data, we show four clients in the figure. The clients have a few common duties: they collect data from the hospitals, train the local ML models on it, and estimate some parameters. These parameters, not the data itself, are sent from every client to the central server. Once the central server has received all the local models’ parameters, it aggregates them by taking the weighted average; the result is known as the global model and is sent back to all of the clients. This completes one learning round, and the process is repeated for the next round.
A well-known federated averaging algorithm is FedAvg [13], proposed by Google in 2016; it calculates
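The aggregation step of a learning round described above (the server takes a weighted average of the clients' parameters) can be sketched in Python as follows. This is a minimal illustration, not the implementation from [13] or from any reviewed article: the function name `weighted_average`, the flat-list representation of model parameters, and the choice of weighting each client by its local sample count are assumptions made here, since the text only states that a weighted average is taken.

```python
from typing import List, Sequence


def weighted_average(client_params: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Aggregate flat parameter vectors from clients into one global vector.

    Assumption: each client's weight is its share of the total number of
    local training samples; the text only says "weighted average".
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must send vectors of equal length")
        for i, value in enumerate(params):
            # Each client contributes in proportion to its data share.
            global_params[i] += (n / total) * value
    return global_params


# Hypothetical round with four clients; only parameters reach the server.
local = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
sizes = [100, 100, 200, 600]
global_model = weighted_average(local, sizes)
```

In a full round the returned vector would be sent back to every client as the starting point for the next local training step; that loop is omitted here to keep the sketch short.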
B. SEARCH PROCESS
Since FL was first presented in 2016, the search process of the review was limited to the time period from 1 January 2017 to 30 June 2022. We searched all of the common databases considered by previous researchers, for example, Science Direct, IEEE Xplore digital library, Springer Link, Wiley Online Library, SPIE digital library, ACM digital library, Multidisciplinary Digital Publishing Institute (MDPI), Nature Portfolio, Taylor & Francis, and Google Scholar. The search criteria differ across the platforms; we used the advanced options of each database to search for articles with Boolean ‘‘AND’’ and ‘‘OR’’ expressions. Our study focused on the implementation of FL in healthcare image processing, so we carefully avoided other applications. The search phrases were matched against the titles, abstracts, and keywords in each of the databases. Fig. 3 depicts the PRISMA flow diagram, which presents the full statistics of article consideration in this review. After the search operation, the initially collected articles went through a selection process, which we describe in the next sections.
C. INCLUSION AND EXCLUSION CRITERIA
A literature search strategy is a big challenge when the search returns too many papers; in an SLR this is addressed by predefined inclusion and exclusion criteria. This might include limiting the search to only those papers that contain certain types of studies. These processes ensure that the task is carried out properly, reduce the possibility of bias, and protect the selection process from irrelevant research documents. We applied the inclusion and exclusion criteria to the articles collected from the databases to reach the exact materials that readers are seeking. We emphasized the following points to include articles for final analysis:
• Articles that studied medical image datasets.
• ML models developed within the FL environment.
• FL was the main focus in the findings (result analysis/comparison).
Since we performed a keyword search, articles were collected based on the words present in the paper, even if a word was mentioned only a single time. Therefore, we excluded the articles that are not relevant and do not fall within our scope, based on the following criteria:
• Articles that used private dataset(s) for the ML model.
• Studies that are not mainly focused on FL and medical image data.
• Hybridization or modification of the theme of FL, e.g., federated reinforcement learning.
• Abstracts, short articles, pre-prints, books, or book parts.
• Articles that do not have a clear presentation of the results using ML-based performance measures (e.g., [85], [86]).
The effect of the inclusion and exclusion criteria can be observed in Fig. 3. It shows that the number of articles initially collected from the different databases is 161. We removed the duplicate articles, and 138 articles were taken for further steps. After that we screened the articles twice under two different conditions: first we briefly examined the title and abstract, which removed 96 articles; then we extensively investigated the full text of the remaining 42, where another 25 papers were disqualified. Finally, we retained 17 of the 161 articles for our review.
D. DATA EXTRACTION
Data collection was mostly driven by the research questions of our study; we extracted information in order to cover the questions fully. First we created a spreadsheet and entered the respective information headers at the top. We worked on the 17 articles individually; each time, all of the information was recorded separately in the spreadsheet and used as our findings. The following data were extracted from every article:
1) Document title, publication year, and journal/conference name.
2) Used datasets and their federated settings.
3) The security or privacy protocol used for FL.
4) The algorithms used to train ML models.
5) Performance of the FL model.
IV. RESULTS
We assembled this section following the research questions that we described in Section III. In the upcoming sections, we first present the demographic analysis (also known as numerical analysis) data along with the key contributions and limitations of each reference work in Table 4; thereafter we answer the 12 questions successively.
A. OVERVIEW
RQ1 What are possible applications of FL?
We found applications of FL in different research fields, such as Diabetic Retinopathy (DR), MRI classification, cancer, pneumonia, COVID-19 detection, and a few more. These topics are popular in medical image processing research with conventional ML. Hence, FL also creates new scope for research owing to the privacy protection it offers, which is essential for this particular imaging research.
In 2019, coronavirus disease hit the whole world and created a crisis regarding the identification of COVID-19 samples. The RT-PCR test is the most reliable diagnosis method for the disease; however, because of inadequate testing kits and some technical limitations, researchers tried to explore alternative ways of COVID screening. As a result, hundreds of ML-based automated and time-saving COVID-19 detection models have been presented within the last two years [33]. ML-based COVID analysis is mostly carried out on radiological chest images, i.e., X-ray and CT images. Among these contributions, several detection models were also discussed and implemented with FL, as data privacy was a big concern there. In this study, we found that six articles out of 17 worked specifically on COVID-19 detection. Feki et al. [18] proposed a collaborative FL for COVID-19 screening from chest X-ray images; they cooperated with multiple medical institutions
TABLE 4. Numerical data of considered articles for this study which include publication year, name of publisher, and data analysis method.
FIGURE 3. Article consideration process of this review according to PRISMA flow diagram.
without sharing their data. Similarly, Zhang et al. [24] and Yan et al. [29] used X-ray and CT image data with different Convolutional Neural Network (CNN) architectures in FL settings. References [21], [25], and [32] have also contributed to work on COVID-19 infection in a multinational way. However, during the pandemic such artificial intelligence tools were not used clinically to a significant extent to diagnose COVID-19; all of them were experimental, and hopefully the contributions will help future initiatives.
Millions of patients are suffering from fatal diseases worldwide, and cancer is at the top of them. Researchers have shown that early detection of cancer can save a large number of lives [34]. Consequently, deep learning has emerged as a potential means of early cancer detection with the help of medical images. It extracts features from the raw images and provides decisions regarding cancer detection with notable performance. As a part of ML, FL has been considered in several cancer diagnosis techniques; Fig. 4 shows that 29.4% of the articles in this review (five out of 17) were on cancer detection. Połap and their team have published three research papers [17], [19], [22], all of them focused on skin cancer detection within the FL environment. They used seven different skin marks (classes) to train the detection models and successfully implemented privacy-protected FL. Moreover, Hashmani et al. [30] applied FL to a series of dermoscopy images to classify nine different skin diseases. Nowadays, cancers of important internal organs of the human body, such as the lung and breast, are the leading causes of cancer death. An FL-oriented lung cancer detection model has been proposed by Adnan et al. [28]. They demonstrated that their model achieved acceptable performance when a decentralized data configuration was applied.
Another domain is Diabetic Retinopathy (DR) analysis. Diabetes is a chronic disease that affects millions of people globally, and uncontrolled diabetes can lead to serious damage to the body’s systems, including the eyes. DR is a common diabetic eye disease and the number one cause of vision loss and blindness in the world. It occurs when diabetes damages the small blood vessels of the retina. In a primary care clinic, retinal images can be transmitted to an eye care specialist, who examines the image and then provides a consultation. However, these days deep learning algorithms can detect DR within seconds with high accuracy. Lo et al. [16] analyzed retinal images to classify DR-positive and non-DR samples using the FL approach. In another article, Zhou et al. [31] introduced an FL framework that classifies five severity categories of DR, from 0 to 4 (No DR to Proliferative DR).
TABLE 5. List of datasets used in federated medical imaging with references for quick access.
FIGURE 5. Different types of data samples taken from the respective datasets. (a) Tuberculosis infected chest X-ray image, (b) COVID-19 positive chest X-ray
image, (c) COVID-19 positive CT image, (d) Brain MRI image for autism spectrum disorders identification, (e) Optical coherence tomography angiography
(OCT-A) image of eye, (f) Light-sensitive tissue data of retinal blood vessel, (g) Skin dermoscopy image, and (h) Lung tissue image for cancer detection.
is dangerous for health, which particularly affects our lungs and is a key reason for lung cancer. According to our investigation, six of the 17 articles [18], [23], [25], [27], [29], [32] used chest X-ray images. The X-ray datasets considered in the articles are Cohen JP, TB x-ray, CheXpert, Mendeley data, COVIDx, Chest X-ray (CXR), and COVID 2019 dataset. Moreover, Zhang et al. [24] proposed an FL-oriented COVID-19 detection model where chest X-ray and CT images were taken from three datasets: Qatar-Dhaka data, COVID-CT, and Figure 1 dataset.
Skin image data: MNIST: HAM10000 is one of the leading datasets used in skin cancer detection research with deep learning techniques. This repository contains 10,015 dermatoscopic images divided into seven different classes. Połap and colleagues used the dataset in a series of articles [17], [19], [22] within the FL environment. Similar data has been used in [30]; it was released under a dataset challenge competition called ISIC 2019 and contains 25,331 dermoscopy images.
Retina image data: We found two different articles which applied retinal images to their FL models. In [31], Zhou et al. used a DR dataset consisting of 3,662 images. The images are labelled with five different severity categories, from no DR to extreme. Their goal was to classify the different levels of DR cases. In another article, Lo et al. [16] collected a total of 153 data samples from four different sources. Their deep learning model performed binary classification to distinguish DR and non-DR samples.
Others: Adnan et al. [28] used tissue image data; more specifically, they proposed a privacy-guaranteed ML model where lung tissue images were used to classify cancer. In addition, Li et al. [20] and Linardos et al. [26] both used MRI images for their models, brain and heart MRI data respectively. We have included all of the dataset names with their references for easy access in Table 5.
RQ4 Is the number of data samples sufficient?
In ML research, it is well established that the more data we have for training purposes, the better the predictions we will get from the models. Also, the chances of model overfitting increase when we have a smaller dataset, so it is always advisable to use a larger dataset. For our study, we analyzed the 17 articles by the range of data samples used in the respective research papers. First, we discuss the articles which used fewer than 1,000 samples. As Table 5 shows, four papers [16], [18], [20], and [26] used very small amounts of data: 153, 216, 370, and 180 samples respectively. Since a larger dataset offers better potential for insightful analysis, a set of only 153 data samples is not technically sound. Next, within the 10,000 sample range, seven papers [21], [24], [25], [27], [28], [31], and [32] used between 2,109 and 6,284 data samples, which is a reasonably good amount. Finally, we found six papers [17], [19], [22], [23], [29], [30], each of which used more than 10,000 images. CheXpert, the largest dataset found in our investigation (223,414 CXR images overall), was considered by Chakravarty et al. [23].
TABLE 6. ML and FL methods for medical image analysis.
RQ5 Are non-IID data distributions considered?
Two forms of federated frameworks exist according to the data distribution: IID and non-IID. IID stands for independent and identically distributed. This can be divided into two parts, independence and identical distribution. Independence means that the value (data) of one example does not affect the value of another. This scenario is commonly described by a coin-flipping experiment: when a coin is flipped repeatedly, the result of each flip does not depend on the others. Identically distributed means that the probability of any specific outcome is the same; for example, every time a coin is flipped there is a 50% chance of getting heads and a 50% chance of getting tails, and that value does not change from flip to flip. Non-IID is technically the inverse on both counts. While the data feature distribution is the same across clients for IID data, it differs across clients for non-IID data. The problem is quite common in real life; for example, the appearance of medical image samples acquired with different machines across different hospitals may not align because of different imaging protocols. Therefore, in non-IID data settings the values are dependent on each
other and there are overall trends between them. In general, local models in FL are trained independently and the data distribution of each client is hidden from the others; as a result, data types and features can vary from client to client [6], [53]. This variation makes the consideration of non-IID data important in FL research. In this study, we investigated FL as used in medical image analysis. We observed that the FL data structure is complicated, especially when the local clients’ data are significantly different from each other. Our results show that only four papers (we did not find sufficient explanation in [26] and [21]) considered the non-IID type along with IID data, and the remaining 13 did not discuss the matter. In [18], Feki et al. divided the collected dataset into four parts for the clients’ data; for IID, they used an equal number of images across both clients and classes. For non-IID data, they allocated the samples among classes unequally by a ratio of 66% and 44%. Likewise, Adnan et al. [28] performed FL with IID and non-IID data individually, where the number of samples was different in each client under the non-IID scenario.
C. ML FRAMEWORK
RQ6 Which ML algorithms are used to train local models?
Although FL is the main focus of this investigation, ML techniques make the actual difference when it comes to the overall performance of the models. As usual in the FL framework, each client’s data is used for training by ML algorithms. Our review is based on medical image data, and image analysis or computer vision tasks are mostly conducted with CNN-oriented deep learning models. To answer the question, we searched each of the considered articles and found a variety of built-in CNN models in use, such as VGG16, Inception, ResNet18, and many more. VGG16 is a widely used, reliable, pre-trained model; five out of the 17 surveyed papers considered this CNN model. The model consists of 16 layers: 13 convolutional and 3 fully connected layers. Likewise, VGG19 is a 19-layer CNN model and was used by Lo et al. [16]. Residual Network (ResNet) is also a commonly used architecture that can be constructed with different numbers of layers, e.g., ResNet18 ([23], [26], [27], [29]), ResNet50 ([18], [24]), and ResNet101 ([24]). Other pre-trained CNN models are Inception ([17], [19], [22]) and AlexNet ([17], [19]). Besides, CNN-based customised deep learning models have been used in several articles, which are listed in Table 6. Li et al. [20] used a multi-layer perceptron (MLP) classifier, a deep neural network constructed from one input, hidden, and output layer. Adnan et al. [28] performed image segmentation using a supervised learning approach called Multiple-Instance Learning (MIL) to train the local models.
D. FL IMPLEMENTATION
RQ7 Are any additional security methods implemented?
Data privacy and security are not the same in practice: privacy covers the use of data (control, access, and regulation), whereas security concerns the potential threats of unauthorized access and malicious attacks. FL mainly addresses the privacy concern, since the trained models of stakeholders are shared instead of the data itself. Still, sharing models can be vulnerable while parameters are exchanged between clients and servers, and could be a possible threat to system security [28]. Several additional privacy-preserving methods have been described in a systematic review article [83]. However, we found few articles that have
considered additional initiatives for security in FL based med- in two ways, some articles were driven by formerly proposed
ical imaging research. Most of the articles (three out of four) build-in FL algorithms and others with basic concepts for
have used Differential Privacy (DP), it allows companies to aggregation. FedAvg (discussed in Section II), which is a
collect information about their users without compromising commonly used method in federated aggregation, as Table 6
the privacy of an individual and the ultimate goal is to be shows six articles considered this algorithm. Likewise, [20]
able to share information about a dataset with other people and [27] used two different federated algorithms named Fed,
without revealing individuals Personally Identifiable Infor- secure aggregation (SecAgg) respectively. SecAgg is a secure
mation (PII) from the dataset [9], [84]. Li et al. [20] used model aggregation for FL also proposed by Google in 2016.
two different mechanisms of DP, Gaussian and Laplace. They Połap and Woźniak [19] proposed a meta-heuristic search
defined the noise level α which varied from 0.001 to 1. based federated model, first they calculated average loss of
Similarly, Kaissis et al. [27] have applied both techniques all local models and then selected only models that have
and Adnan et al. [28] have used only Gaussian noise in their scored higher than the average loss for aggregation in server.
experiments. In addition, Połap et al. [17] used encryption Mainly all of them pursue fundamental concepts of FL but
and blockchain techniques to make their FL model more they implemented it in different ways. However, the described
secure. They proposed three different learning agents where above federated aggregation process has no impact on the
blockchain technique was applied in Data Management model performance, it is all about engineering the data dis-
Agent (DMA). According to their description, all patients tribution in a decentralized and collaborative manner.
data (images) have to be their unique IDs, once a request
arrive to analysis, it will check whether the ID is exist or not E. EXPERIMENTAL
into the database, if not then it will create an unique ID and a RQ10 What are the performance measures used in the
block to the blockchain, then transfer the ID to the database studies?
with the image. The final and startling step of any ML setting is to assess
RQ8 What types federated data partitioning are used? how good the model is through performance evaluation. The
Mainly three categories of FL described in the previous basic idea is to develop a ML model using some training
literature based on the training data distributions across the samples and test this train model on some other unknown
models. Among the three types, Federated Transfer Learn- data. However, the training error is not very useful for actual
ing (FTL) and Vertical FL (VFL) are rarely considered in medical research, whereas Horizontal FL (HFL) is widely used. In a horizontal partition, each client's database holds different customers, but every client collects the same type of data on them; in other words, “same features, different samples”. In vertical FL, the clients' customer sets overlap, but each client collects different features on those shared customers; that is, “different features, same samples” [3], [9], [84]. In this investigation we focused on medical image research and found that most of the articles were based on HFL. For example, Feki et al. [18] utilized HFL with a chest X-ray image dataset in which the features are the same for all clients but the samples are different. Interestingly, Kaissis et al. [27] used two different datasets for training and testing their FL models; both datasets contain X-ray images (same features) but different data. We classified only two articles as VFL. The authors of [24] took three datasets, two based on X-ray images and one on CT images, and combined both types of images to train and test their models. In [25], the authors used X-ray and ultrasound images for their federated models. X-ray images are technically different from CT or ultrasound images, so their features also differ; using different data features on different clients makes this a VFL scenario.
RQ9 What are the federated frameworks used?
Table 6 lists the deep learning architectures used to train the local models (discussed in RQ6) and the federated framework, which is mainly the approach used to aggregate the collected local models at the central server. We observed federated mechanisms are executed
evaluation, because it is easy to overfit the training data by using complex models that do not generalize well to future samples. In contrast, testing error is the key metric, since it better approximates the true performance of the model on future samples. We therefore considered only testing performance throughout our review. Because the reviewed articles addressed both classification and segmentation tasks, their performance was evaluated in different ways. Fig. 6 presents the number of articles using each performance metric. Most of the experiments (14 out of 17) were evaluated by accuracy. Recall was the second most common criterion, considered by five articles. The Area Under the Curve (AUC) score was used three times and precision twice.
FIGURE 6. Number of articles applying different performance metrics.
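The three threshold-based metrics counted in Fig. 6 (accuracy, recall, and precision) can be illustrated with a short sketch. It is a minimal example for binary labels only: the function name `classification_metrics` and the toy labels are assumptions made for illustration and do not come from any reviewed article. AUC is left out because it needs ranked prediction scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and precision for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        # Share of all samples that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of actual positives that were found (sensitivity).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy test-set labels for illustration only (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```

The labels passed in should come from a held-out test set, in line with the review's choice to consider testing performance only.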
TABLE 7. Performance (accuracy) comparison: the results of each of the reviewed articles and their respective compatible ML models, with references.
RQ11 How is the performance of the FL frameworks reported?
This question provides an overview of the performance achieved by the FL-based models in the 17 articles. Performance assessment is the final part of any ML study, where the conducted experiment is evaluated by different metrics. Our investigation revealed that 14 articles worked on data classification (binary and multi-class), one article worked on data segmentation, and the remaining two considered both (listed in Table 4). The performance of classification tasks is usually assessed by accuracy, which is the proportion of correctly identified samples among all of the data [14]. We divided the performance of the 17 studies into three categories according to the achieved accuracy: high (>=90%), medium (80%-89%), and low (<80%). Table 7 summarizes the performance scores of all articles.
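The three-way grouping can be written as a small helper. This is a minimal sketch under stated assumptions: the function name `performance_category` is hypothetical, and since the stated ranges leave scores between 89% and 90% unassigned, the sketch assumes that "medium" covers 80% up to, but not including, 90%. The example scores are accuracies reported in this review for [18], [22], and [17].

```python
def performance_category(accuracy_percent):
    """Map an accuracy in percent to the review's high/medium/low groups."""
    if not 0.0 <= accuracy_percent <= 100.0:
        raise ValueError("accuracy must be a percentage between 0 and 100")
    if accuracy_percent >= 90.0:
        return "high"
    # Assumption: the 89%-90% gap in the stated ranges counts as medium.
    if accuracy_percent >= 80.0:
        return "medium"
    return "low"


# Accuracies quoted in the review text, in percent.
reported = {
    "[18] FL+VGG16 with data augmentation": 94.4,
    "[22] VGG-based FL model": 89.82,
    "[17] federated VGG16": 70.0,
}
for article, score in reported.items():
    print(article, score, performance_category(score))
```

Studies that report only AUC or sensitivity do not map directly onto an accuracy threshold, so applying the helper to them is a judgment call.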
High: Eight articles reported an accuracy of 90% or more. Feki et al. [18] performed binary classification and reported the highest accuracy: 94.4% for FL+VGG16 with data augmentation and 93.57% for FL+VGG16. Połap and Woźniak [19] used the inception91 classifier for the FL model and obtained an accuracy of 91%. The scores of [24], [25], and [30] are not stated precisely; the authors discussed accuracy in the range of 90-95%. Articles [32] and [27] achieved accuracies of 90.61% and 90%, respectively.
90.61% and 90% respectively. Yan et al. [29] presented their so, we extensively investigated dozens of research papers
results using sensitivity, their highest score was 91.26%. that analyzed medical images by traditional ML to explore
Medium: In [22], the author classified the images as dis- best matching options which was essential for a reliable
eases and not a disease, their proposed VGG based FL model comparison. Several conditions were applied in this criteria
achieved 89.82% accuracy. Lo et al. [16] performed classi- based on the structural and experimental similarity between
fication and segmentation both tasks on different datasets, ML and FL papers, such as we considered the papers which
the classification and segmentation accuracy for SFU dataset used similar datasets, algorithms, and performance measures.
were 88% and 85% respectively, classification accuracy of We expect maintaining this condition will ensure an accurate
OHSU dataset was 89%. In [26], Linardos et al. considered comparison among the two parties. Our investigation shows
AUC, the highest score achieved by the FL model was 89%. in Table 7 that all of the ML models have improved accuracy
Adnan et al. [28] conducted binary classification with an compared to their respective FL models in existing literature,
accuracy of 85%. more specifically we found better ML results against every
Low: The rest three articles performed multi-class classi- FL article. For instance, Połap et al. [17] have achieved accu-
fication, where Chakravarty et al. [23] 14, Połap et al. [17] racy with federated VGG16 70% and Inception 67%, how-
seven, and Li et al. [20] have considered four classes, their ever in ML part, Jain et al. [56] achieved 79.23% accuracy
acquired performances are AUC 80%, accuracy 70% and with Inception and Liu et al. [57] 87% with ResNet50; all
76% respectively. of three have considered the MNIST: HAM10000 dataset.
RQ12 How perform the FL approach compared to the Then as well Chakravarty et al. [23] has an AUC score of
conventional models? 80% with FL environment, but with same dataset and ML
Last research question explores the comparative perfor- algorithm article [67] and [68] have 86% and 87% AUC
mance analysis between FL and traditional ML image pro- respectively.
cessing research. This query is important while we want to
discuss the effect, contribution, and drawback of using FL in V. OPEN ISSUES AND CHALLENGES
medical image analysis. To answer this question we inten- FL is still a young research field, so it is difficult to draw
sively collected experiment results from both areas, 17 FL a remark on the rejection and acceptance. However, here
articles and their relevant conventional models. We already we have discussed the issues and challenges found in the
described the performance of the FL models in the previous reviewed articles regarding the application of FL in medical
question and here we will present the results of usual ML image. Generally, FL is invented to fulfill the privacy concern
models and then the comparative analysis. In Table 7 we sum- of private data, unfortunately it does not cover all potential
marized the performance of all articles in this review and we privacy threats [93]. However, we described model perfor-
presented the results of one or more similar articles opposite mance, data heterogeneity, and federated model efficiency
to each of the articles to make a comparison chart. To do issues found from the review below:
A. PRIVACY AND SECURITY
Medical image data contains personal information of patients, and no one can share such data for AI applications without reliable data protection. FL supports collaboration between different institutions with some privacy guarantees through an advanced data management and model construction process, as described in Section II. Unlike conventional ML, FL exposes the training process to multiple parties. The motives of every participant are unknown, which creates a trust issue among them, and the additional communication increases the risk of data leakage via reverse engineering. We observed two further privacy measures used in federated medical image processing: differential privacy and secure aggregation. Differential privacy involves adding carefully selected noise to the outputs and can be applied either by the individual clients or at the server level. Secure aggregation is a cryptographic technique (e.g., blockchain technology) that ensures the server can see only the aggregate of thousands of updates rather than individual model updates. In reality, however, every privacy mechanism comes with a significant computational cost for the federation.
B. DATA HETEROGENEITY
Our investigation shows that data heterogeneity can occur in two ways among the clients: the number of samples differs (non-IID data) or the data features differ (VFL). The amount of data produced by hospitals is usually not identical, and FL clients can have different data distributions; this uneven distribution across clients might send opposing gradient updates to the server, which is challenging to handle. Furthermore, in practice the features of federated datasets are often not the same; for instance, X-ray and CT image data may be held by two different clients, which causes trouble when aggregating the model parameters centrally in an FL setting.
C. OVERALL MODEL PERFORMANCE
The first impression of an AI model comes from its performance, that is, how accurately the model accomplishes the task. A model with high accuracy is more acceptable than one that achieves a lower score. We previously discussed the performance of federated models and compared it with that of traditional ML models (RQ12). Our findings show that FL failed to outperform ML with similar model structures; this drawback requires us to reevaluate the usefulness of FL in medical imaging.
D. FEDERATED ARCHITECTURE
Training a personalized model on each client is not difficult in FL; problems emerge when all of the model outputs are transferred to the central server and pass through an aggregation process. We observed that the federated models presented in the reviewed articles are mostly theoretical and rarely implemented in practice, and few articles included open source code. Since research in the field started only a couple of years ago, the research methods and materials need to be more easily accessible to future researchers. Besides, research usually takes place in a very controlled setting, and questions arise when we aim to simulate a real-world scenario with huge datasets.
VI. LIMITATIONS
In this section we acknowledge the limitations of this study. First, we searched all prominent databases for article collection, but some journals and conference proceedings require a subscription for download. In some such cases, we could not obtain the papers from the sources. As an alternative, we emailed the corresponding authors and requested the full text of the required article. However, we still failed to reach some of them ([94] and [95]), which limits the range of this survey. In addition, our inclusion and exclusion process removed articles from the initial pool, and preprint articles were not included; besides, we could not explore every searchable database, so we may have missed relevant article(s) on the topic. We did not run the models used in the 17 articles ourselves, which would have made the review more precise. Overall, it is difficult to conclude this study with strong and tested historical evidence, because our review was conducted in very limited time and with insufficient resources, and because FL was introduced only recently.
VII. CONCLUSION AND FUTURE DIRECTIONS
Imaging is one of the most popular and effective diagnostic methods in the medical sector. Its use is increasing day by day and produces large volumes of image data. AI has many opportunities in medical imaging using this data, but the clinical use of AI and ML is currently very limited. On the research side, creating a publicly shareable image dataset is very difficult in the medical domain. The major hurdle to data sharing and collaboration is privacy, which is given less priority in typical centralized models. Federated or distributed learning is different: a data-driven learning model is shared rather than the data itself. In this study, we systematically reviewed the articles that used FL in ML-based medical image research. We discussed the topic from several perspectives, including demographic data, privacy, datasets, FL characteristics, model implementation, and performance comparison. In one of our previous articles [33] we noticed that deep learning based COVID-19 detection using X-ray and CT images has high accuracy, with most studies achieving more than 95%. We observed a similar trend in this study, where COVID-19 detection articles are the top scorers among FL mechanisms. However, the scores under FL are comparatively lower than those of general models, as listed in Table 7. The performance of FL models in other application domains was also unremarkable. Besides, previous articles point out that implementing federated models is relatively complex and requires extra communication and maintenance effort. However, it is encouraging that
the research field has attracted considerable attention and many publications within a very short time, so we can hope for promising progress of FL in medical image analysis in the future. At this stage, we summarize our findings below as future directions for researchers who are interested in contributing to the field:
• The privacy concern is not fully solved in FL; however, we cannot deny the importance of decentralized concepts. FL could be effective for collaborative ML in medical image research, so researchers should emphasize the implementation of additional privacy protection in a cost-effective way.
• Datasets in the research are collected from various sources and for various purposes, so experimental results could differ enormously. No benchmark dataset is available in federated medical imaging research; standard datasets need to be built to avoid biased data and data heterogeneity problems.
• Similarly, no benchmark FL model has been presented yet in this field; such an initiative would help to build robust AI models for further research.
• Data in collaborative models are prone to be heterogeneous, since various classes of data are combined. Our results show that the accuracy of multi-class classification is very low (as described in RQ11), which needs to be addressed in future research.
• Federated models achieved satisfactory performance in some cases, but we cannot present them as an alternative in the accuracy race with ML models.
• Many weaknesses were observed in the current publications of this field (the papers investigated in this review); we included the article quality checklist and results in the Appendix. Future research could use the quality analysis questionnaire to improve article quality.
FL is undoubtedly on the future horizon. However, some technical problems remain, and these challenges need to be tackled before FL can be applied widely. To the best of our knowledge, this is the first SLR on this topic, and we believe this review reflects the state of FL research in the area of medical imaging.
APPENDIX
QUALITY ANALYSIS
Table 8 shows 12 Quality Questions (QQ) and scores, mostly motivated by our previous article [14]. The goal of this inquiry was to check the basic quality of the articles published in FL-oriented medical imaging. Each question contributes one score per article, for a maximum total score of 12 per article. We considered each QQ answer in three forms of
scoring, “Yes (1)”, “Partially Yes (0)”, and “No (−1)”. An article that clearly supports the question is scored Yes; one that partially supports it, or gives no clear answer, is scored Partially Yes; and one that fully disagrees is scored No. We investigated each of the articles to find the answers and assigned the scores in the respective columns. As the table shows, most of the articles failed to fulfill the quality requirements. The highest score is 9 out of 12, obtained by [27], followed by 6 each for [20] and [28]. The scores indicate that quality has been poorly maintained in some areas of the research papers; one reason could be that the surge of attention led contributors to rush FL research.
REFERENCES
[1] A. Maier, S. Steidl, V. Christlein, and J. Hornegger, Medical Imaging Systems: An Introductory Guide. Cham, Switzerland: Springer, 2018.
[2] Z. Zhang and E. Sejdic, “Radiological images and machine learning: Trends, perspectives, and prospects,” Comput. Biol. Med., vol. 108, pp. 354–370, May 2019.
[3] A. Rauniyar, D. Haileselassie Hagos, D. Jha, J. E. Håkegård, U. Bagci, D. B. Rawat, and V. Vlassov, “Federated learning for medical applications: A taxonomy, current trends, challenges, and future research directions,” 2022, arXiv:2208.03392.
[4] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Federated learning of Deep Networks using model averaging,” 2016, arXiv:1602.05629.
[5] B. Pfitzner, N. Steckhan, and B. Arnrich, “Federated learning in a medical context: A systematic literature review,” ACM Trans. Internet Technol., vol. 21, no. 2, pp. 1–31, Jun. 2021.
[6] Prayitno, C.-R. Shyu, K. T. Putra, H.-C. Chen, Y.-Y. Tsai, K. S. Hossain, W. Jiang, and Z.-Y. Shae, “A systematic review of federated learning in the healthcare area: From the perspective of data properties and applications,” Appl. Sci., vol. 11, no. 23, p. 11191, Nov. 2021.
[7] M. G. Crowson, D. Moukheiber, A. R. Arévalo, B. D. Lam, S. Mantena, A. Rana, D. Goss, D. W. Bates, and L. A. Celi, “A systematic review of federated learning applications for biomedical data,” PLOS Digit. Health, vol. 1, no. 5, May 2022, Art. no. e0000033.
[8] A. Chowdhury, H. Kassem, N. Padoy, R. Umeton, and A. Karargyris, “A review of medical federated learning: Applications in oncology and cancer research,” in Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer, Jul. 2022, doi: 10.1007/978-3-031-08999-2_1.
[9] D. C. Nguyen, Q.-V. Pham, P. N. Pathirana, M. Ding, A. Seneviratne, Z. Lin, O. Dobre, and W.-J. Hwang, “Federated learning for smart healthcare: A survey,” ACM Comput. Surv., vol. 55, no. 3, pp. 1–37, Apr. 2023.
[10] A. Naeem, T. Anees, R. A. Naqvi, and W.-K. Loh, “A comprehensive analysis of recent deep and federated-learning-based methodologies for brain tumor diagnosis,” J. Personalized Med., vol. 12, no. 2, p. 275, Feb. 2022.
[11] J. Xu, B. S. Glicksberg, C. Su, P. Walker, and J. Bian, “Federated learning for healthcare informatics,” J. Healthc Inform. Res., vol. 5, pp. 1–19, Dec. 2021.
[12] N. Rieke, J. Hancox, W. Li, F. Milletarì, H. R. Roth, S. Albarqouni, S. Bakas, M. N. Galtier, B. A. Landman, K. Maier-Hein, S. Ourselin, M. Sheller, R. M. Summers, A. Trask, D. Xu, M. Baust, and M. J. Cardoso, “The future of digital health with federated learning,” npj Digit. Med., vol. 3, no. 1, p. 119, Sep. 2020.
[13] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” 2016, arXiv:1602.05629.
[14] M. F. Sohan and A. Basalamah, “A systematic literature review and quality analysis of Javascript malware detection,” IEEE Access, vol. 8, pp. 190539–190552, 2020.
[15] D. Moher, A. Liberati, J. Tetzlaff, and D. G. Altman, “Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement,” PLoS Med., vol. 6, no. 7, p. 264, 2009.
[16] J. Lo, T. T. Yu, D. Ma, P. Zang, J. P. Owen, Q. Zhang, R. K. Wang, M. F. Beg, A. Y. Lee, Y. Jia, and M. V. Sarunic, “Federated learning for microvasculature segmentation and diabetic retinopathy classification of OCT data,” Ophthalmol. Sci., vol. 1, no. 4, Dec. 2021, Art. no. 100069.
[17] D. Połap, G. Srivastava, and K. Yu, “Agent architecture of an intelligent medical system based on federated learning and blockchain technology,” J. Inf. Secur. Appl., vol. 58, May 2021, Art. no. 102748.
[18] I. Feki, S. Ammar, Y. Kessentini, and K. Muhammad, “Federated learning for COVID-19 screening from chest X-ray images,” Appl. Soft Comput., vol. 106, Jul. 2021, Art. no. 107330.
[19] D. Połap and M. Wozniak, “Meta-heuristic as manager in federated learning approaches for image processing purposes,” Appl. Soft Comput., vol. 113, Dec. 2021, Art. no. 107872.
[20] X. Li, Y. Gu, N. Dvornek, L. H. Staib, P. Ventola, and J. S. Duncan, “Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results,” Med. Image Anal., vol. 65, Oct. 2020, Art. no. 101765.
[21] D. Yang, Z. Xu, W. Li, A. Myronenko, H. R. Roth, S. Harmon, S. Xu, B. Turkbey, E. Turkbey, X. Wang, W. Zhu, G. Carrafiello, F. Patella, M. Cariati, H. Obinata, H. Mori, K. Tamura, P. An, B. J. Wood, and D. Xu, “Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan,” Med. Image Anal., vol. 70, May 2021, Art. no. 101992.
[22] D. Połap, “Fuzzy consensus with federated learning method in medical systems,” IEEE Access, vol. 9, pp. 150383–150392, 2021.
[23] A. Chakravarty, A. Kar, R. Sethuraman, and D. Sheet, “Federated learning for site aware chest radiograph screening,” in Proc. IEEE 18th Int. Symp. Biomed. Imag. (ISBI), Apr. 2021, pp. 1077–1081.
[24] W. Zhang, T. Zhou, Q. Lu, X. Wang, C. Zhu, H. Sun, Z. Wang, S. K. Lo, and F.-Y. Wang, “Dynamic-fusion-based federated learning for COVID-19 detection,” IEEE Internet Things J., vol. 8, no. 21, pp. 15884–15891, Nov. 2021.
[25] A. Qayyum, K. Ahmad, M. A. Ahsan, A. Al-Fuqaha, and J. Qadir, “Collaborative federated learning for healthcare: Multi-modal COVID-19 diagnosis at the edge,” IEEE Open J. Comput. Soc., vol. 3, pp. 172–184, 2022.
[26] A. Linardos, K. Kushibar, S. Walsh, P. Gkontra, and K. Lekadir, “Federated learning for multi-center imaging diagnostics: A simulation study in cardiovascular disease,” Sci. Rep., vol. 12, no. 1, p. 3551, Mar. 2022.
[27] G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D. Usynin, A. Trask, I. Lima, J. Mancuso, F. Jungmann, M.-M. Steinborn, A. Saleh, M. Makowski, D. Rueckert, and R. Braren, “End-to-end privacy preserving deep learning on multi-institutional medical imaging,” Nature Mach. Intell., vol. 3, pp. 473–484, Jun. 2021.
[28] M. Adnan, S. Kalra, J. C. Cresswell, G. W. Taylor, and H. R. Tizhoosh, “Federated learning and differential privacy for medical image analysis,” Sci. Rep., vol. 12, no. 1, p. 1953, Feb. 2022.
[29] B. Yan, J. Wang, J. Cheng, Y. Zhou, Y. Zhang, Y. Yang, L. Liu, H. Zhao, C. Wang, and B. Liu, “Experiments of federated learning for COVID-19 chest X-ray images,” Advances in Artificial Intelligence and Security. Springer, 2021, pp. 41–53.
[30] M. A. Hashmani, S. M. Jameel, S. S. H. Rizvi, and S. Shukla, “An adaptive federated machine learning-based intelligent system for skin disease detection: A step toward an intelligent dermoscopy device,” Appl. Sci., vol. 11, no. 5, p. 2145, Feb. 2021.
[31] S. Zhou, B. Landman, Y. Huo, and A. Gokhale, “Communication-efficient federated learning for multi-institutional medical image classification,” in Proc. SPIE, 2022, pp. 6–12.
[32] M. A. Salam, S. Taha, and M. Ramadan, “COVID-19 detection using federated machine learning,” PLoS ONE, vol. 16, no. 6, Jun. 2021, Art. no. e0252573.
[33] M. F. Sohan, A. Basalamah, and M. Solaiman, “COVID-19 detection using machine learning: A large scale assessment of X-ray and CT image datasets,” J. Electron. Imag., vol. 31, no. 4, Mar. 2022, Art. no. 041212.
[34] K. Munir, H. Elahi, A. Ayub, F. Frezza, and A. Rizzi, “Cancer diagnosis using deep learning: A bibliographic review,” Cancers, vol. 11, no. 9, p. 1235, Aug. 2019.
[35] M. Heisler, F. Chan, Z. Mammo, C. Balaratnasingam, P. Prentasic, G. Docherty, M. Ju, S. Rajapakse, S. Lee, A. Merkur, A. Kirker, D. Albiani, D. Maberley, K. Bailey Freund, M. Faisal Beg, S. Loncaric, M. V. Sarunic, and E. V. Navajas, “Deep learning vessel segmentation and quantification of the foveal avascular zone using commercial and prototype OCT-A platforms,” 2019, arXiv:1909.11289.
[36] P. Tschandl, C. Rosendahl, and H. Kittler, “The HAM 10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Sci. Data, vol. 5, no. 1, Aug. 2018.
[37] J. P. Cohen, P. Morrison, L. Dao, K. Roth, T. Q. Duong, and M. Ghassemi, “COVID-19 image data collection: Prospective predictions are the future,” 2020, arXiv:2006.11988.
[38] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and G. Thoma, “Two public chest X-ray datasets for computer-aided screening of pulmonary diseases,” Quant. Imag. Med. Surg., vol. 4, no. 6, pp. 475–477, Nov. 2014, Accessed: Oct. 18, 2022. [Online]. Available: https://qims.amegroups.com/article/view/5132/6030
[39] A. Di Martino et al., “The autism brain imaging data exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism,” Mol. Psychiatry, vol. 19, no. 6, pp. 659–667, Jun. 2013.
[40] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, J. Seekins, D. A. Mong, S. S. Halabi, J. K. Sandberg, R. Jones, D. B. Larson, C. P. Langlotz, B. N. Patel, M. P. Lungren, and A. Y. Ng, “CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison,” in Proc. AAAI Conf. Artif. Intell., 2019, pp. 590–597, Accessed: Oct. 18, 2022. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/3834
[41] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. Reaz, and M. T. Islam, “Can AI help in screening viral and COVID-19 pneumonia?” IEEE Access, vol. 8, pp. 132665–132676, 2020.
[42] X. Yang, X. He, J. Zhao, Y. Zhang, S. Zhang, and P. Xie, “COVID-CT-dataset: A CT scan dataset about COVID-19,” 2020, arXiv:2003.13865.
[43] Agchung. Agchung/Figure1-COVID-Chestxray-Dataset: Figure 1 COVID-19 Chest X-Ray Dataset Initiative. GitHub. Accessed: Oct. 18, 2022. [Online]. Available: https://github.com/agchung/Figure1-COVID-chestxray-dataset
[44] V. M. Campello et al., “Multi-centre, multi-vendor and multi-disease cardiac segmentation: The M&Ms challenge,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3543–3554, Dec. 2021.
[45] O. Bernard et al., “Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?” IEEE Trans. Med. Imag., vol. 37, no. 11, pp. 2514–2525, Nov. 2018.
[46] D. Kermany, K. Zhang, and M. Goldbaum. (Jan. 1, 2018). Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images. Mendeley Data. Accessed: Oct. 18, 2022. [Online]. Available: https://data.mendeley.com/datasets/rscbjbr9sj/3
[47] Project-MONAI. Project-Monai/Monai: AI Toolkit for Healthcare Imaging. GitHub. Accessed: Oct. 18, 2022. [Online]. Available: https://github.com/Project-MONAI/MONAI/
[48] Medical Segmentation Decathlon. Accessed: Oct. 18, 2022. [Online]. Available: http://medicaldecathlon.com/
[49] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, “Review the cancer genome atlas (TCGA): An immeasurable source of knowledge,” Współczesna Onkologia, vol. 1A, pp. 68–77, Jan. 2015.
[50] L. Wang, Z. Q. Lin, and A. Wong, “COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images,” Sci. Rep., vol. 10, no. 1, pp. 1–12, Nov. 2020.
[51] ISIC Challenge. Accessed: Oct. 18, 2022. [Online]. Available: https://challenge.isic-archive.com/landing/2019/
[52] J. Y. Choi, T. K. Yoo, J. G. Seo, J. Kwak, T. T. Um, and T. H. Rim, “Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database,” PLoS ONE, vol. 12, no. 11, Nov. 2017, Art. no. e0187336.
[53] H. Zhu, J. Xu, S. Liu, and Y. Jin, “Federated learning on non-IID Data: A survey,” Neurocomputing, vol. 465, pp. 371–390, Sep. 2021, Accessed: Oct. 18, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0925231221013254
[54] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for federated learning on user-held data,” 2016, arXiv:1611.04482.
[55] M. Heisler, S. Karst, J. Lo, Z. Mammo, T. Yu, S. Warner, D. Maberley, M. F. Beg, E. V. Navajas, and M. V. Sarunic, “Ensemble deep learning for diabetic retinopathy detection using optical coherence tomography angiography,” Transl. Vis. Sci. Technol., vol. 9, no. 2, p. 20, Apr. 2020.
[56] S. Jain, U. Singhania, B. Tripathy, E. A. Nasr, M. K. Aboudaif, and A. K. Kamrani, “Deep learning-based transfer learning for classification of skin cancer,” Sensors, vol. 21, no. 23, p. 8142, Dec. 2021.
[57] Y. Liu, Z. Wang, Z. Li, J. Li, T. Li, P. Chen, and R. Liang, “Multiscale ensemble of convolutional neural networks for skin lesion classification,” IET Image Process., vol. 15, no. 10, pp. 2309–2318, Aug. 2021.
[58] D. Das, K. C. Santosh, and U. Pal, “Truncated inception net: COVID-19 outbreak screening using chest X-rays,” Phys. Eng. Sci. Med., vol. 43, no. 3, pp. 915–925, Sep. 2020.
[59] N. S. Punn and S. Agarwal, “Automated diagnosis of COVID-19 with limited posteroanterior chest X-ray images using fine-tuned deep neural networks,” Int. J. Speech Technol., vol. 51, no. 5, pp. 2689–2702, May 2021.
[60] G. Chowdhary, N. K. Toppo, and D. Das, “Skin lesion diagnosis in healthcare-cyber physical system,” in Proc. IEEE Int. Conf. Innov. Technol. (INOCON), Nov. 2020, pp. 1–6.
[61] M. Arshad, M. A. Khan, U. Tariq, A. Armghan, F. Alenezi, M. Younus Javed, S. M. Aslam, and S. Kadry, “A computer-aided diagnosis system using deep learning for multiclass skin lesion classification,” Comput. Intell. Neurosci., vol. 2021, pp. 1–15, Dec. 2021.
[62] M. R. Ahmed, Y. Zhang, Y. Liu, and H. Liao, “Single volume image generator and deep learning-based ASD classification,” IEEE J. Biomed. Health Informat., vol. 24, no. 11, pp. 3044–3054, Nov. 2020.
[63] X. Li, N. C. Dvornek, J. Zhuang, P. Ventola, and J. S. Duncan, “Brain biomarker interpretation in ASD using deep learning and fMRI,” in Medical Image Computing and Computer Assisted Intervention (MICCAI). Springer, 2018, pp. 206–214.
[64] M. Kumar, M. Alshehri, R. AlGhamdi, P. Sharma, and V. Deep, “A DE-ANN inspired skin cancer detection approach using fuzzy C-means clustering,” Mobile Netw. Appl., vol. 25, no. 4, pp. 1319–1329, Aug. 2020.
[65] F. Afza, M. Sharif, M. A. Khan, U. Tariq, H.-S. Yong, and J. Cha, “Multiclass skin lesion classification using hybrid deep features selection and extreme learning machine,” Sensors, vol. 22, no. 3, p. 799, Jan. 2022.
[66] U. Bhimavarapu and G. Battineni, “Skin lesion analysis for melanoma detection using the novel deep learning model fuzzy GC-SCNN,” Healthcare, vol. 10, no. 5, p. 962, May 2022.
[67] L. Seyyed-Kalantari, G. Liu, M. McDermott, I. Y. Chen, and M. Ghassemi, “CheXclusion: Fairness gaps in deep chest X-ray classifiers,” in Proc. Biocomputing, vol. 2021, 2020, pp. 232–243.
[68] A. Mitra, A. Chakravarty, N. Ghosh, T. Sarkar, R. Sethuraman, and D. Sheet, “A systematic search over deep convolutional neural network architectures for screening chest radiographs,” in Proc. 42nd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2020, pp. 1225–1228.
[69] S. Thakur and A. Kumar, “X-ray and CT-scan-based automated detection and classification of COVID-19 using convolutional neural networks (CNN),” Biomed. Signal Process. Control, vol. 69, Aug. 2021, Art. no. 102920.
[70] R. Kundu, R. Das, Z. W. Geem, G.-T. Han, and R. Sarkar, “Pneumonia detection in chest X-ray images using an ensemble of deep learning models,” PLoS ONE, vol. 16, no. 9, Sep. 2021, Art. no. e0256630.
[71] V. Chouhan, S. K. Singh, A. Khamparia, D. Gupta, P. Tiwari, C. Moreira, R. Damasevicius, and V. H. C. de Albuquerque, “A novel transfer learning based approach for pneumonia detection in chest X-ray images,” Appl. Sci., vol. 10, no. 2, p. 559, Jan. 2020.
[72] N. Dey, Y.-D. Zhang, V. Rajinikanth, R. Pugalenthi, and N. S. M. Raja, “Customized VGG19 architecture for pneumonia detection in chest X-rays,” Pattern Recognit. Lett., vol. 143, pp. 67–74, Mar. 2021.
[73] L. Girard, J. Rodriguez-Canales, C. Behrens, D. M. Thompson, I. W. Botros, H. Tang, Y. Xie, N. Rekhtman, W. D. Travis, I. I. Wistuba, J. D. Minna, and A. F. Gazdar, “An expression signature as an aid to the histologic classification of non–small cell lung cancer,” Clin. Cancer Res., vol. 22, no. 19, pp. 4880–4889, 2016.
[74] S. Dong, Q. Yang, Y. Fu, M. Tian, and C. Zhuo, “RCoNet: Deformable mutual information maximization and high-order uncertainty-aware learning for robust COVID-19 detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3401–3411, Aug. 2021.
[75] A. Bar-El, D. Cohen, N. Cahan, and H. Greenspan, “Improved CycleGAN with application to COVID-19 classification,” in Proc. SPIE, 2021, pp. 296–305.
[76] F. Ucar and D. Korkmaz, “COVIDiagnosis-Net: Deep bayes-squeezenet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images,” Med. Hypotheses, vol. 140, Jul. 2020, Art. no. 109761.
[77] H. El-Khatib, D. Popescu, and L. Ichim, “Deep Learning–Based methods for automatic diagnosis of skin lesions,” Sensors, vol. 20, no. 6, p. 1753, Mar. 2020.
[78] H. Nahata and S. P. Singh, “Deep learning solutions for skin cancer detection and diagnosis,” in Learning and Analytics in Intelligent Systems. Springer, 2020, pp. 159–182.
[79] H. Fu, Y. Xu, S. Lin, D. W. Kee Wong, and J. Liu, “Deepvessel: Retinal vessel segmentation via deep learning and conditional random field,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI. Springer, 2016, pp. 132–139.
[80] S. S. M. Sheet, T.-S. Tan, M. A. As’ari, W. H. W. Hitam, and J. S. Y. Sia, “Retinal disease identification using upgraded CLAHE filter and transfer convolution neural network,” ICT Exp., vol. 8, no. 1, pp. 142–150, Mar. 2022.
[81] C. Luo, C. Shi, X. Li, and D. Gao, “Cardiac MR segmentation based on sequence propagation by deep learning,” PLoS ONE, vol. 15, no. 4, Apr. 2020, Art. no. e0230415.
[82] S. Tripathi, T. S. Sharan, S. Sharma, and N. Sharma, “An augmented deep learning network with noise suppression feature for efficient segmentation of magnetic resonance images,” IETE Tech. Rev., vol. 39, no. 4, pp. 1–14, 2021.
[83] L. Witt, M. Heyer, K. Toyoda, W. Samek, and D. Li, “Decentral and incentivized federated learning frameworks: A systematic literature review,” 2022, arXiv:2205.07855.
[84] T. R. Gadekallu, Q.-V. Pham, T. Huynh-The, S. Bhattacharya, P. K. R. Maddikunta, and M. Liyanage, “Federated learning for big data: A survey on opportunities, applications, and future directions,” 2021, arXiv:2110.04160.
[85] H. R. Roth et al., “Federated learning for breast density classification: A real-world implementation,” in Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. Springer, 2020, pp. 181–191.
[86] Q. Dou et al., “Federated deep learning for detecting COVID-19 lung abnormalities in CT: A privacy-preserving multinational validation study,” npj Digit. Med., vol. 4, no. 1, p. 60, Mar. 2021.
[87] N. N. Thilakarathne, G. Muneeswari, V. Parthasarathy, F. Alassery, H. Hamam, R. Kumar Mahendran, and M. Shafiq, “Federated learning for privacy-preserved medical Internet of Things,” Intell. Autom. Soft Comput., vol. 33, no. 1, pp. 157–172, 2022.
[88] J. Born, N. Wiedemann, M. Cossio, C. Buhre, G. Brändle, K. Leidermann, J. Goulet, A. Aujayeb, M. Moor, B. Rieck, and K. Borgwardt, “Accelerating detection of lung pathologies with explainable ultrasound image analysis,” Appl. Sci., vol. 11, no. 2, p. 672, 2021.
[89] P. Patel. (Sep. 17, 2020). Chest X-ray (COVID-19 & Pneumonia). Kaggle. Accessed: Oct. 30, 2022. [Online]. Available: https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia
[90] Srk. (Jun. 24, 2021). Novel Corona Virus 2019 Dataset. Kaggle. Accessed: Oct. 30, 2022. [Online]. Available: https://www.kaggle.com/datasets/sudalairajkumar/novel-corona-virus-2019-dataset
[91] F. Sattler, K.-R. Müller, and W. Samek, “Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3710–3722, Aug. 2021.
[92] P. R. Bassi and R. Attux, “A deep convolutional neural network for COVID-19 detection using chest X-rays,” Res. Biomed. Eng., vol. 38, no. 1, pp. 139–148, Mar. 2022.
[93] K. M. J. Rahman, F. Ahmed, N. Akhter, M. Hasan, R. Amin, K. E. Aziz, A. K. M. M. Islam, M. S. H. Mukta, and A. K. M. N. Islam, “Challenges, applications and design aspects of federated learning: A survey,” IEEE Access, vol. 9, pp. 124682–124700, 2021.
[94] K. Guo, T. Chen, S. Ren, N. Li, M. Hu, and J. Kang, “Federated learning empowered real-time medical data processing method for smart healthcare,” IEEE/ACM Trans. Comput. Biol. Bioinf., early access, Jun. 23, 2022, doi: 10.1109/TCBB.2022.3185395.
[95] S. Sakib, M. M. Fouda, Z. Md Fadlullah, and N. Nasser, “On COVID-19 prediction using asynchronous federated learning-based agile radiograph screening booths,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2021, pp. 1–6.

MD FAHIMUZZMAN SOHAN received the B.Sc. degree in software engineering from Daffodil International University, Bangladesh, in 2019. He has published several papers in reputed journals and conferences. His research interests include machine learning, computer vision, and image processing.

ANAS BASALAMAH received the M.Sc. and Ph.D. degrees from Waseda University, Tokyo, in 2006 and 2009, respectively. He was a Postdoctoral Researcher with The University of Tokyo and the University of Minnesota, in 2010 and 2011, respectively. He is currently an Associate Professor with the Department of Computer Engineering, Umm Al-Qura University. His research interests include embedded networked sensing, smart cities, ubiquitous computing, participatory, and urban sensing.