Radia Fedr

CycleGAN with Spatial Self-Attention in Federated Learning for Medical Image Translation in Radiotherapy
1st HHHH, Data Science and AI Dept. (University), AAAAA, JJJJ, [email protected], ORCID: 0000-0000-0000-0000
2nd BBB, Data Science and AI Dept. (University), AAAAA, JJJJ, [email protected], ORCID: 0000-0000-0000-0000
3rd AAA, Data Science and AI Dept. (University), AAAAA, JJJJ, [email protected], ORCID: 0000-0000-0000-0000
4th TTTTT, Data Science and AI Dept. (University), AAAAA, JJJJ, [email protected], ORCID: 0000-0000-0000-0000
Abstract—Globally, cancer remains a leading cause of death, affecting millions of people each year. Accurate medical imaging is crucial for effective radiotherapy planning; however, it faces several problems, including data storage, motion artifacts in Magnetic Resonance Imaging (MRI) scans, and maintaining patient privacy and security. Additionally, repeated exposure to radiation from Computed Tomography (CT) scans during treatment planning can put patients at greater risk. Recent deep learning methods for automated cancer image-to-image translation have reached human-level performance, but they require a large amount of annotated data assembled in one location, a condition that is usually not feasible. Federated learning (FL) has recently been proposed to train decentralized models in a privacy-preserving fashion, but it depends on labeled data at the client side, which is costly and usually unavailable. To address this, we propose RadiaSync to overcome the medical and technical obstacles listed above that arise from medical imaging prior to radiotherapy. RadiaSync investigates the use of the Cycle-Consistent GAN (CycleGAN) architecture to translate MRI scans to CT scans within an FL environment, ensuring patient privacy and collaborative learning across different institutions. Each client autonomously trains the model on its own data, and the locally trained weights are then sent back to the server. Federated Averaging (FedAvg), which runs on the central server, is an essential step in this process: it is one of the aggregation algorithms used in this decentralized training methodology to combine the local model weights into a central model, whose aggregated weights are then sent back to the clients.
Index Terms—Federated Learning, Radiotherapy, MRI-to-CT Translation, CycleGAN, Deep Learning, Spatial Self-Attention, Artificial Intelligence in Healthcare, Generative Adversarial Networks.
I. INTRODUCTION
The treatment journey of a cancer patient involves the detailed localization of the tumor so that the radiologist can prescribe the beam configuration required for radiotherapy, ensuring maximal benefit and minimal exposure of healthy tissues. This procedure requires the patient to undergo a CT scan, which exposes the patient to harmful ionizing radiation that, with repeated scans, could increase the probability of developing other cancerous tumors. Our medical image translation approach aims to protect patients from these CT side effects by having them undergo MRI scans instead, which are less harmful, and then using the MRI images as input for our model to yield the equivalent CT scan of the patient.
When Generative Adversarial Networks (GANs) [1] were first introduced, they transformed the field of image translation, as the proposed architecture enabled the creation of realistic synthetic images instead of the simple mappings commonly used before. A GAN consists of a generator that creates realistic synthetic images and a discriminator that classifies images as real or fake. This competitive setup yields highly convincing outputs [2].
The superior efficiency of GANs [1] over traditional methods in the field of medical image translation was first demonstrated by Denck et al. [3]. A GAN architecture was used to enhance the quality of MRI scans by synthesizing MRIs with different characteristics, such as contrast, scanner type, and scan location. It offered a solution to the challenge of standardizing MRI scans across different settings.
CycleGAN, a specialized GAN for image translation, uses two sets of generator and discriminator models, each corresponding to a different domain. The generators translate images between domains; for example, Generator-A translates images from Domain-B to Domain-A, and Generator-B performs the opposite. The discriminators assess these generated images to verify their authenticity within their respective domains. This architecture allows CycleGAN to translate images without the need for paired training data [4].
This approach was the basis for further investigation in
research [5] aimed at testing the efficiency of an attention-aware Cycle-Consistent GAN in converting MRI scans to CT scans. The novelty of that approach was adding an attention-gated classifier, multi-scale feature modulation, and a layer for efficient data compression and reconstruction to the existing CycleGAN architecture. Integrating an attention mechanism layer within the discriminator network allows the model to focus on specific regions of the images for more accurate translations. This further supported the choice of CycleGAN as the design for our proposed pipeline.
From a medical perspective, the environment or demographic area of a group of patients can affect certain characteristics of their MRI scans, which poses the risk that the model would be limited to predicting images for a specific race or for patients from a specific area. Accordingly, to improve the model's ability to generalize, collaborative learning among medical institutions across geographic borders is required. This approach, however, exposes another limitation: centralized data storage. Sensitive medical data cannot be shared and stored on a central global server, and even if that were possible, different scanning machines and image formats would make a unified format an obstacle. All of these constraints are addressed by employing CycleGAN in an FL environment.
The centralized FL environment [6] proposed in this research includes a central server and multiple local servers, one for each participating medical institution in the network. Every server in this paradigm holds a CycleGAN architecture as its local model. The central server initially distributes its base model to all client servers, and the local servers initialize their weights accordingly. Every client server then trains autonomously on its local data for a predetermined number of epochs. When the set number of epochs is reached, the locally trained weights are sent back to the server, where they are aggregated using an algorithm such as FedAvg. FedAvg combines the weights of models trained locally with stochastic gradient descent (SGD), synchronizing learning rates and optimization epochs among all participating devices. The average of the received weights is set as the weights of the central server. This cycle of distribution, training, and aggregation repeats every round until a fixed number of rounds is reached or until the model converges [7]. In this scenario, every client server and its local data represent the local model and local data held by a participating medical institution.
This FL approach facilitates collaborative learning across different institutions with a shared goal while preserving patient data privacy through decentralized data storage, and it addresses overfitting and the restrictions that follow from the need to unify the medical image format. Note that the FL paradigm is highly scalable, allowing any medical center to join the network at any time, and supports continuous learning, meaning the model is adaptive and provides real-time model updates when new data is introduced. Since no actual patient data is exchanged between servers, the paradigm provides all stated advantages with reduced cost and high bandwidth efficiency.
Previous research [8] proposed an FL pipeline that incorporated CycleGAN for translating brain images from one MRI modality to another. However, it used the standard CycleGAN architecture and translated only between distinct MRI modalities. Because our task is more complex, translating between different medical imaging modalities, we introduced a spatial self-attention mechanism within the CycleGAN architecture. This mechanism allows the model to better capture long-range relations and, at the same time, to pay more attention to the main features [9], yielding more accurate translations as well as images with better visual interpretability.
In this paper we propose RadiaSync, whose fundamental components are medical image translation using accurate deep learning architectures and a decentralized learning paradigm that ensures patient privacy without limiting cooperative learning. Given the lack of extensive research, further investigation of FL in the medical field is encouraged, which led to this research. Our contributions beyond previous research include the following:
• The translation of MRI scans to CT scans in an FL framework
• Leveraging spatial self-attention within the CycleGAN structure
The multidisciplinary approach employed in this paper aims to address common issues in medical imaging, such as noise and motion artifacts. It also ensures decentralized data storage, increasing data privacy while improving image quality and reliability.
II. METHODOLOGY
A. CycleGAN Architecture
As previously explained, the CycleGAN architecture incorporates two distinct generators: the first converts MRIs into CT scans, while the second performs the reverse operation. It also incorporates two distinct discriminators, one for each image modality, which are responsible for validating the authenticity of the generated images. The generators attempt to synthesize realistic images that the discriminators cannot detect, while the discriminators try to improve their detection accuracy. Generators are the core of our research, since our main goal is to generate the most accurate images possible. The following architectures were inspired by previous work [8]; the implementation of CycleGAN was based on up-sampling and down-sampling convolutional layers.
1) Generator: The basic unit of the CycleGAN generators is the UNet architecture. The input image is processed, as a tensor of normalized size, through five layers of convolutional downsampling along with Leaky ReLU and 2D Instance Normalization, to reduce spatial dimensions and increase channel
depth. This is displayed in the first five layers in Figure 1. In the downsampling layers the generator aims to compress complex features from the input image, which justifies the choice of LeakyReLU. Because LeakyReLU keeps a small, non-zero slope for negative inputs, it is used during downsampling to preserve important information and prevent the 'dead neuron' issue that could be caused by zeroing the effect of important neurons. This is essential when dealing with datasets with complex features, such as medical images.
At the layer where the most important features are extracted, the spatial self-attention layer is added along with a residual block to prevent possible exploding or vanishing gradients. The architecture of the spatial self-attention is explained further in the Spatial Self-Attention subsection.
Subsequently, the data undergoes upsampling through five layers of transposed convolutions, ReLU, and skip connections to restore the original spatial dimensions of the images. The final convolutional layer in the generators ensures that the output image matches the original input image in dimension and format. During upsampling, ReLU is used to ensure the pixel values in the output images are positive, yielding more realistic images. ReLU also introduces non-linearity into the network, capturing more complex patterns so that the model can reconstruct intricate details of the original image. This is illustrated in the last five layers in Figure 1.
Fig. 1. CycleGAN Generator Architecture
After the output has been generated, it is evaluated by its respective discriminator. Loss calculation involves Mean Squared Error (MSE), see Equation 1, for assessing the error of discriminator predictions against valid targets, and Mean Absolute Error (MAE), see Equation 2, for pixel-wise comparison between the synthetically generated images and their real counterparts.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{predicted},i})^2 \tag{1}$$
Mean Squared Error
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_{\text{true},i} - y_{\text{predicted},i}| \tag{2}$$
Mean Absolute Error
The total loss for each generator is derived from the weighted sums of MSE and MAE for each modality, along with the identity loss, which is calculated using MAE to ensure that the color composition and visual attributes of the new fake image are representative of those of its original counterpart. The Adam optimizer is employed to refine the generator parameters during training. This training cycle is executed every epoch in every client before the clients send their state dictionaries (weights) back to the server. Note that this process is done for both modalities.
Assuming a and b are opposite modalities, the loss of each generator is calculated as shown in Equation 3. The first two components represent the weighted sum of the generator loss from a to b and the generator loss from b to a, each including the GAN loss (MSE) as well as the pixel-wise loss (MAE). The third component represents the identity loss, which is calculated as a 'lambda identity' hyperparameter multiplied by the sum of the L1 losses, shown in Equation 4, of fake image a and fake image b.
$$\text{loss}_{\text{generator total}} = (1 \cdot \text{MSE}_{ab} + 10 \cdot \text{MAE}_{ab}) + (1 \cdot \text{MSE}_{ba} + 10 \cdot \text{MAE}_{ba}) + 10 \cdot (\text{L1}_{\text{fake}_a} + \text{L1}_{\text{fake}_b}) \tag{3}$$
Total CycleGAN Generator Loss
$$\text{L1 Loss} = \sum_{i=1}^{n} |y_{\text{true},i} - y_{\text{predicted},i}| \tag{4}$$
L1 Loss
2) Discriminator: Within the CycleGAN framework, two discriminators are used: one to assess the authenticity of generated CT scans, the other for MRI scans. Each discriminator consists of a sequence of four convolutional layers that progressively downsample the image, with Leaky ReLU activations and InstanceNorm2d normalization. This design allows the discriminators to extract the most abstract and complex features from the images. The final convolutional layer outputs a raw scalar map that, after average pooling, yields a single authenticity score per image. The discriminator acts as a binary classification network, as illustrated in Figure 2.
Fig. 2. CycleGAN Discriminator Architecture
Training the discriminators consists of presenting a real image, followed by calculating MSE to determine the loss associated with real images based on the output, compared against a 'valid' target. The same procedure is applied to
assess the authenticity of generated images, calculating the loss from the output against a 'fake' target. It is worth noting that the discrimination of fake images is excluded from the generator's optimization computations; this prevents the discriminators' updates from influencing the gradients of the generator. This separation is essential when training both components, since it ensures that the quality of generated data is assessed independently, without influencing the generator's internal state during this computation.
The cumulative loss for each discriminator, shown in Equation 5, is the sum of the L1 losses of the real and fake images within the same modality; the overall model loss is the average of these sums, which facilitates balanced training dynamics across both image modalities. The discriminators are optimized with the Adam optimizer.
$$\text{loss}_{\text{discriminator total}} = 0.5 \cdot (\text{L1 Loss}_{\text{real}_b} + \text{L1 Loss}_{\text{fake}_b}) + 0.5 \cdot (\text{L1 Loss}_{\text{real}_a} + \text{L1 Loss}_{\text{fake}_a}) \tag{5}$$
Total CycleGAN Discriminator Loss
3) Spatial Self-Attention: Spatial self-attention in CNN-based networks improves feature representation by allowing each pixel in the feature map to consider all other pixels. The mechanism transforms the input features into query, key, and value representations using 1x1 convolutions, as illustrated in Figure 3. The query and key representations are multiplied to calculate attention scores, and a softmax function is then used to normalize the results. The value representations are weighted by these attention scores, aggregating important information from all positions in the image. The weighted sum of the values is combined with the input via a skip connection, which enriches the input features with global context information. This mechanism improves the capture of long-range dependencies and the focus on relevant features.
Fig. 3. Spatial Self-Attention Algorithm
B. CycleGAN Within a Federated Learning Framework
The pipeline design revolves around a central server that houses both private data and the CycleGAN model, as displayed in Figure 4. Note that this simulation was implemented locally on one machine. Assume the number of clients is n. In practice, a dictionary of 2 · n generators and a dictionary of 2 · n discriminators are initialized, where each component is named after the image modality it is responsible for: a for MRI and b for CT. Another dictionary is initialized to house the generators and discriminators of the central server. A for-loop iterates over every pair of generators in the dictionary to train them, with a nested for-loop that iterates over the corresponding discriminators in the discriminators dictionary.
After every client has been trained for the predetermined number of epochs and the loss functions for its components have been calculated, the state dictionaries of all clients are stored in an array that is sent to the server. The server object accepts this array, calculates the average of the weights, and adopts it as its own weights. It then sends the aggregated weights back to the clients and repeats this round until the set number of rounds is reached or until the model converges.
Fig. 4. Pipeline Design
III. EVALUATION MEASURES
Following previous work in an FL setting [8], MAE (Equation 2) was used to calculate the mean absolute difference between the predicted and actual values, and MSE (Equation 1) was used to calculate the mean squared difference between the predicted and actual values. The lower both values are, the better the reconstruction of the image compared to the original image.
PSNR (Equation 6) is mainly used to evaluate the quality of the image reconstruction by evaluating the context, or edges, of neuroimages. Since allowing radiologists to visually interpret the generated images is an essential part of this research, this measure is essential to ensure that the image quality is high enough to be interpretable by humans. The higher the PSNR value, the better the image quality.
$$\text{PSNR} = 10 \log_{10}\left(\frac{\text{MAX}^2}{\frac{1}{n}\sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{predicted},i})^2}\right) \tag{6}$$
Peak Signal-to-Noise Ratio, where MAX is the maximum possible pixel value of the image
SSIM measures the similarity between two images, considering luminance, contrast, and structure. SSIM, alongside PSNR, helped achieve a comprehensive evaluation of the accuracy and visual quality of the generated images.
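The error measures of Equations 1, 2, and 6 can be sketched in a few lines. The following is a minimal, dependency-free illustration, not the implementation used in this work: it operates on flattened pixel values, the function names are hypothetical, and PSNR follows the standard definition with a peak value `max_value` that is assumed to be 1.0 (images min-max normalized to [0, 1]).

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both images must be flattened to the same, non-zero number of pixels.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error over pixels (Equation 1)."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over pixels (Equation 2)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def psnr(y_true: Sequence[float], y_pred: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value = 1.0 is an assumption."""
    err = mse(y_true, y_pred)
    if err == 0.0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy example: a four-pixel "real" image and its synthetic counterpart.
real = [0.0, 0.5, 1.0, 0.25]
fake = [0.1, 0.5, 0.8, 0.25]
print(mae(real, fake), mse(real, fake), psnr(real, fake))
```

Lower MAE and MSE and higher PSNR indicate a closer reconstruction. If the images are stored on another scale (for example 8-bit intensities), `max_value` would need to be set to the corresponding peak value.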
$$\text{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{7}$$
Structural Similarity Index Measure
IV. EXPERIMENTS AND RESULTS
A. Dataset
The SynthRAD2023 dataset [10] is a carefully curated collection of 1080 paired medical images: 540 paired MRI-CT images and 540 paired CBCT-CT images in '.nii.gz' format, from patients receiving radiotherapy to the brain and pelvic regions. The dataset was collected for testing synthetic CT generation algorithms in modern radiotherapy. It was sourced between 2018 and 2022 and includes patients aged between 3 and 93. The total volume of the data is approximately 25.4 GB, comprising extensive data across varied imaging modalities and patient conditions. The task of this research is limited to translating MRI to CT scans for the brain area, hence only the relevant volumes were extracted, leaving us with approximately 6.69 GB, a total of 180 paired MRI-CT images.
The dataset was split in an Independent and Identically Distributed (IID) manner. First, the dataset (180 paired images) was split into 80% training data (144 paired images) and 20% test data (36 paired images) to evaluate the model's final performance. Decentralized data distribution is considered the core of FL, where each client's model is tasked with learning and adapting to the characteristics of its local data. In this project, four clients were initialized with equal weights of 0.25, meaning that each client held a quarter of the total training dataset for local learning; this leaves every client with 36 paired images (144 images divided by 4 clients). The model is designed to train on 2D images, so every image was sliced into 30 slices, leaving each client with 1080 tensor slices for training. However, when single-model inference was run on the test data, only one slice per image was considered. The test slice was slice number 100, since it is located right in the middle of the image and in the middle of the range of slices taken for training (slices 85 to 115). The splitting process is illustrated in Figure 5.
Fig. 5. Data Splitting Mechanism
B. Implementation Details
Dataset images were normalized in their original form before conversion to '.npy' format. Subsequently, they were expanded with an additional tensor dimension and normalized using MinMax normalization to transform them into a PyTorch tensor of size 224. Images before and after processing are displayed in Figure 6.
Fig. 6. Images on the left are unprocessed while images on the right show the input images after pre-processing
For all experiments the Adam optimizer is used to optimize the CycleGAN components during training, with β1, β2, decay rate, and learning rate set to 0.5, 0.999, 2, and 0.0001, respectively. The model was trained locally for 3 epochs per client, with a total of 10 rounds of global weight exchange in the FL setting. The participation rate of clients in the FL framework is 100%, meaning that all clients were chosen to train in every round and the weights of all clients were aggregated every round. The time taken to train the pipeline from the beginning to the end of 10 rounds ranged approximately from 12 to 14 hours. Experiments that deploy spatial self-attention use an output size of 8 with kernel size 1 for the query and key convolutions. All experiments with self-attention also included a residual block with a kernel size, stride, padding, and bias of 3, 1, 1, and False, respectively.
Results obtained from the model proposed in this research are displayed below along with a set of alternative approaches. These include the difference in accuracy between the central server and each individual client in the setting, the use of CycleGAN [11] in contrast with the UNet [12] architecture, and the effect of using spatial self-attention within the proposed CycleGAN approach. Experiments were conducted to reinforce the choice of the proposed model through qualitative and quantitative comparisons against previously established benchmarks.
1) Central Server vs. Client Accuracy in the Federated Learning Environment: The main purpose of this architecture is to aid the training of multiple geographically scattered models while maintaining client privacy and ensuring the model captures diverse datasets. The comparison is fair because the share of data held by each client relative to the whole resembles the ratio between the data a single hospital would have acquired and the data all hospitals would have together once an FL environment is established to pool knowledge and results. The final SSIM of the FL environment
central server, after 10 rounds, was 0.6860, while the clients' SSIM values ranged from 0.6385 to 0.6962, as shown in Figure 7. The results validate the benefits international organizations would gain from using such an environment to train their systems. The increase in client accuracy is due to the state dictionaries sent from the server to the clients every round, which carry the knowledge learned from other clients' datasets. Therefore, the FL framework was adopted in further experiments.
Fig. 7. PSNR values per round for every client and centralized server
2) CycleGAN vs. UNet: An experiment was conducted to compare the baseline UNet architecture with the proposed CycleGAN architecture, both run in an FL environment. With the CycleGAN model, the central server reported an SSIM value of 0.6860, while with the UNet model the central server recorded an SSIM value of -0.1328. The results confirm that choosing CycleGAN leads to more stable and accurate training. Figure 8 qualitatively compares the final results of both architectures on a test sample. Even though UNet is considered the baseline for image translation tasks, its ability to map images from one modality to another is rather weak in an FL environment due to asynchronous communication that could cause the model to become unstable. Hence, the CycleGAN model was pursued for further investigation to explore the possibility of enhancing its image translation ability.
Fig. 8. Top row showcases CycleGAN model results, while bottom row displays UNet results, where the paired images A are the ground truth, the set B are the generated images from the set A, and the final set C are the synthetic images generated from the set B
3) Spatial Self-Attention vs. Without Spatial Self-Attention: To inspect the effect of the spatial self-attention mechanism, a test was conducted to observe the difference between the typical CycleGAN structure and a modified CycleGAN that incorporates a spatial self-attention layer and a residual block at the final downsampling layer in the generator. Both architectures were evaluated within the same FL environment. Images generated with self-attention recorded an SSIM of 0.6942, while those generated without self-attention achieved a value of 0.6860. The self-attention results were visually easier to interpret and more closely resembled the ground truth than those of CycleGAN without self-attention, as displayed in Figure 9. This agrees with the hypothesis of this research: the attributes of a CT scan rely on multiple factors that can be scattered across an MRI scan, not necessarily at the corresponding pixel positions, so self-attention allows the model to take the full image into consideration when translating one image modality to another.
Fig. 9. Top row showcases CycleGAN model results without self-attention, while bottom row displays CycleGAN model results with self-attention, where the paired images A are the ground truth, the set B are the generated images from the set A, and the final set C are the synthetic images generated from the set B
Quantitative results are displayed, to facilitate comparison, in Table I.
TABLE I
COMPARISON OF EVALUATION METRICS ACROSS DIFFERENT EXPERIMENTS
Metric   U-Net     Without Self-Attention   Self-Attention
SSIM     -0.1328   0.6860                   0.6971
PSNR     4.1069    14.7589                  15.2728
MAE      0.3319    0.0373                   0.0315
In summary, the comprehensive evaluation supported the choice of model components. The proposed method demonstrated superiority over the other pipelines considered in terms of both accuracy and visual interpretability. This result emphasizes the ability of the proposed model to translate MRI scans to CT scans by adopting a CycleGAN architecture with spatial self-attention, while maintaining patient privacy and allowing collaborative learning across medical institutions through an FL framework, and its potential to assist radiologists in planning radiotherapy while ensuring patient safety.
V. DISCUSSION
As demonstrated, the chosen pipeline design outperformed the other architectures considered.
The finalized architecture uses a CycleGAN model that incorporates a spatial self-attention mechanism in an FL environment.
The first test examined the advantage of employing an FL paradigm for international industries that serve a common purpose. Notably, aggregating the client state dictionaries in the central server allowed the central server to learn from more diverse datasets, capturing a wider range of patterns than a single local client could. Clients tend to learn some important aspects of their local data effectively while neglecting others; aggregating knowledge in the central server lets it take all features into consideration simultaneously. Note that aggregating the clients' updated weights with the FedAvg algorithm can act as regularization while preserving privacy; that is, it avoids overfitting to a single client's data. This can increase the generalization power of the model. Therefore, the server's performance is better than the performance of any single client.
The internal architecture of the CycleGAN model also allowed the exploration of CT-to-MRI image translation. Since the CycleGAN already includes a generator responsible for translating CT scans to MRI, that generator was trained as well throughout the whole process. When both modalities were compared, it was clear that MRI scans were more complex, detailed, and information-rich than CT scans. This difference played a large role in the difference in accuracy between the two generators in the CycleGAN, as the generator responsible for MRI-to-CT translation always achieved higher accuracy.
When looking at the results of self-attention from that perspective, the generated MRI scans are more detailed and preserve more information from the CT scans than in the traditional image translation task. This result suggests that, when further investigating the task of translating CT scans to MRI scans, self-attention is a mechanism that can help ensure the effectiveness of the process. The attention mechanism is important in binding together the critical parts of the images, allowing the model to focus on and translate the meaningful features. Adding an attention mechanism in our work has also yielded better images by highlighting and preserving the essential details, improving the quality and accuracy of the translated images.
VI. CONCLUSION AND FUTURE WORK
In conclusion, the implementation of a CycleGAN in an FL environment has proven to be an efficient and accurate architecture for the MRI-to-CT translation task, with an SSIM value of 0.6971, a PSNR of 15.2728, and an MAE of 0.0315. Quantitative results indicate that the synthetic CT output is relatively close to the ground truth, with a high SSIM value and a low error term, and a visually reliable result is indicated by the high PSNR value. These promising results suggest that image-to-image translation in the medical field is becoming more reliable with the development of new computer vision models. Future work and enhancements could address how the data is distributed across the four clients. In this work, we assume that the data on all clients is IID. However, this is not necessarily the case when this architecture is implemented on a larger scale in real life. One could investigate non-IID settings and domain shifts with FL. Further work could include developing more personalized federated learning techniques able to handle different data distributions.
REFERENCES
[1] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative adversarial networks: An overview,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53–65, 2018.
[2] S. AI, “Image to image translation,” 2024, accessed: 2024-06-27.
[3] J. Denck, J. Guehring, A. Maier, and E. Rothgang, “MR-contrast-aware image-to-image translations with generative adversarial networks,” International Journal of Computer Assisted Radiology and Surgery, vol. 16, pp. 2069–2078, 2021.
[4] A. Alotaibi, “Deep generative adversarial networks for image-to-image translation: A review,” Symmetry, vol. 12, no. 10, p. 1705, 2020.
[5] V. Kearney, B. P. Ziemer, A. Perry, T. Wang, J. W. Chan, L. Ma, O. Morin, S. S. Yom, and T. D. Solberg, “Attention-aware discrimination for MR-to-CT image translation using cycle-consistent generative adversarial networks,” Radiology: Artificial Intelligence, vol. 2, no. 2, p. e190027, 2020.
[6] N. Rieke, “What is federated learning,” The NVIDIA Blog, 2019.
[7] Educative.io, “What is federated averaging (FedAvg)?” 2024, accessed: 2024-06-27.
[8] J. Wang, G. Xie, Y. Huang, J. Lyu, F. Zheng, Y. Zheng, and Y. Jin, “FedMed-GAN: Federated domain translation on unsupervised cross-modality brain image synthesis,” Neurocomputing, vol. 546, p. 126282, 2023.
[9] S. Li, X. Zhang, J. Xiong, C. Ning, and M. Zhang, “Learning spatial self-attention information for visual tracking,” IET Image Processing, vol. 16, no. 1, pp. 49–60, 2022.
[10] A. Thummerer, E. van der Bijl, A. Galapon Jr, J. J. Verhoeff, J. A. Langendijk, S. Both, C. N. A. van den Berg, and M. Maspero, “SynthRAD2023 grand challenge dataset: Generating synthetic CT for radiotherapy,” Medical Physics, vol. 50, no. 7, pp. 4664–4674, 2023.
[11] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[12] J. Chi, C. Wu, X. Yu, P. Ji, and H. Chu, “Single low-dose CT image denoising using a generative adversarial network with modified U-Net generator and multi-level discriminator,” IEEE Access, vol. 8, pp. 133470–133487, 2020.
