
Eformer: Edge Enhancement based Transformer for Medical Image Denoising

Achleshwar Luthra*  Harsh Sulakhe*  Tanish Mittal*  Abhishek Iyer  Santosh Yadav

Birla Institute of Technology and Science, Pilani

{f20180401, f20180186, f20190658, f20181105, santosh.yadav}@pilani.bits-pilani.ac.in

*equal contribution

Abstract

In this work, we present Eformer (Edge enhancement based transformer), a novel architecture that builds an encoder-decoder network using transformer blocks for medical image denoising. Non-overlapping window-based self-attention is used in the transformer block to reduce computational requirements. This work further incorporates learnable Sobel-Feldman operators to enhance edges in the image and proposes an effective way to concatenate them in the intermediate layers of our architecture. The experimental analysis compares deterministic learning and residual learning for the task of medical image denoising. To demonstrate the effectiveness of our approach, our model is evaluated on the AAPM-Mayo Clinic Low-Dose CT Grand Challenge Dataset, where it achieves state-of-the-art performance: 43.487 PSNR, 0.0067 RMSE, and 0.9861 SSIM. We believe that our work will encourage more research in transformer-based architectures for medical image denoising using residual learning.

1. Introduction

Modern methods for diagnosing medical conditions have been developing rapidly, and a tool of utmost importance is the Computerized Tomography (CT) scan. It is often used to help diagnose complex bone fractures, tumors, heart disease, emphysema, and more. It works in a manner similar to an X-ray scan: a rotating source shoots narrow X-ray beams through a section of the body, and a highly sensitive detector placed opposite the source picks up the transmitted X-rays; a reconstruction algorithm then creates 2D slices of the body part from one full rotation, and the process is repeated until the required number of slices is produced. As helpful as this procedure is in diagnosis, it is also a cause for concern, as the patient is exposed to ionizing radiation for varying durations. CT scans have been mainly responsible for the increase in radiation that humans receive from medical procedures, and they have even made medical procedures the second-largest source of radiation exposure after background radiation. Reducing the X-ray dose in CT scans is possible, but it leads to problems such as increased noise; reduced contrast at edges, corners, and sharp features; and over-smoothing of images. We propose a method that preserves detail and reduces the noise generated by low-dose scans, so that they may become a viable alternative to high-dose scans.

Medical image denoising has garnered considerable attention from the computer vision research community, with extensive work [19, 27, 4, 10, 14, 2] in this domain in the recent past. Although these methods have shown excellent results, they implicitly associate denoising with operations on a global scale rather than leveraging local visual information. We argue that we can benefit from the patch embedding operations that form the basis of a vision transformer [8]. Recently, Vision Transformers (ViT) have shown great success in many computer vision tasks, including image restoration [25], but they have not yet been exploited on medical image datasets.

To the best of our knowledge, this is the first work that utilizes transformers for medical image denoising. The major contributions of this paper are as follows:

• We introduce a novel architecture, Eformer, for edge enhancement based medical image denoising using transformers. We incorporate learnable Sobel filters for edge enhancement, which improves the performance of our overall architecture. We outperform existing state-of-the-art methods and show how transformers can be useful for medical image denoising.

• We conduct extensive experiments on training our network under the residual learning paradigm. To prove the effectiveness of residual learning in image denoising tasks, we also show results for a deterministic approach in which our model directly predicts denoised images. In medical image denoising, residual learning clearly outperforms the traditional approach, where directly predicting the denoised image amounts to fitting a near-identity mapping.
Figure 1. Detailed description of our method (input low-dose image → residual noise → normal-dose image). All the steps involved are explained in Section 3.6. LC2(D/U) stands for LeWin Transformer block, Concatenation block, Convolution block, and Downsampling/Upsampling block.

This paper is structured as follows: in Section 2 we discuss previous work on image denoising and the use of transformers in related tasks; in Section 3 we explain our approach in detail; in Section 4 we compare our results with existing methods; and we close with conclusions and future directions in Section 5.

2. Related Work

Low-dose CT (LDCT) image denoising is an active research area within medical image denoising due to its valuable clinical usability. Given the limited amount of data and the consequently low accuracy of conventional approaches [16], data-efficient deep learning approaches have huge potential in this domain. The pioneering work of Chen et al. [6] showed that a simple Convolutional Neural Network (CNN) can be used to suppress the noise in LDCT images. The models proposed in [11, 5, 23] show that an encoder-decoder network is effective for medical image denoising: REDCNN [5] adds shortcut connections to a residual encoder-decoder network, and CPCE [23] uses conveying-path connections. Among fully convolutional networks, [10] uses dilated convolutions with different dilation rates, whereas [15] uses simple convolution layers with residual learning to denoise medical images. GAN-based models such as [27, 14] use WGAN [1] with the Wasserstein distance and a perceptual loss for image denoising.

Recently, transformer-based architectures have also achieved huge success in the computer vision domain, pioneered by the Vision Transformer (ViT) [8], which successfully applied transformers to image classification. Since then, many transformer models have been proposed that show strong results on low-level vision tasks, including image super-resolution [26], denoising [25], deraining [3], and colorization [18]. Our work is inspired by one such denoising transformer, Uformer [25], which employs non-overlapping window-based self-attention and depth-wise convolution in the feed-forward network to efficiently capture local context. We integrate the edge enhancement module of [19] and a Uformer-like architecture in an efficient, novel manner that helps us achieve state-of-the-art results.

Method          MSE  MSP  Adv.  VGG-P
REDCNN [5]      ✓    -    -     -
WGAN [1]        -    -    ✓     ✓
CPCE [23]       -    -    ✓     ✓
EDCNN [19]      ✓    ✓    -     -
Eformer (ours)  ✓    ✓    -     -

Table 1. Comparison between losses used by different methods; MSE - mean squared error, MSP - multi-scale perceptual, Adv. - adversarial, and VGG-P - VGG network based perceptual loss.

3. Our Approach

In this section, we provide a detailed description of the components involved in our implementation.

3.1. Sobel-Feldman Operator

Inspired by [19], we use the Sobel-Feldman operator [24], also called the Sobel filter, for our edge enhancement block. The Sobel filter is widely used in edge detection algorithms because it emphasizes edges. Originally the operator had two variants, vertical and horizontal, but we also include diagonal versions similar to [19] (see Supplementary Material). Sample results on edge-enhanced CT images are shown in Figure 2. The resulting image feature maps containing edge information are efficiently concatenated with the input projection and other parts of the network (refer to Figure 1).

Figure 2. Example of results obtained after convolving images with the Sobel filter: input (left) and edge-enhanced images (right).
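As a concrete illustration of this block, the following is a minimal PyTorch sketch of a learnable Sobel filter bank; the class and variable names are ours, and the exact parameterization used in our implementation (and in EDCNN [19]) may differ. Four fixed ±1/±2 Sobel patterns share a single learnable scale α, matching the pattern shown in Figure 4 of the supplementary material.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableSobel(nn.Module):
    """Hypothetical sketch of a learnable Sobel edge-enhancement block.

    Vertical, horizontal, and two diagonal 3x3 Sobel patterns share one
    learnable scale alpha (cf. Figure 4 of the supplementary material).
    """

    def __init__(self, alpha_init: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        v = torch.tensor([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]])
        h = v.t()
        d1 = torch.tensor([[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]])
        d2 = torch.tensor([[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]])
        # Buffer, not parameter: the patterns stay fixed, only alpha is learned.
        self.register_buffer("patterns", torch.stack([v, h, d1, d2]).unsqueeze(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, H, W) grayscale CT slice -> (B, 4, H, W) edge maps.
        edges = F.conv2d(x, self.alpha * self.patterns, padding=1)
        edges = F.gelu(edges)  # Section 3.6 applies a GeLU after the Sobel filter
        # Concatenate edge features with the input, as in Figure 1.
        return torch.cat([x, edges], dim=1)
```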
3.2. Transformer based Encoder-Decoder

Denoising autoencoders [5, 23, 11], fully convolutional networks [19, 15, 10], and GANs [27, 14] have succeeded at medical image denoising in the past, but transformers have not yet been explored for this task, despite their success in other computer vision tasks. Our novel network, Eformer, is one step in that direction; we take inspiration from Uformer [25] for this work. At every encoder and decoder stage, convolutional feature maps are passed through a locally-enhanced window (LeWin) transformer block, which integrates a non-overlapping window-based Multi-head Self-Attention (W-MSA) and a Locally-enhanced Feed-Forward Network (LeFF) (see Supplementary Material):

\begin{aligned} & \mathbf{X}_m^{\prime} = \text{W-MSA}(\text{LN}(\mathbf{X}_{m-1})) + \mathbf{X}_{m-1}, \\ & \mathbf{X}_m = \text{LeFF}(\text{LN}(\mathbf{X}_m^{\prime})) + \mathbf{X}_m^{\prime} \end{aligned} \quad (1)

where LN denotes layer normalization. As shown in Figure 1, the transformer block is applied before the LC2D block in each encoding stage and after the LC2U block in each decoding stage, and it also serves as the bottleneck layer.
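To make Eq. (1) concrete, the following is a minimal sketch of its pre-norm residual wiring; the W-MSA and LeFF submodules are detailed in the supplementary material and are taken here as given constructor arguments (our naming, not the authors' code).

```python
import torch
import torch.nn as nn

class LeWinBlock(nn.Module):
    """Sketch of Eq. (1): pre-norm residual composition of W-MSA and LeFF."""

    def __init__(self, dim: int, w_msa: nn.Module, leff: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.w_msa = w_msa  # window-based multi-head self-attention module
        self.leff = leff    # locally-enhanced feed-forward module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H*W, C) token sequence.
        x = self.w_msa(self.norm1(x)) + x  # X'_m = W-MSA(LN(X_{m-1})) + X_{m-1}
        x = self.leff(self.norm2(x)) + x   # X_m  = LeFF(LN(X'_m)) + X'_m
        return x
```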
3.3. Downsampling & Upsampling

Pooling layers are the most common way of downsampling the input image signal in a convolutional network. They work well in image classification tasks, as they capture the essential structural details, but at the cost of losing finer details, which we cannot afford in our task. Hence, we use strided convolutions in our downsampling layer; specifically, a kernel size of 3 × 3 with a stride of 2 and padding of 1.

Upsampling can be thought of as unpooling, the reverse of pooling, using simple techniques such as nearest-neighbor interpolation. In our network, we instead use transpose convolutions [9]: a transpose convolution reconstructs the spatial dimensions and learns its own parameters just like a regular convolutional layer. The issue with transpose convolutions is that they can cause checkerboard artifacts, which are undesirable for image denoising. [21] states that, to avoid uneven overlap, the kernel size should be divisible by the stride; hence, in our upsampling layer, we use a kernel size of 4 × 4 and a stride of 2.
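The two resampling layers of Section 3.3 can be sketched directly in PyTorch; the channel counts below are illustrative, and the padding of the transpose convolution is our assumption, chosen so that the layer exactly doubles the spatial size.

```python
import torch.nn as nn

def downsample(in_ch: int, out_ch: int) -> nn.Module:
    # Strided 3x3 convolution (stride 2, padding 1): halves H and W while
    # learning what to keep, instead of pooling away fine detail.
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

def upsample(in_ch: int, out_ch: int) -> nn.Module:
    # Transpose convolution with kernel 4 and stride 2: the kernel size is
    # divisible by the stride, avoiding the uneven overlap that causes
    # checkerboard artifacts [21]. padding=1 (assumed) gives exact 2x scaling.
    return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
```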
3.4. Residual Learning

The goal of residual learning is to implicitly remove the latent clean image in the hidden layers. We input a noisy image x = y + v to our network, where x is the noisy image (in our case the low-dose image), y is the ground truth, and v is the residual noise. Rather than directly outputting the denoised image ŷ, the proposed Eformer predicts the residual image v̂, i.e., the difference between the noisy image and the ground truth. According to [12], when the original mapping is close to an identity mapping, the residual mapping is much easier to optimize. Discriminative denoising models aim to learn a mapping function F(x) = ŷ, whereas we adopt the residual formulation and train our network to learn a residual mapping R(x) = v̂, from which we obtain ŷ = x − R(x) = x − v̂.
3.5. Optimization

As part of the optimization process, we employ multiple loss functions to achieve the best possible results. We first use the Mean Squared Error (MSE), which measures the pixel-wise distance between the output and the ground-truth image:

L_{mse} = \frac{1}{N}\sum_{i=1}^{N}\Big\|(x_i - R(x_i)) - y_i\Big\|^2 \quad (2)

However, MSE tends to create unwanted artifacts such as over-smoothing and image blur. To overcome this, we additionally employ a ResNet [12] based Multi-scale Perceptual (MSP) loss [19], described by

L_{msp} = \frac{1}{NC}\sum_{i=1}^{N}\sum_{s=1}^{C}\Big\|\phi_s(x_i - R(x_i), \hat{\theta}) - \phi_s(y_i, \hat{\theta})\Big\|^2 \quad (3)

A ResNet-50 backbone is used as the feature extractor φ. Specifically, the pooling layers of a ResNet-50 pretrained on the ImageNet dataset [7] are removed, the convolutional blocks are retained, and the weights θ̂ are frozen. To calculate the perceptual loss, the denoised output x_i − R(x_i), where R(x_i) = v̂_i (as described in Section 3.4), and the ground truth y_i are passed to the extractor, and feature maps are taken from four stages of the backbone, as done in [19]. This perceptual loss, in combination with MSE, accounts for per-pixel similarity as well as overall structural information. Our final objective is

L_{final} = \lambda_{mse} L_{mse} + \lambda_{msp} L_{msp} \quad (4)

where λ_{mse} and λ_{msp} are pre-defined constants.
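A minimal sketch of this objective is given below, under our reading of Eqs. (2)-(4): a frozen ImageNet-pretrained ResNet-50 with its pooling layers dropped serves as φ, and features are compared after each of the four convolutional stages, as in [19]. Replicating the 1-channel CT slice to 3 channels and the λ weights are our assumptions; the paper does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MSPLoss(nn.Module):
    """Sketch of the multi-scale perceptual loss of Eq. (3) (our reading)."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(pretrained=True)  # newer torchvision: weights=...
        # Keep the convolutional stem and the four stages; pooling layers dropped.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)
        self.stages = nn.ModuleList(
            [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
        for p in self.parameters():
            p.requires_grad = False  # theta-hat is frozen

    def forward(self, denoised: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Replicate the 1-channel slice to the 3 channels ResNet expects (assumed).
        x, y = denoised.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        x, y = self.stem(x), self.stem(y)
        loss = x.new_zeros(())
        for stage in self.stages:  # phi_s, s = 1..4
            x, y = stage(x), stage(y)
            loss = loss + F.mse_loss(x, y)
        return loss / len(self.stages)

def eformer_loss(model, noisy, clean, msp, lam_mse=1.0, lam_msp=0.1):
    """Eq. (4); the lambda weights here are illustrative, not from the paper."""
    denoised = noisy - model(noisy)  # y_hat = x - R(x), Section 3.4
    return lam_mse * F.mse_loss(denoised, clean) + lam_msp * msp(denoised, clean)
```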
3.6. Overall Network Architecture

Composing the aforementioned individual modules, our pipeline can be described as follows. An input image I is first passed through a Sobel filter to produce S(I), followed by a GeLU activation [13]. In each encoding stage, we pass the input through a LeWin transformer block, followed by a concatenation with S(I) and subsequent convolution operations, similar to [19], to produce an encoded feature map. The feature map, along with S(I), is then downsampled using the procedure described in Section 3.3. Post encoding, at the bottleneck, we pass the encoded feature map through another LeWin transformer block, after which it is decoded by the same number of stages as it was encoded. In each decoder stage, after deconvolution, the earlier downsampled S(I) is concatenated with the upsampled feature maps, which are then passed through a convolutional block. The decoder can thus be viewed as a mirror of the encoder, with a shared S(I). The final feature map produced after decoding is passed through an 'output projection' block to produce the desired residual; this output projection is a convolutional layer that simply projects the C-channel feature map to a 1-channel grayscale image. In our experiments, we set the depth of the LeWin blocks, the number of attention heads, and the number of encoder-decoder stages each to 2. A concise representation of the architecture can be seen in Figure 1; it resembles the letter 'E', hence the name Eformer.
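For orientation, the following is a structural sketch of this wiring, not the authors' implementation: the LeWin blocks are stubbed with nn.Identity, the Sobel block is a plain convolution stand-in, only one resolution level is shown (the paper uses two encoder-decoder stages), and the channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EformerSketch(nn.Module):
    """Structural sketch of Figure 1 / Section 3.6 (wiring only)."""

    def __init__(self, c: int = 32, e: int = 4):
        super().__init__()
        self.lewin = nn.Identity()                  # stand-in for LeWin blocks
        self.sobel = nn.Conv2d(1, e, 3, padding=1)  # stand-in for Section 3.1
        self.in_proj = nn.Conv2d(1, c, 3, padding=1)
        self.fuse_enc = nn.Conv2d(c + e, c, 3, padding=1)        # LC2D fusion
        self.down = nn.Conv2d(c, c, 3, stride=2, padding=1)      # Section 3.3
        self.down_e = nn.Conv2d(e, e, 3, stride=2, padding=1)    # downsample S(I)
        self.fuse_mid = nn.Conv2d(c + e, c, 3, padding=1)
        self.up = nn.ConvTranspose2d(c, c, 4, stride=2, padding=1)
        self.fuse_dec = nn.Conv2d(c + e, c, 3, padding=1)        # LC2U fusion
        self.out_proj = nn.Conv2d(c, 1, 3, padding=1)            # C -> grayscale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = F.gelu(self.sobel(x))                   # S(I) + GeLU
        f = self.fuse_enc(torch.cat([self.lewin(self.in_proj(x)), s], dim=1))
        f, s2 = self.down(f), self.down_e(s)        # features and S(I) downsampled
        f = self.fuse_mid(torch.cat([self.lewin(f), s2], dim=1))
        f = self.lewin(f)                           # bottleneck LeWin block
        f = self.fuse_dec(torch.cat([self.up(f), s], dim=1))
        f = self.lewin(f)
        return self.out_proj(f)                     # residual v_hat; y_hat = x - v_hat
```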
Method PSNR ↑ SSIM ↑ RMSE ↓
REDCNN 42.3891 0.9856 0.0076
WGAN 38.6043 0.9647 0.0108
CPCE 40.8209 0.9740 0.0093
EDCNN 42.0835 0.9866 0.0079
Eformer 42.2371 0.9852 0.0077
Eformer-residual 43.487 0.9861 0.0067

Table 2. Comparison with previous methods evaluated on the AAPM Dataset [20].

Figure 3. Sample results on the AAPM Dataset [20]: input images (left) and our results (right). More results are provided in the supplementary material.

4. Results and Discussions

This section highlights the results obtained by measuring three different metrics that judge noise reduction and the quality of the reconstructed low-dose CT images: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE). PSNR targets noise reduction and measures the quality of the reconstruction. SSIM is a perceptual metric that focuses on the visible structures in an image and measures visual quality. RMSE tracks the absolute pixel-to-pixel error between the two images. We compare our results, examples of which are shown in Figure 3, with architectures that share similarities with our model in the sense that they are based on convolutional architectures. As seen in Table 1, CPCE [23], WGAN [1], and EDCNN [19], like ours, use a combination of commonly used losses to train their models, while REDCNN [5] only uses MSE. Table 2 shows that our proposed models, Eformer and Eformer-residual, outperform the state-of-the-art methods in both the PSNR and RMSE metrics, indicating efficient denoising; our comparable performance in SSIM also suggests that the visual quality of the image is high and that important details are not lost in the reconstruction.
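For reference, the three metrics can be computed as follows on slices scaled to [0, 1] (a sketch using standard definitions and scikit-image's SSIM; the paper does not spell out its exact computation).

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0):
    """Return (PSNR, SSIM, RMSE) for a pair of [0, 1]-scaled 2D slices."""
    rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
    psnr = 20.0 * np.log10(data_range / rmse)  # PSNR in dB
    ssim = structural_similarity(pred, target, data_range=data_range)
    return psnr, ssim, rmse
```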
5. Conclusion

To conclude, this paper presents a residual learning based image denoising model evaluated in the medical domain. We leverage transformers and an edge enhancement module to produce high-quality denoised images, and we achieve state-of-the-art performance using a combination of a multi-scale perceptual loss and the traditional MSE loss. We believe our work will encourage the use of transformers in medical image denoising. In the future, we plan to explore the capabilities of our model on a multitude of related tasks.

6. Acknowledgements

We want to thank the members of the Computer Vision Research Society (CVRS, https://sites.google.com/view/thecvrs) for their helpful suggestions and feedback.

References

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN, 2017.
[2] Nicholas Bien, Pranav Rajpurkar, Robyn L. Ball, Jeremy Irvin, Allison Park, Erik Jones, Michael Bereket, Bhavik N. Patel, Kristen W. Yeom, Katie Shpanskaya, Safwan Halabi, Evan Zucker, Gary Fanton, Derek F. Amanatullah, Christopher F. Beaulieu, Geoffrey M. Riley, Russell J. Stewart, Francis G. Blankenberg, David B. Larson, Ricky H. Jones, Curtis P. Langlotz, Andrew Y. Ng, and Matthew P. Lungren. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLOS Medicine, 15(11):1–19, 11 2018.
[3] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer, 2021.
[4] Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, and Ge Wang. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging, 36(12):2524–2535, Dec 2017.
[5] Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, and Ge Wang. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging, 36(12):2524–2535, 2017.
[6] Hu Chen, Yi Zhang, Weihua Zhang, Peixi Liao, Ke Li, Jiliu Zhou, and Ge Wang. Low-dose CT via convolutional neural network. Biomed. Opt. Express, 8(2):679–694, Feb 2017.
[7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[8] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021.
[9] Vincent Dumoulin and Francesco Visin. A guide to convolution arithmetic for deep learning, 2018.
[10] M. Gholizadeh-Ansari, J. Alirezaie, and P. Babyn. Deep learning for low-dose CT denoising using perceptual loss and edge detection layer. Journal of Digital Imaging, 33(2):504–515, 04 2020.
[11] Lovedeep Gondara. Medical image denoising using convolutional denoising autoencoders. In 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pages 241–246, Dec 2016.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
[13] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs), 2020.
[14] Zhanli Hu, Changhui Jiang, Fengyi Sun, Qiyang Zhang, Yongshuai Ge, Yongfeng Yang, Xin Liu, Hairong Zheng, and Dong Liang. Artifact correction in low-dose dental CT imaging using Wasserstein generative adversarial networks. Medical Physics, 46(4):1686–1696, 2019.
[15] Worku Jifara, Feng Jiang, Seungmin Rho, Maowei Cheng, and Shaohui Liu. Medical image denoising using convolutional neural network: a residual learning approach. The Journal of Supercomputing, 75(2):704–718, Feb 2019.
[16] P. Kaur, Gurvinder Singh, and Parminder Kaur. A review of denoising medical images using machine learning approaches. Current Medical Imaging Reviews, 14:675–685, 2018.
[17] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR), San Diego, 2015.
[18] Manoj Kumar, Dirk Weissenborn, and Nal Kalchbrenner. Colorization transformer, 2021.
[19] Tengfei Liang, Yi Jin, Yidong Li, and Tao Wang. EDCNN: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising. In 2020 15th IEEE International Conference on Signal Processing (ICSP), Dec 2020.
[20] Cynthia H. McCollough, Adam C. Bartley, Rickey E. Carter, Baiyu Chen, Tammy A. Drees, Phillip Edwards, David R. Holmes III, Alice E. Huang, Farhana Khan, Shuai Leng, et al. Low-dose CT for the detection and classification of metastatic liver lesions: results of the 2016 Low Dose CT Grand Challenge. Medical Physics, 44(10):e339–e352, 2017.
[21] Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
[22] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch, 2017.
[23] Hongming Shan, Yi Zhang, Qingsong Yang, Uwe Kruger, Mannudeep K. Kalra, Ling Sun, Wenxiang Cong, and Ge Wang. 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Transactions on Medical Imaging, 37(6):1522–1534, Jun 2018.
[24] Irwin Sobel. An isotropic 3x3 image gradient operator. Presentation at Stanford A.I. Project 1968, 02 2014.
[25] Zhendong Wang, Xiaodong Cun, Jianmin Bao, and Jianzhuang Liu. Uformer: A general U-shaped transformer for image restoration, 2021.
[26] Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, and Baining Guo. Learning texture transformer network for image super-resolution, 2020.
[27] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging, 37(6):1348–1357, 06 2018.
Supplementary Material for Eformer: Edge Enhancement based Transformer for Medical Image Denoising

Achleshwar Luthra*  Harsh Sulakhe*  Tanish Mittal*  Abhishek Iyer  Santosh Yadav

Birla Institute of Technology and Science, Pilani

{f20180401, f20180186, f20190658, f20181105, santosh.yadav}@pilani.bits-pilani.ac.in

*equal contribution

7. Dataset Details

For our research work, we have utilized the AAPM-Mayo Clinic Low-Dose CT Grand Challenge Dataset [20], provided by The Cancer Imaging Archive (TCIA). The dataset contains 3 types of CT scans (abdomen, chest, and head) collected from a total of 140 patients: 48, 49, and 42 patients respectively. The data from each patient comprises low-dose CT scans paired with corresponding normal-dose CT scans. The low-dose CT scans are synthetic scans generated by inserting Poisson noise into the projection data; the noise was inserted to reach a noise level of 25% of the full dose. Each CT scan is given in the DICOM (Digital Imaging and Communications in Medicine) file format, a standard that establishes rules for the exchange of medical images and associated information between different vendors, computers, and hospitals; the format meets health information exchange (HIE) standards and HL7 standards for the transmission of health-related data. A DICOM file consists of a header and image pixel intensity data. The header stores information regarding patient demographics, study parameters, etc. in separate 'tags', and the image pixel intensity data contains the pixel data of the CT scan, which in our case is of size 512 × 512. In our model, for training, we extract the image pixel data from a DICOM file into a NumPy array using the pydicom library (https://pydicom.github.io/), and the pixel data is then scaled from 0 to 1 to avoid heterogeneous spanning of pixel values across different CT scans.

Figure 4. Four different sets of Sobel filters in our implementation.
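A minimal sketch of this preprocessing step is shown below; the min-max scaling is our assumption, as the paper only states that values are scaled from 0 to 1.

```python
import numpy as np
import pydicom

def load_ct_slice(path: str) -> np.ndarray:
    """Read one DICOM slice and scale its pixel data to [0, 1] (Section 7)."""
    ds = pydicom.dcmread(path)                # header ('tags') + pixel data
    img = ds.pixel_array.astype(np.float32)   # 512 x 512 intensity array
    # Min-max scaling to [0, 1] (assumed); epsilon guards constant slices.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return img
```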
8. Parameter Details and Network Training

The structure and architecture of the model have been described in Section 3.6 and Figure 1 of the main text. We use the PyTorch framework [22] to run our experiments. The convolutional layers are initialized using the default scheme, except for the Sobel convolutional block, where we constrain the filter parameters to follow the pattern shown in Figure 4, with α a learnable parameter. All our experiments were run on a 16GB NVIDIA TESLA P100 GPU. The model was trained with the Adam [17] optimizer, using a learning rate of 0.00002 and default parameters, on inputs of size 128 × 128 pixels obtained by resizing each image from its original size of 512 × 512 pixels. The results obtained are shown in Figure 6.

Figure 6. Results.
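A sketch of one training epoch under this setup is given below; the data loader and model are assumed to exist, and the MSP term of Eq. (4) is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """Section 8 setup sketch: 128 x 128 low-/normal-dose pairs, residual
    target as in Section 3.4 (MSE term of Eq. (2) only)."""
    model.train()
    for low_dose, full_dose in loader:           # (B, 1, 128, 128) pairs
        low_dose = low_dose.to(device)
        full_dose = full_dose.to(device)
        denoised = low_dose - model(low_dose)    # y_hat = x - R(x)
        loss = F.mse_loss(denoised, full_dose)   # Eq. (2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Adam with the stated learning rate of 0.00002 and default parameters [17]:
# optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
```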
Figure 5. LeWin Transformer Block (Layer Normalisation → W-MSA → residual add, then Layer Normalisation → LeFF → residual add).

9. LeWin Transformer

To make our submission self-contained, we provide the architecture details of the LeWin transformer block [25] here. The LeWin transformer block (Figure 5) contains two core designs, described below. First, non-overlapping Window-based Multi-head Self-Attention (W-MSA), which works on low-resolution feature maps and is sufficient to learn long-range dependencies. Second, a Locally-enhanced Feed-Forward Network (LeFF), which integrates a convolution operator with a traditional feed-forward network and is vital for learning local context. In LeFF, the image patches are first passed through a linear projection layer, followed by 3×3 depth-wise convolutional layers; the patch features are then flattened and finally passed to another linear layer to match the dimension of the input channels. The structure of the LeWin transformer block is represented pictorially in Figure 5. The corresponding equations are:

\mathbf{X}_m^{\prime} = \text{W-MSA}(\text{LN}(\mathbf{X}_{m-1})) + \mathbf{X}_{m-1} \quad (5)

\mathbf{X}_m = \text{LeFF}(\text{LN}(\mathbf{X}_m^{\prime})) + \mathbf{X}_m^{\prime} \quad (6)

Here X′_m and X_m are the outputs of the W-MSA module and the LeFF module respectively, and LN represents layer normalization. In the W-MSA module, the given 2D feature map X ∈ R^{C×H×W} is split into N non-overlapping windows with window size M × M. Self-attention is then performed on the flattened features of each window X^i ∈ R^{M²×C}. Suppose the number of heads is j and the head dimension is d_j = C/j. The consequent computations are:

\mathbf{X} = \{\mathbf{X}^1, \mathbf{X}^2, \dots, \mathbf{X}^N\}, \quad N = HW/M^2 \quad (7)

\mathbf{Y}_j^i = \text{Attention}(\mathbf{X}^i\mathbf{W}_j^Q, \mathbf{X}^i\mathbf{W}_j^K, \mathbf{X}^i\mathbf{W}_j^V), \quad i = 1 \dots N \quad (8)

\mathbf{\hat{X}}_j = \{\mathbf{Y}_j^1, \mathbf{Y}_j^2, \dots, \mathbf{Y}_j^N\} \quad (9)

X̂_j denotes the output of the j-th head over all N windows. The outputs of all heads are then concatenated and linearly projected to obtain the final result. We formulate the attention calculation in the same manner as [25].
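The window partition of Eq. (7) and the per-window attention of Eq. (8) can be sketched as follows; a stock nn.MultiheadAttention stands in for the learned projections W_j^Q, W_j^K, W_j^V, and H, W are assumed divisible by M.

```python
import torch
import torch.nn as nn

def window_partition(x: torch.Tensor, m: int) -> torch.Tensor:
    """Eq. (7): split a (B, C, H, W) map into N = HW / M^2 non-overlapping
    M x M windows, each flattened to M^2 tokens of dimension C."""
    b, c, h, w = x.shape
    x = x.view(b, c, h // m, m, w // m, m)
    x = x.permute(0, 2, 4, 3, 5, 1)                  # (B, H/M, W/M, M, M, C)
    return x.reshape(b * (h // m) * (w // m), m * m, c)

# Eq. (8): self-attention within each window; 2 heads as in our experiments.
x = torch.randn(1, 32, 128, 128)                     # (B, C, H, W) feature map
windows = window_partition(x, m=8)                   # (N, M^2, C) = (256, 64, 32)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=2, batch_first=True)
out, _ = attn(windows, windows, windows)             # one Y^i per window
```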
