
AI618: Generative Models & Unsupervised Learning

Individual Project 1: Unsupervised / Self-supervised CT Denoising

Hoyeol Sohn
Student ID: 20244279
KAIST

May 20, 2025


Abstract
Computed Tomography (CT) is a vital medical imaging modality, but concerns over radiation
exposure motivate the use of low-dose protocols, which unfortunately introduce significant noise.
This project explores deep learning techniques for CT image denoising without requiring perfectly
matched low-dose/high-dose image pairs. We implement and evaluate two prominent methods:
CycleGAN, an unsupervised approach for unpaired image-to-image translation between quarter-
dose and full-dose CT images, and Noise2Score, a self-supervised method trained only on full-dose
images to remove synthetically added Poisson noise using score function estimation via Tweedie’s
formula. Both methods were implemented using PyTorch based on provided skeleton code and
evaluated on the AAPM Low Dose CT Grand Challenge dataset. CycleGAN achieved a PSNR
improvement of 2.85 dB (from 34.17 dB to 37.02 dB) and an SSIM improvement from 0.893 to
0.927, falling short of the target metrics outlined in the course evaluation guidelines. Noise2Score
successfully removed Poisson noise, improving PSNR by 3.06 dB (from 27.60 dB to 30.67 dB)
and SSIM from 0.777 to 0.853, meeting the project’s quantitative criteria. This report details the
theoretical background, describes the code implementation, and analyzes the quantitative/qualitative
results for both approaches.

Contents
1 Introduction
1.1 CT Denoising Problem
1.2 Motivation and Project Goal

2 Methods and Implementation
2.1 Dataset and Preprocessing
2.2 CycleGAN Implementation
2.2.1 Network Architecture
2.2.2 Loss Functions and Training
2.2.3 Loss Functions
2.3 Noise2Score Implementation
2.3.1 Network Architecture (AR-DAE)
2.3.2 Loss Function and Training
2.3.3 Inference

3 Results
3.1 CycleGAN
3.1.1 Loss Curve Analysis
3.1.2 Qualitative Example
3.1.3 Quantitative Evaluation
3.2 Noise2Score
3.2.1 Loss Curve Analysis
3.2.2 Qualitative Example
3.2.3 Quantitative Evaluation

4 Discussion

AI618 Project 1 Hoyeol Sohn (20244279)

1 Introduction
1.1 CT Denoising Problem
Computed Tomography (CT) imaging provides crucial cross-sectional views of the human body for
medical diagnosis. However, CT utilizes ionizing radiation, posing potential health risks. Reducing the
radiation dose is desirable but inherently increases image noise and reduces contrast. Low-Dose CT
(LDCT) denoising aims to computationally remove this noise, recovering an image quality comparable
to that of a full-dose CT (FDCT) scan.

1.2 Motivation and Project Goal


Supervised denoising requires paired LDCT/FDCT data, which is difficult to obtain. This project ex-
plores alternatives:

1. CycleGAN: An unsupervised method using unpaired LDCT (Quarter-dose) and FDCT images
[1].

2. Noise2Score: A self-supervised method using only FDCT images to learn to remove Poisson
noise [2].

We implement these methods based on course skeletons [3] using the AAPM dataset [4] and evaluate
their performance.

2 Methods and Implementation


2.1 Dataset and Preprocessing
The AAPM Low Dose CT Grand Challenge dataset [4] (3839 train / 421 test slices) was used. Images
are stored as 512x512 `.npy` files. Preprocessing pipeline:

1. Load the `.npy` file (linear attenuation coefficients $\mu$).

2. Convert to Hounsfield Units (HU): $\mathrm{HU} = (\mu - 0.0192)/0.0192 \times 1000$.

3. Clip HU values below -1000 to -1000.

4. Normalize:

• CycleGAN: $x_{\mathrm{norm}} = \mathrm{HU}_{\mathrm{clipped}} / 4000$.

• Noise2Score: $y_{\mathrm{norm}} = \mathrm{HU}_{\mathrm{clipped}} / 4000 + 0.256$ (offset for non-negativity).

5. Training augmentation: `ToTensor`, random horizontal/vertical flips ($p = 0.5$), and a random 128x128 crop (`RandomCrop`).

6. Testing: `ToTensor` only, on full 512x512 images.

This pipeline was implemented within the `CTDataset` class provided in the skeleton notebooks.
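
As an illustration of steps 2 to 4, the following is a minimal NumPy sketch, not the `CTDataset` code itself. The function names `preprocess` and `to_hu` and the constant name `MU_REF` are hypothetical; only the constants 0.0192, -1000, 4000, and 0.256 come from the pipeline above.

```python
import numpy as np

MU_REF = 0.0192  # reference attenuation from step 2 (constant name is an assumption)


def preprocess(mu, method="cyclegan"):
    """Map linear attenuation coefficients to the normalized training range."""
    hu = (mu - MU_REF) / MU_REF * 1000.0  # step 2: attenuation -> HU
    hu = np.maximum(hu, -1000.0)          # step 3: clip values below -1000 HU
    x = hu / 4000.0                       # step 4: scale
    if method == "noise2score":
        x = x + 0.256                     # offset keeps the values non-negative
    return x.astype(np.float32)


def to_hu(x, method="cyclegan"):
    """Invert the normalization (the clipping itself cannot be undone)."""
    if method == "noise2score":
        x = x - 0.256
    return x * 4000.0
```

With this normalization, air (-1000 HU) maps to -0.25 for CycleGAN and to 0.006 for Noise2Score, which is why the 0.256 offset is enough to keep the Noise2Score input non-negative.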

2.2 CycleGAN Implementation


CycleGAN learns bidirectional mappings ($G_{F2Q}$: Full → Quarter, $G_{Q2F}$: Quarter → Full) using
generators ($G$) and discriminators ($D$) trained with adversarial and cycle-consistency losses [1].


2.2.1 Network Architecture


• Generators ($G_{F2Q}$, $G_{Q2F}$): Implemented as a ResNet-based architecture following [5], adapted
for single-channel (grayscale) input/output. Key parameters were `ngf = 32` (filters in the first layer)
and `n_residual_blocks = 6`. Instance normalization and ReLU activations were used, with a final Tanh
activation. The Python class `Generator` encapsulates this structure, utilizing a `ResidualBlock`
class for the core components.

• Discriminators ($D_F$, $D_Q$): Implemented as a 70x70 PatchGAN, adhering strictly to the TA
guideline specifications (4x4 convolutions, strides [2,2,2,1,1], InstanceNorm after the first
layer, LeakyReLU activations). Filter counts start at `ndf = 32`. The Python class `Discriminator`
defines this network.
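
The "70x70" in the PatchGAN name can be checked from the stated kernel size and strides. The helper below is a small illustrative sketch (the function name `receptive_field` is hypothetical and not part of the skeleton code); it assumes five undilated 4x4 convolutions with the strides listed above.

```python
def receptive_field(kernel_sizes, strides):
    """Receptive field of one output unit for a stack of conv layers (no dilation)."""
    rf = 1
    # Walk from the last layer back to the input: each layer widens the field.
    for k, s in zip(reversed(kernel_sizes), reversed(strides)):
        rf = (rf - 1) * s + k
    return rf


# Five 4x4 convolutions with strides [2, 2, 2, 1, 1], as in the guideline.
patch = receptive_field([4] * 5, [2, 2, 2, 1, 1])
print(patch)  # 70
```

Each discriminator output therefore scores a 70x70 region of the input rather than the whole image.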

The networks were instantiated in the notebook as:


# Generators (Full <-> Quarter)
G_F2Q = Generator(in_channels=1, out_channels=1, ngf=32, n_residual_blocks=6).to(device)
G_Q2F = Generator(in_channels=1, out_channels=1, ngf=32, n_residual_blocks=6).to(device)
# Discriminators
D_F = Discriminator(in_channels=1, ndf=32).to(device)  # For full-dose images
D_Q = Discriminator(in_channels=1, ndf=32).to(device)  # For quarter-dose images

2.2.2 Loss Functions and Training

2.2.3 Loss Functions


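• Adversarial Loss (LSGAN): Least Squares GAN loss [6] was used, implemented via `nn.MSELoss()`.
For discriminator $D_Y$ and generator $G_{X \to Y}$:

\[ \mathcal{L}_{adv}(G_{X \to Y}, D_Y) = \mathbb{E}_{y \sim P_Y}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{x \sim P_X}\big[(D_Y(G_{X \to Y}(x)))^2\big] \tag{1} \]

\[ \mathcal{L}_{adv}(G_{X \to Y}) = \mathbb{E}_{x \sim P_X}\big[(D_Y(G_{X \to Y}(x)) - 1)^2\big] \tag{2} \]

The total discriminator loss for $D_Y$ during training is:

\[ \mathcal{L}_{D_Y} = 0.5 \times \Big( \mathbb{E}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}\big[(D_Y(G(x)_{\mathrm{buffered}}))^2\big] \Big) \]

A similar loss applies for $D_X$.

• Cycle Consistency Loss: Encourages $G_{Y \to X}(G_{X \to Y}(x)) \approx x$ and $G_{X \to Y}(G_{Y \to X}(y)) \approx y$.
Implemented using `nn.L1Loss()`. Weighted by $\lambda_{cycle} = 10$.

\[ \mathcal{L}_{cyc}(G_{X \to Y}, G_{Y \to X}) = \mathbb{E}_{x \sim P_X}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim P_Y}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big] \tag{3} \]

• Identity Loss: Encourages $G_{X \to Y}(y) \approx y$ and $G_{Y \to X}(x) \approx x$. Implemented using `nn.L1Loss()`.
Weighted by $\lambda_{iden} = 5$.

\[ \mathcal{L}_{idt}(G_{X \to Y}, G_{Y \to X}) = \mathbb{E}_{y \sim P_Y}\big[\lVert G_{X \to Y}(y) - y \rVert_1\big] + \mathbb{E}_{x \sim P_X}\big[\lVert G_{Y \to X}(x) - x \rVert_1\big] \tag{4} \]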
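
To make the loss terms concrete, here is a NumPy sketch of how they could be combined. It is not the notebook code: the function names are hypothetical, and the total generator loss as a plain weighted sum (adversarial + 10 × cycle + 5 × identity) is an assumption based on the standard CycleGAN objective, since the report only states the individual weights. The mean-based `mse` and `l1` helpers mirror the default reductions of `nn.MSELoss()` and `nn.L1Loss()`.

```python
import numpy as np

LAMBDA_CYCLE, LAMBDA_IDEN = 10.0, 5.0  # weights stated in the report


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def l1(a, b):
    return float(np.mean(np.abs(a - b)))


def discriminator_loss(d_real, d_fake):
    """0.5 * [(D(y) - 1)^2 + D(G(x))^2], averaged over the patch outputs."""
    real_term = mse(d_real, np.ones_like(d_real))
    fake_term = mse(d_fake, np.zeros_like(d_fake))
    return 0.5 * (real_term + fake_term)


def generator_loss(d_fake_F, d_fake_Q, x_F, x_Q, x_FQF, x_QFQ, x_FF, x_QQ):
    """Assumed total: adversarial + weighted cycle + weighted identity terms."""
    adv = mse(d_fake_F, np.ones_like(d_fake_F)) + mse(d_fake_Q, np.ones_like(d_fake_Q))
    cyc = l1(x_FQF, x_F) + l1(x_QFQ, x_Q)  # reconstructions vs. originals
    idt = l1(x_FF, x_F) + l1(x_QQ, x_Q)    # identity mappings vs. originals
    return adv + LAMBDA_CYCLE * cyc + LAMBDA_IDEN * idt
```

Because both L1 terms are averaged over pixels, a uniform reconstruction error of 0.1 in one cycle contributes 10 × 0.1 = 1.0 to the generator loss under this weighting.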

Training Loop: Implemented by filling the placeholders in the skeleton notebook. Key steps per iteration
included:

1. Toggling discriminator gradients using `set_requires_grad`.

2. Calculating generator outputs ($x_{FQ}$, $x_{QF}$) and reconstructed outputs ($x_{FQF}$, $x_{QFQ}$).

3. Calculating identity mapping outputs ($x_{FF}$, $x_{QQ}$).


4. Computing individual loss components ($G_{adv}$, $G_{cyc}$, $G_{iden}$).

5. Backpropagating the total generator loss $\mathcal{L}_G$ and stepping the generator optimizer (`G_optim`).

6. Calculating discriminator losses $\mathcal{L}_{D_F}$, $\mathcal{L}_{D_Q}$ using real images and fake images retrieved from
a replay buffer (`fake_A_buffer`, `fake_B_buffer`), ensuring `.detach()` is called on fake
images fed to the discriminators.

7. Backpropagating discriminator losses ($\mathcal{L}_{D_F}$ and $\mathcal{L}_{D_Q}$ separately) and stepping the discriminator
optimizer (`D_optim`).
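
The replay buffer in step 6 can be sketched as follows. This is a plain-Python illustration in the style of the image pool commonly used with CycleGAN, not the skeleton's implementation: the class name, the method name `push_and_pop`, the buffer size of 50, and the 0.5 swap probability are all assumptions, since the report does not state them.

```python
import random


class ReplayBuffer:
    """History of generated images shown to the discriminator (illustrative)."""

    def __init__(self, max_size=50, seed=None):
        self.max_size = max_size          # assumed capacity, not from the report
        self.data = []
        self.rng = random.Random(seed)

    def push_and_pop(self, images):
        """Store new fakes and return a same-length batch mixing new and old ones."""
        out = []
        for img in images:
            if len(self.data) < self.max_size:
                # Buffer not yet full: store the new image and return it as is.
                self.data.append(img)
                out.append(img)
            elif self.rng.random() < 0.5:
                # Return a stored image and overwrite its slot with the new one.
                idx = self.rng.randrange(self.max_size)
                out.append(self.data[idx])
                self.data[idx] = img
            else:
                # Return the new image without storing it.
                out.append(img)
        return out
```

Feeding the discriminator a mix of current and older fakes is a common way to damp oscillations; in a PyTorch loop the items would be detached tensors rather than the generic objects used here.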

Optimizer/Scheduler: The Adam optimizer was used ($\beta_1 = 0.5$, $\beta_2 = 0.999$, $\mathrm{lr} = 2 \times 10^{-4}$) with a
`WarmupLinearLR` scheduler (1000 warmup steps, followed by linear decay).
Hyperparameters: Batch size = 16, Epochs = 100, $\lambda_{cycle} = 10$, $\lambda_{iden} = 5$.
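
The shape of such a warmup-then-decay schedule can be sketched as a pure function of the step count. This is a hedged illustration, not the `WarmupLinearLR` class from the skeleton: the function name is hypothetical, decay to exactly zero at the last step is an assumption, and `total_steps=24000` is only an estimate (about 240 iterations per epoch for 3839 slices at batch size 16, times 100 epochs).

```python
def warmup_linear_lr(step, base_lr=2e-4, warmup_steps=1000, total_steps=24000):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 at total_steps (assumed end point).
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)
```

In PyTorch, a function of this form could be wrapped with `torch.optim.lr_scheduler.LambdaLR` by returning the multiplier (the value divided by `base_lr`) instead of the absolute rate.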

2.3 Noise2Score Implementation


Noise2Score estimates the score function $\nabla_y \log p(y)$ of noisy data using an AR-DAE network ($R_\Theta$)
and applies Tweedie’s formula for denoising [2].

2.3.1 Network Architecture (AR-DAE)


The project required using the same architecture as the CycleGAN generator for the AR-DAE ($R_\Theta$). We
used the `UNet` class (structurally identical to `Generator`, but without the final Tanh) with `ngf = 64`
and `n_residual_blocks = 9`.

# AR-DAE network (same architecture as the CycleGAN generator)
ARDAE = UNet(in_channels=1, out_channels=1, ngf=64, n_residual_blocks=9).to(device)

The network takes a noisy image $y_{\mathrm{noisy}}$ and outputs the score estimate $\hat{\ell}'(y_{\mathrm{noisy}}) \approx R_\Theta(y_{\mathrm{noisy}})$.

2.3.2 Loss Function and Training


• Loss: The AR-DAE loss (Eq. 5) was implemented using `nn.MSELoss()`.

\[ \mathcal{L}_{AR\text{-}DAE}(\Theta) = \mathbb{E}\,\lVert u + \sigma_a R_\Theta(y + \sigma_a u) \rVert^2 \tag{5} \]

• Training Loop: Implemented within the skeleton notebook. Key steps per iteration:

1. Load a clean full-dose image batch $y$ (`x_F` input).

2. Sample the noise scale $\sigma_a$ uniformly from [0.0, 0.2].

3. Sample Gaussian noise $u \sim \mathcal{N}(0, I)$.

4. Create the noisy input $y_{\mathrm{noisy}} = y + \sigma_a u$.

5. Pass $y_{\mathrm{noisy}}$ through `ARDAE` to get the output $R_\Theta(y_{\mathrm{noisy}})$.

6. Calculate the loss: `loss = criterion(u + sigma_a * output, torch.zeros_like(u))`.

7. Backpropagate and step the optimizer.

• Optimizer/Scheduler: Adam ($\beta_1 = 0.05$, $\beta_2 = 0.999$, $\mathrm{lr} = 4 \times 10^{-4}$) with a `WarmupLinearLR`
scheduler (1000 warmup steps).

• Hyperparameters: Batch size = 32, Epochs = 100, $\eta = 0.01$ (for Poisson inference).
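
The AR-DAE objective can be sanity-checked on a toy problem where the score is known in closed form. The sketch below is an illustration under stated assumptions, not the training code: it uses scalar data $y \sim \mathcal{N}(0, 1)$ instead of CT images, a fixed $\sigma_a = 0.2$ instead of a sampled one, and hypothetical names (`ar_dae_loss`, `true_score`, `zero_score`). For this toy case the perturbed data follow $\mathcal{N}(0, 1 + \sigma_a^2)$, so the score is $-z / (1 + \sigma_a^2)$.

```python
import numpy as np


def ar_dae_loss(score_fn, y, sigma_a, u):
    """Mean of (u + sigma_a * R(y + sigma_a * u))^2, i.e. the MSE-to-zero form."""
    y_noisy = y + sigma_a * u
    residual = u + sigma_a * score_fn(y_noisy)
    return float(np.mean(residual ** 2))


rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)  # toy "clean" data, y ~ N(0, 1)
u = rng.standard_normal(200_000)  # perturbation noise, u ~ N(0, 1)
sigma_a = 0.2                     # upper end of the sampling range in step 2


def true_score(z):
    # Score of N(0, 1 + sigma_a^2), the density of the perturbed toy data.
    return -z / (1.0 + sigma_a ** 2)


def zero_score(z):
    # Baseline that ignores its input.
    return np.zeros_like(z)


loss_true = ar_dae_loss(true_score, y, sigma_a, u)
loss_zero = ar_dae_loss(zero_score, y, sigma_a, u)
```

In this toy setting the true score gives a lower loss than the zero baseline (in expectation $1 / (1 + \sigma_a^2)$ versus 1), which is the behavior the objective relies on: the minimizer is the score of the perturbed data.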


2.3.3 Inference
The inference step applies Tweedie’s formula for Poisson noise ($\eta = 0.01$) using the trained
network:

# Inside the inference loop:
# y = x_F_noisy (Poisson noise added with eta = 0.01)
l_hat = ARDAE(y)  # Score estimate
# Apply Tweedie's formula for Poisson noise:
x_F_recon = (y + eta / 2.0) * torch.exp(eta * l_hat)

The output `x_F_recon` is then post-processed (offset removed, scaled back to HU).
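
The surrounding inference steps can be sketched in NumPy as below. The Tweedie update and the post-processing constants (0.256, 4000) follow the report; the noise synthesis `eta * Poisson(x / eta)` is an assumption about how the test input might be generated and is not confirmed by the report, and all function names are hypothetical.

```python
import numpy as np

ETA = 0.01      # Poisson noise level used in the report
OFFSET = 0.256  # normalization offset from Section 2.1
SCALE = 4000.0  # normalization scale from Section 2.1


def add_poisson_noise(x, eta=ETA, rng=None):
    """Assumed noise model: y = eta * z with z ~ Poisson(x / eta), so E[y] = x."""
    rng = np.random.default_rng() if rng is None else rng
    return eta * rng.poisson(x / eta)


def tweedie_poisson(y, l_hat, eta=ETA):
    """Estimate (y + eta/2) * exp(eta * l_hat), mirroring the snippet above."""
    return (y + eta / 2.0) * np.exp(eta * l_hat)


def to_hounsfield(x_norm):
    """Undo the Noise2Score normalization: remove the offset, rescale to HU."""
    return (x_norm - OFFSET) * SCALE
```

Under the assumed noise model the variance of `y` is `eta * x`, so the noise is signal-dependent; this is one reason the non-negativity offset in the normalization matters.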

3 Results
3.1 CycleGAN
3.1.1 Loss Curve Analysis
Figure 1 shows the adversarial losses. The generator losses ($G_{\mathrm{adv},F}$, $G_{\mathrm{adv},Q}$) and discriminator losses
($D_{\mathrm{adv},F}$, $D_{\mathrm{adv},Q}$) exhibit some instability in the initial epochs (approx. 0-20), which is common in
GAN training. Subsequently, they converge and stabilize around the theoretical equilibrium of 0.25 for
LSGAN [6]. The losses remain stable with minimal oscillations in the later epochs, indicating successful
convergence of the adversarial training process. The final average losses ($D_{\mathrm{adv}} \approx 0.248$, $G_{\mathrm{adv}} \approx 0.255$)
are very close to the equilibrium.
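
The 0.25 value follows from a short calculation with the losses of Section 2.2.3. As a sketch, assume that at equilibrium the discriminator can no longer separate real from generated images and outputs 1/2 for both (with target labels 1 and 0):

```latex
% Assumption: D_Y(y) = D_Y(G_{X \to Y}(x)) = 1/2 for all inputs at equilibrium.
\begin{align*}
\mathcal{L}_{D_Y} &= 0.5 \left[ \left(\tfrac{1}{2} - 1\right)^2 + \left(\tfrac{1}{2}\right)^2 \right]
                   = 0.5\,(0.25 + 0.25) = 0.25, \\
\mathcal{L}_{adv}(G_{X \to Y}) &= \left(\tfrac{1}{2} - 1\right)^2 = 0.25 .
\end{align*}
```

Both the discriminator loss (with its 0.5 factor) and the generator's adversarial loss therefore equal 0.25 in this idealized state.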
Figure 2 shows the cycle-consistency and identity losses. Both types of losses decrease rapidly
in the early epochs and continue to decrease steadily throughout training, converging to low values
($G_{\mathrm{cycle}} \approx 0.007$ to $0.011$, $G_{\mathrm{iden}} \approx 0.006$ to $0.009$). This demonstrates that the generators are effectively
learning the structural mapping constraints imposed by these losses.

Figure 1: CycleGAN Adversarial Losses. Showing $G_{\mathrm{adv},F}$, $G_{\mathrm{adv},Q}$, $D_{\mathrm{adv},F}$, $D_{\mathrm{adv},Q}$ vs. epochs.


Figure 2: CycleGAN Cycle and Identity Losses. Showing $G_{\mathrm{cycle},F}$, $G_{\mathrm{cycle},Q}$, $G_{\mathrm{iden},F}$, $G_{\mathrm{iden},Q}$ vs.
epochs.

3.1.2 Qualitative Example


Figure 3 shows a representative result. The input quarter-dose image (PSNR: 31.64 dB, SSIM: 0.859
relative to GT) is visibly noisy. The CycleGAN output achieves significant denoising and structural
recovery (PSNR: 35.06 dB, SSIM: 0.908 relative to GT), appearing much closer to the full-dose ground
truth.

Figure 3: CycleGAN Qualitative Results Example. Left: Ground Truth Full-dose CT. Middle: Input
Quarter-dose CT (PSNR 31.64 / SSIM 0.859). Right: CycleGAN Output (PSNR 35.06 / SSIM 0.908).

3.1.3 Quantitative Evaluation


Table 1 summarizes the average performance over the test set.

Table 1: CycleGAN Quantitative Evaluation (Average over Test Set)

| Comparison                  | Mean PSNR (dB) | Mean SSIM |
|-----------------------------|----------------|-----------|
| Input (Quarter-dose) vs. GT | 34.1695        | 0.8932    |
| Output (CycleGAN) vs. GT    | 37.0158        | 0.9271    |

CycleGAN improved the average PSNR by 37.0158 − 34.1695 = 2.8463 dB and the SSIM to
0.9271. According to the evaluation criteria [3]:

• PSNR increase ≥ 3.5 dB? No (2.85 dB achieved).

• Output SSIM ≥ 0.95? No (0.927 achieved).

While demonstrating clear improvement and stable training convergence, the CycleGAN implementa-
tion under these settings did not meet the stringent quantitative targets for full credit. The performance
gap might be attributed to the limited network capacity (ngf = 32, 6 blocks) chosen for faster training
or the inherent complexity of unpaired translation.
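
For reference, PSNR can be computed from the mean squared error as sketched below. This is a generic NumPy illustration, not the course's evaluation script; in particular the `data_range` used for the reported numbers is not stated in the report, so it is left as an explicit argument. The helper `mse_ratio_from_gain` is hypothetical and only converts a dB gain into the corresponding reduction in MSE.

```python
import numpy as np


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB for a given dynamic range."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def mse_ratio_from_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain (same data range)."""
    return 10.0 ** (gain_db / 10.0)
```

Read this way, the 2.85 dB gain corresponds to an MSE reduction of a little under a factor of two, while the 3.5 dB target would require a somewhat larger reduction.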

3.2 Noise2Score
3.2.1 Loss Curve Analysis
Figure 4 displays the Noise2Score training loss ($\mathcal{L}_{AR\text{-}DAE}$). The loss shows a characteristic sharp
decrease in the initial epochs, followed by stable convergence towards a low value (≈ 0.3). This behavior
indicates that the AR-DAE network effectively learned to estimate the score function required by the
training objective.

Figure 4: Noise2Score Training Loss (`loss_score`). Showing loss vs. epochs.

3.2.2 Qualitative Example


Figure 5 shows denoising of Poisson noise (η = 0.01). The noisy input has PSNR 30.19 dB and SSIM
0.864. The Noise2Score output significantly reduces the noise, achieving PSNR 33.14 dB and SSIM
0.918, closely resembling the ground truth.

3.2.3 Quantitative Evaluation


Figure 5: Noise2Score Qualitative Results Example. Left: Ground Truth (Clean Full-dose). Middle:
Input (Full-dose + Poisson Noise, η = 0.01, PSNR 30.19 / SSIM 0.864). Right: Noise2Score Output
(PSNR 33.14 / SSIM 0.918).

Table 2 summarizes the average performance over the test set for Poisson denoising.

Table 2: Noise2Score Quantitative Evaluation (Average over Test Set, Poisson Noise η = 0.01)

| Comparison                   | Mean PSNR (dB) | Mean SSIM |
|------------------------------|----------------|-----------|
| Input (Poisson Noisy) vs. GT | 27.6044        | 0.7774    |
| Output (Noise2Score) vs. GT  | 30.6683        | 0.8533    |

Noise2Score improved the average PSNR by 30.6683 − 27.6044 = 3.0639 dB and the SSIM to
0.8533. According to the evaluation criteria [3]:

• PSNR increase ≥ 3.0 dB? Yes (3.06 dB achieved).

• Output SSIM ≥ 0.85? Yes (0.853 achieved).

The Noise2Score implementation successfully met both quantitative criteria for full credit, demonstrat-
ing its effectiveness for self-supervised denoising when the noise model is known.
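
The pass/fail decisions for both methods reduce to simple arithmetic on the table values, which can be reproduced with a short check. The function name `meets_criteria` is hypothetical; the inputs are the reported means and the thresholds are those quoted from the evaluation guideline [3].

```python
def meets_criteria(psnr_in, psnr_out, ssim_out, min_gain_db, min_ssim):
    """Return (PSNR gain, gain criterion met, SSIM criterion met)."""
    gain = psnr_out - psnr_in
    return gain, gain >= min_gain_db, ssim_out >= min_ssim


# Reported test-set means and the thresholds from the evaluation guideline.
cyclegan = meets_criteria(34.1695, 37.0158, 0.9271, min_gain_db=3.5, min_ssim=0.95)
noise2score = meets_criteria(27.6044, 30.6683, 0.8533, min_gain_db=3.0, min_ssim=0.85)

for name, (gain, gain_ok, ssim_ok) in [("CycleGAN", cyclegan), ("Noise2Score", noise2score)]:
    print(f"{name}: gain = {gain:.4f} dB, PSNR criterion met: {gain_ok}, SSIM criterion met: {ssim_ok}")
```

Noise2Score clears both thresholds by small margins (0.06 dB and 0.003 SSIM), so the result is sensitive to evaluation details such as the noise realization.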

4 Discussion
This project implemented and evaluated CycleGAN and Noise2Score for CT denoising under unsuper-
vised and self-supervised settings, respectively.
The CycleGAN implementation successfully learned the domain translation task, as evidenced by
the stable convergence of adversarial and cycle-consistency losses and clear visual noise reduction.
However, the final quantitative performance (2.85 dB PSNR gain, 0.927 SSIM) fell short of the high tar-
gets set by the evaluation guideline (3.5 dB, 0.95 SSIM). This suggests that while CycleGAN provides a
viable framework for unsupervised denoising, achieving top-tier quantitative results might require larger
network capacity, longer training times, or potentially more advanced cycle-consistency formulations for
this specific medical imaging task.
The Noise2Score implementation effectively addressed the self-supervised Poisson denoising task.
The AR-DAE network successfully learned the score function from clean data, and applying Tweedie’s
formula for Poisson noise yielded significant improvements, meeting the quantitative targets (3.06 dB
PSNR gain, 0.853 SSIM). These results are consistent with the theory behind Noise2Score and support its
practical utility in cases where the noise distribution is known. More broadly, they illustrate how
score-based methods can be applied to self-supervised denoising.
In conclusion, both methods offer valuable pathways for CT denoising without paired data. Cycle-
GAN addresses the direct domain translation problem but may face challenges in reaching the highest
quantitative benchmarks without extensive tuning. Noise2Score excels at removing specific noise types
by learning from clean data but relies on knowing the noise characteristics for inference. Noise2Score
met the performance criteria for this project’s specific task.


References
[1] Jun-Yan Zhu et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Net-
works”. In: arXiv preprint arXiv:1703.10593 (2017). arXiv: 1703.10593 [cs.CV].
[2] Kwanyoung Kim and Jong Chul Ye. “Noise2Score: Tweedie’s Approach to Self-Supervised Image
Denoising without Clean Images”. In: arXiv preprint arXiv:2106.07009 (2021). arXiv: 2106.07009 [eess.IV].
[3] AI618 Teaching Staff. [AI618]project1 evaluation guideline.pdf. Course Material. 2025.
[4] American Association of Physicists in Medicine. Low Dose CT Grand Challenge.
https://www.aapm.org/GrandChallenge/LowDoseCT/. Accessed: May 20, 2025.
[5] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. “Perceptual losses for real-time style transfer and
super-resolution”. In: European conference on computer vision. Springer. 2016, pp. 694–711.
[6] Xudong Mao et al. “Least squares generative adversarial networks”. In: Proceedings of the IEEE
international conference on computer vision. 2017, pp. 2794–2802.
