1. Self-Augmented Noisy Image Generation:
The initial step in this methodology is the generation of self-augmented noisy images, a process
aimed at creating a secondary noisy image set from a single available noisy image. In traditional
Noise2Noise setups, two noisy images with identical content but different noise are required,
which can be challenging to obtain. To overcome this, we designed a self-augmented network
that learns to produce a slightly altered version of a noisy image. This network takes the original
noisy image as both its input and its training target and, through the learning process, generates an
output that mimics the original image’s noise distribution without replicating it exactly. This is
achieved by exploiting the imperfect convergence of deep learning training: the network balances
its fit across all training points rather than converging exactly to any one of them. Consequently,
it produces a new noisy image that closely
resembles the original but exhibits slight variations, enabling it to serve as an input for
subsequent Noise2Noise denoising. We employed three distinct deep learning models—ResNet,
U-Net, and Vision Transformer (ViT)—to test this self-augmentation approach. Each model
offers unique structural advantages suitable for image feature reconstruction: ResNet
incorporates skip connections to prevent information loss, U-Net uses an encoder-decoder structure that
compresses and reconstructs features while maintaining spatial context, and ViT leverages attention
mechanisms that capture relationships between image patches, enhancing feature retention.
These models were optimized using the mean squared error (MSE) loss function, paired with
variance normalization to mitigate overfitting risks. By adding variance-based normalization, the
loss function adapts to the noise distribution across the RGB channels, especially effective in
handling regions with uniform pixel values where the network might otherwise struggle to
predict noise accurately. This normalization step was crucial to producing realistic noisy images
that would serve effectively as training pairs in the Noise2Noise denoising process.
▪ Application in the Paper: The network generates a set of noisy images that resemble the
original noisy image but are different enough to be used for training the Noise2Noise
denoising network. This strategy is essential in scenarios where only one noisy image is
available for training.
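To make the loss concrete, the following PyTorch sketch shows one plausible form of a variance-normalized MSE. The exact formulation is not given above, so the per-image, per-channel variance statistic and the epsilon guard are illustrative assumptions rather than the paper's implementation.

import torch
import torch.nn as nn

class VarianceNormalizedMSE(nn.Module):
    # Sketch: squared error rescaled by the per-channel variance of the
    # target, so the loss adapts to the noise spread in each RGB channel.
    # The statistic and the eps guard are assumptions, not the paper's code.
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # avoids division by zero in perfectly uniform channels

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # pred, target: (N, 3, H, W) RGB tensors
        sq_err = (pred - target) ** 2                # element-wise squared error
        var = target.var(dim=(2, 3), keepdim=True)   # per-image, per-channel variance
        return (sq_err / (var + self.eps)).mean()    # variance-normalized average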
2. Deep Learning Architectures for Self-Augmentation:
For the self-augmentation process, we explored three prominent deep learning architectures:
ResNet, U-Net, and Vision Transformer (ViT), each selected for its unique capacity to handle
image feature extraction and reconstruction, which are essential for generating realistic noisy
images.
• ResNet employs a convolutional neural network (CNN) structure with residual
connections, which link earlier layers directly to later layers, helping to retain important
features from initial layers throughout the network. This skip-connected design reduces
data loss during training and is especially beneficial for tasks requiring precise feature
preservation. However, due to these residual connections, ResNet occasionally produces
slight pixel errors in the generated noisy images as it learns to reconstruct fine details.
▪ Application in the Paper: A 16-layer ResNet model is used for the self-
augmentation task. While effective, it tends to overfit and produce pixel errors in
noisy image generation, as its skip connections let deeper layers fit the training
image too closely.
• U-Net operates as an encoder-decoder network designed to compress and reconstruct
image features, making it well-suited for tasks requiring feature localization. The
architecture includes skip connections between corresponding encoder and decoder
layers, which help retain spatial details and enhance noise distribution uniformity in the
output. This structure is commonly used in medical imaging but also proves effective for
self-augmented noisy image generation, as it balances noise across image regions while
preserving feature continuity.
▪ Application in the Paper: An 8-layer U-Net model is used for generating augmented
noisy images. U-Net showed better performance than ResNet due to its structure,
which allows it to capture important features during downsampling and upsampling.
• Vision Transformer (ViT) introduces an attention-based mechanism that divides the input
image into patches and embeds each patch as a token, learning relationships between
patches using attention layers rather than standard convolutions (a minimal patch-processing
sketch follows this list). This patch-based
approach, which uses key-query-value matching, makes ViT well-suited for capturing
distributed noise patterns. ViT demonstrated a high degree of accuracy in simulating
noisy images closely aligned with the original noise characteristics, as it effectively
captures both local and global noise distributions across image patches.
▪ Application in the Paper: ViT is used to generate noisy images by learning the
relationships between image patches. It demonstrated the best performance in
simulating noisy images, as its self-attention mechanism helped it closely mimic the
original noise distribution.
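As a rough illustration of this patch-based processing, the sketch below embeds non-overlapping patches as tokens and relates them through key-query-value attention; the patch size, embedding width, and head count are illustrative values, not the configuration used in the paper.

import torch
import torch.nn as nn

class PatchAttention(nn.Module):
    # Minimal ViT-style block: a strided convolution cuts the image into
    # non-overlapping patch tokens, and multi-head self-attention performs
    # key-query-value matching between all patches.
    def __init__(self, patch: int = 8, dim: int = 64, heads: int = 4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) with H and W divisible by the patch size
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (N, patches, dim)
        out, _ = self.attn(tokens, tokens, tokens)         # global patch relations
        return out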
Each model was trained with mean squared error (MSE) loss as the primary optimization target,
alongside variance normalization to control the noise distribution across RGB channels,
particularly in uniform regions where standard MSE can lead to overfitting. The addition of
variance normalization ensures that each architecture adapts more effectively to the noise
characteristics, thereby generating noisy outputs that serve as reliable training pairs for the
Noise2Noise denoising network. Through this methodology, we were able to evaluate each
architecture's effectiveness in self-augmented noisy image generation, with ViT and U-Net
showing particular strengths in maintaining consistent noise distributions.
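Independent of the backbone, the self-augmentation stage itself can be sketched as follows; the optimizer, learning rate, and step count are illustrative assumptions, and the essential point is that training stops short of full convergence so the output carries a new noise realization.

import torch
import torch.nn as nn

def generate_augmented_noisy(model: nn.Module, noisy_img: torch.Tensor,
                             loss_fn=None, steps: int = 2000, lr: float = 1e-4):
    # Self-augmentation sketch: the single noisy image is both input and
    # target. The paper pairs MSE with variance normalization (see the
    # earlier loss sketch); plain MSE stands in as the default here.
    loss_fn = loss_fn or nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x = noisy_img.unsqueeze(0)           # (1, 3, H, W)
    for _ in range(steps):               # limited steps: imperfect fit by design
        opt.zero_grad()
        loss_fn(model(x), x).backward()
        opt.step()
    with torch.no_grad():
        return model(x).squeeze(0)       # the second noisy image for Noise2Noise

The returned image and the original then serve as the noisy-noisy training pair for the subsequent denoising stage.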
3. GANs (Generative Adversarial Networks):
In addition to ResNet, U-Net, and Vision Transformer (ViT), we also experimented with
Generative Adversarial Networks (GANs) for the self-augmentation process. GANs consist of
two competing networks: a generator, which creates new images, and a discriminator, which
evaluates the authenticity of these generated images by comparing them with the original noisy
input. This adversarial setup drives the generator to produce images that closely resemble the
input, as it aims to "fool" the discriminator into classifying generated images as real.
For this task, we used a U-Net structure as the generator due to its feature-preserving properties
and a convolutional neural network (CNN) with multiple layers as the discriminator. This
discriminator network was also supported by a feature extraction model from VGG-19 to
enhance the discriminator’s ability to differentiate subtle noise features. During training, the
generator received the original noisy image and produced an output image; the objective combined
an MSE (mean squared error) content loss against the input with a binary cross-entropy
adversarial loss from the discriminator. This dual-loss approach allows the GAN to simultaneously
minimize pixel-level differences and produce visually realistic noisy images.
GANs presented a
unique challenge, as the adversarial nature of training can often lead to overfitting, especially in
image regions with high feature variability. This sometimes resulted in minor artifacts in the
generated noisy images, as the generator focused on creating variations in an attempt to meet the
discriminator’s criteria. However, by carefully tuning the balance between MSE and adversarial
loss, we were able to mitigate some of these artifacts, allowing the GAN-generated noisy images to
approximate the input noise characteristics closely. Although GANs are highly effective in
capturing complex patterns, we found them less reliable than ViT and U-Net for generating
consistent noisy images due to their sensitivity to the adversarial training dynamics.
Nevertheless, GANs provided valuable insights into the role of adversarial learning in the self-
augmentation process, especially for cases where capturing detailed noise patterns is critical.
▪ Application in the Paper: The paper uses GANs for noisy image generation. U-Net
serves as the generator, while a CNN-based discriminator evaluates the generated
images. GANs performed poorly compared to the other models due to instability in
training and the generation of artifacts during noisy image simulation.
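A minimal sketch of this dual-loss generator objective appears below; the adversarial weight is a hypothetical balancing factor, since the exact weighting used is not stated above.

import torch
import torch.nn as nn

mse = nn.MSELoss()             # content loss: pixel-level similarity to the input
bce = nn.BCEWithLogitsLoss()   # adversarial loss on raw discriminator logits

def generator_loss(fake_img: torch.Tensor, real_img: torch.Tensor,
                   disc_logits_fake: torch.Tensor, adv_weight: float = 1e-3):
    # The generator is rewarded both for matching the input pixels (MSE) and
    # for making the discriminator label its output as real (BCE).
    # adv_weight is an illustrative knob for the MSE/adversarial balance.
    content = mse(fake_img, real_img)
    adversarial = bce(disc_logits_fake, torch.ones_like(disc_logits_fake))
    return content + adv_weight * adversarial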
RELATED WORK
Qu, Z., Zhang, Y., Sun, Y., and Lin, X. [1] proposed A New Generative Adversarial
Network for Texture Preserving Image Denoising. The Noise2Noise framework primarily builds
upon classical and deep learning-based image denoising techniques. Traditional methods often
utilized spatial and frequency domain filtering, sparse representations, and optimization
techniques to reduce noise. Recently, convolutional neural networks (CNNs), including DnCNN
and autoencoders, have achieved improved denoising by learning to map noisy images to cleaner
representations. Generative Adversarial Networks (GANs) have also shown promise in texture-
preserving denoising, with adversarial training guiding the generator to produce realistic, noise-
free images. This approach is beneficial in applications where maintaining image details, such as
textures, is critical.
Samik Banerjee and Sukhendu Das [2] proposed SD-GAN: Structural and Denoising GAN reveals facial parts
under occlusion. Classical methods like Non-Local Means and BM3D rely on filtering
techniques and statistical models to remove noise. More recent work with deep learning includes
models like DnCNN, which uses convolutional neural networks trained with clean-noisy image
pairs to achieve better denoising results. Another significant development is the use of
Generative Adversarial Networks (GANs) for image denoising and inpainting, which generates
images with high perceptual quality by introducing adversarial training. However, Noise2Noise
diverges from these methods by leveraging pairs of noisy images instead of clean ones, which
allows denoising even when a clean reference image is unavailable. Extensions of this approach,
like self-supervised methods, have further adapted Noise2Noise for single-image denoising
scenarios by simulating multiple noisy versions of an image.
Xin Cheng, Jingmei Zhou, Jiachun Song, and Xiangmo Zhao [3] proposed A Highway Traffic Image
Enhancement Algorithm Based on Improved GAN in Complex Weather Conditions. The paper reviews
various traditional and deep learning-based image enhancement methods, particularly for challenging
weather conditions. Traditional techniques, such as histogram equalization and Retinex, often
struggle with complex textures and may fail to restore details in overexposed images. In contrast,
deep learning approaches, including the Gated Context Aggregation Network (GCANet) and
specialized object detection networks, have been developed to improve clarity and adapt to
varying weather conditions. Generative Adversarial Networks (GANs) have also been employed
for tasks like transforming turbid underwater images into clearer versions, although they face
challenges with distribution differences between synthetic and real images. Despite these
advancements, many existing methods still fall short in restoring fine details and maintaining
color fidelity, underscoring the need for more effective solutions to enhance images captured
under adverse conditions.
Songkui Chen and Daming Shi [4] proposed Image Denoising With Generative Adversarial
Networks and Its Application to Cell Image Enhancement. The paper reviews existing image
denoising methods, noting that traditional MSE-based techniques often produce blurry results by
converging on average solutions, which compromises texture detail recovery. While some
approaches integrate feature extraction networks, they remain limited by their sensitivity to
specific features relevant to their training tasks. In contrast, the proposed framework utilizes
Wasserstein Generative Adversarial Networks (WGAN) to effectively tackle the blurriness issue
by focusing on the distribution of real clean images, thereby enhancing detail recovery,
especially in cell image denoising.
Haibo Zhang and Kouichi Sakurai [5] proposed Conditional Generative Adversarial Network-
Based Image Denoising for Defending Against Adversarial Attack. The paper surveys key approaches, including
Adversarial Logit Pairing (ALP), which aligns predictions for clean and adversarial samples to
improve accuracy. Other strategies involve gradient traps and high-level representation guided
denoisers, which enhance robustness by addressing discrepancies in adversarial robustness.
Additionally, free adversarial training optimizes training time while maintaining model accuracy.
The use of Generative Adversarial Networks (GANs) for defense without prior attack knowledge
is also highlighted. Lastly, purification processes introduced by Li et al. and Hwang et al. focus
on mitigating adversarial perturbations to protect against white-box attacks. These methods
collectively showcase the diverse strategies employed to enhance model security against
adversarial threats.
Shaobo Zhao, Sheng Lin, and Xi Cheng [6] proposed Dual-GAN Complementary Learning for Real-World
Image Denoising. Several notable methods have been proposed, including DnCNNs, FFDNet,
and CBDNet, which leverage deep learning techniques to address noise in images. Anwar and
Barnes introduced RIDNet, a single-stage blind real-world image denoising network, further
contributing to the field. Chang et al. proposed SADNet, which focuses on efficient single image
blind noise removal. However, many existing CNN-based methods rely on conventional training
schemes using pixel-level L1 or L2 losses, which can result in image blurring and limit their
effectiveness in real-world scenarios. The challenges of handling complex noise distributions,
particularly those that are spatially varying, have prompted the exploration of complementary
learning strategies that combine the strengths of both denoised image learning and noise
learning. Despite these advancements, many single GAN-based methods still face issues related
to network complexity and training difficulties, highlighting the need for more effective
approaches like the proposed DGCL strategy, which aims to overcome these limitations and
improve denoising performance.
Haijun Hu, Bo Gao, and Zhiyuan Shen [7] proposed Image Smear Removal via Improved
Conditional GAN and Semantic Network. Various techniques like inpainting and denoising have been developed to
enhance image quality by utilizing contextual information. The Generative Adversarial Network
(GAN) framework, introduced in 2014, has been pivotal in this field, where a Generator creates
images and a Discriminator assesses their authenticity. The paper presents a novel method for
removing mesh stains, employing an Improved Conditional GAN followed by a Semantic
Network for detail refinement. This two-phase approach significantly outperforms existing
methods, marking a pioneering effort in addressing mesh stain removal in image restoration.
Xin Jin, Ying Hu, and Chu-Yue Zhang [8] proposed Image restoration method based on GAN and
multi-scale feature fusion. Image restoration techniques often struggle due to the
requirement of pairing defective images with target images, which limits their effectiveness.
Traditional methods like automatic encoders face challenges in minimizing reconstruction loss,
leading to significant differences between generated and target images. The advent of Generative
Adversarial Networks (GANs) has introduced a new paradigm, where the generator creates
images that mimic real ones, while the discriminator distinguishes between real and generated
images. However, conventional GAN training can encounter issues such as error oscillations and
gradient problems, prompting the use of more stable training methods like Wasserstein GANs
(WGAN). Recent advancements have also utilized deep learning architectures, such as VGG-16,
to enhance feature extraction by fusing low-dimensional and high-dimensional features,
improving the input for deep convolutional networks. Despite these advancements, challenges in
visual consistency and clarity in restored images persist, particularly in complex backgrounds.
The proposed method in this paper aims to overcome these limitations by employing GANs and
multi-scale feature fusion to enhance the accuracy and realism of restored images.
The paper presents an image restoration method utilizing Generative Adversarial Networks
(GAN) and multi-scale feature fusion to address the limitations of existing algorithms that
require paired defective and target images. Traditional methods often struggle with accuracy and
visual consistency, while recent GAN-based approaches, such as multi-scale discriminators, can
complicate training and yield unnatural results. The proposed method employs a VGG-16-based
encoder-decoder structure to extract features from defective images, fusing low- and high-
dimensional features to enhance restoration quality. By incorporating Wasserstein GAN
(WGAN) principles and L1 loss, the method improves the similarity between restored and target
images. Experimental results demonstrate that this approach yields more realistic restorations,
achieving higher accuracy and visual consistency compared to traditional methods, although
some clarity gaps remain. Future work will focus on enhancing restoration capabilities in
complex backgrounds.
Luan Thanh Trinh and Tomoki Hamagami [9] proposed Latent Denoising Diffusion GAN:
Faster Sampling, Higher Image Quality. The paper introduces the Latent Denoising Diffusion
GAN (LDDGAN), a novel approach to image generation that addresses the slow inference speed
of diffusion models while enhancing image quality and diversity. By utilizing pre-trained
autoencoders, LDDGAN compresses images into a low-dimensional latent space, allowing for
faster sampling and improved computational efficiency. The model employs a Weighted
Learning strategy to balance adversarial and reconstruction losses, which enhances image quality
while maintaining diversity. Experimental results demonstrate that LDDGAN achieves state-of-
the-art inference speeds and competitive image quality compared to existing models like
DiffusionGAN and Wavelet Diffusion, making it suitable for real-time applications. The findings
suggest that eliminating the reliance on Gaussian distributions in the latent space can
significantly improve performance, paving the way for future research in high-fidelity diffusion
models.
Ibrahim H. El-Shal and Omar M. Fahmy [10] proposed License Plate Image Analysis Empowered
by Generative Adversarial Neural Networks (GANs). Automatic License Plate Recognition (ALPR) encompasses
various methodologies aimed at enhancing the detection and recognition of license plates under
challenging conditions. Traditional approaches include edge-based, color-oriented, and texture-
based methods, each with its strengths and limitations. Edge-based techniques leverage the
distinct rectangular shape of license plates but struggle with complex backgrounds. Color-
oriented methods utilize the unique color characteristics of plates but are sensitive to lighting
variations. Texture-based approaches focus on pixel intensity distributions, offering more
discriminative features but requiring higher computational resources. Recent advancements have
shifted towards Deep Learning (DL) techniques, particularly Deep Convolutional Neural
Networks (DCNNs) and Generative Adversarial Networks (GANs), which have shown promise
in learning high-level features for improved generalization across various computer vision tasks.
Notable contributions include the Super-Resolution Generative Adversarial Network (SRGAN),
which enhances image quality for better recognition accuracy, and various adaptations of GANs
that address specific challenges in license plate image enhancement and character recognition.
These developments highlight the ongoing evolution of ALPR systems, emphasizing the need for
robust solutions that can operate effectively in real-world scenarios.
CONCLUSION
This research presents a novel approach to enhancing the Noise2Noise (N2N) algorithm for
image denoising, specifically targeting single-image and blind noise scenarios by generating
noisy images from limited datasets. By leveraging the imperfections inherent in the deep learning
training process, the proposed self-augmented noisy image network enables the generation of
new noisy images that can be used for training and validation, effectively facilitating noise
removal. Experimental results demonstrate that this self-augmentation strategy achieves
performance comparable to other unsupervised denoising methods at lower noise levels,
although it struggles with higher noise levels and real-world images due to a lack of
understanding of image features. The findings underscore the importance of accurately
estimating noise characteristics in real-world applications to enhance denoising performance and
minimize pixel errors.
REFERENCES
1. Qu, Z., Zhang, Y., Sun, Y., & Lin, X. (2018). A New Generative Adversarial Network for Texture Preserving Image Denoising. IEEE, 5356-5361.
2. Banerjee, S., & Das, S. (2020). SD-GAN: Structural and Denoising GAN reveals facial parts under occlusion. arXiv, 1-24.
3. Cheng, X., Zhou, J., Song, J., & Zhao, X. (2023). A Highway Traffic Image Enhancement Algorithm Based on Improved GAN in Complex Weather Conditions. IEEE, 8716-8726.
4. Chen, S., & Shi, D. (2020). Image Denoising With Generative Adversarial Networks and Its Application to Cell Image Enhancement. IEEE, 82819-82831.
5. Zhang, H., & Sakurai, K. (2022). Conditional Generative Adversarial Network-Based Image Denoising for Defending Against Adversarial Attack. IEEE Access, 169031-169043.
6. Zhao, S., Lin, S., & Cheng, X. (2024). Dual-GAN Complementary Learning for Real-World Image Denoising. IEEE, 355-366.
7. Hu, H., Gao, B., & Shen, Z. (2020). Image Smear Removal via Improved Conditional GAN and Semantic Network. IEEE Access, 113104-113111.
8. Jin, X., Hu, Y., & Zhang, C.-Y. (2020). Image restoration method based on GAN and multi-scale feature fusion. IEEE Xplore, 2305-2310.
9. Trinh, L. T., & Hamagami, T. (2024). Latent Denoising Diffusion GAN: Faster Sampling, Higher Image Quality. IEEE Access, 78161-78172.
10. El-Shal, I. H., & Fahmy, O. M. (2022). License Plate Image Analysis Empowered by Generative Adversarial Neural Networks (GANs). IEEE Access, 30846-30857.
11. Li, H., Gu, C., Wu, D., & Cheng, G. (2022). Multiscale Generative Adversarial Network Based on Wavelet Feature Learning for SAR-to-Optical Image Translation. IEEE, 5236115-5236125.
12. Li, W., & Wang, J. (2021). Residual Learning of Cycle-GAN for Seismic Data Denoising. IEEE Access, 11585-11597.
13. Sun, Q.-F., Xu, J.-Y., & Zhang, H.-X. (2021). Random noise suppression and super-resolution reconstruction algorithm of seismic profile based on GAN. Springer, 2107-2119.
14. Limsuebchuea, A., & Duangsoithong, R. (2024). Self-Augmented Noisy Image for Noise2Noise Image Denoising. IEEE Access, 71076-71087.
15. Song, J., Jeong, J.-H., & Park, D.-S. (2021). Unsupervised Denoising for Satellite Imagery Using Wavelet Directional CycleGAN. IEEE Xplore, 6823-6839.