Efficient Image Colorization

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 8

Efficient Image Colorization with Conditional

GANs: A Lightweight Approach for High-Quality


Results Using Limited Data
Bhavya Chopra, Abhiram Chodapaneedi, Ishi Prashar, Dr. Sakshi Indolia
STME, SVKM’s Narsee Monjee Institute of Management Studies (NMIMS) Deemed-to-be-University,
Navi Mumbai, Maharashtra, India

Abstract—This paper proposes a deep learning-based


approach to the colorization of images from grayscale to particular ensure controlled generation of images conditioned
colored ones on images of size 32 32 in low resolution without on input data, such as a low resolution grey-scale images.
transfer learning using × Conditional Generative Adversarial In this work of CGAN-based colourisation, the generator is
Networks. For smaller sizes of images, new architectures have to provide the image to be colored from the given grayscale
been proposed for the generator and discriminator motivated by image, and the discriminator learns to distinguish real and
the Pix2Pix framework so that the small image sizes are handled
efficiently. The paper experimentally studies such models’ fake colored images in the latent space, pushing the generator
performance on 1,000–10,000 subsets of CIFAR-10 in terms of to produce more realistic output images [2], [3].
dataset size affecting a model’s performance. It can be noticed This paper proposes a lightweight CGAN-based approach
here that the trade-off does exist between the quality of the for the colourization of low-resolution grayscale images. Ex-
dataset and image since the model performs approximately in the
periments conducted as part of this work focus on 32 × 32
smaller dataset sizes under SSIM and PSNR. Such findings are
relevant particularly for application areas like medical imaging, pixel images from the CIFAR-10 dataset, exploring whether
where access to clear, high-resolution images or large datasets is CGANs are suitable for training in the presence of sparse data
often limited, hence the demonstration that CGANs can deliver without pre-trained models. This is particularly important in
high- quality results with limited data and without any form of medical imaging domains in which high resolution images
pre- trained models.
Index Terms—Conditional GANs, image-to-image translation,
or large datasets are not usually available. The proposed
small datasets, SSIM, PSNR, grayscale-to-color translation, method indeed shows that CGANs is capable of generating
Pix2Pix. colorization equivalent in quality while utilizing far fewer data
samples and lower-resolution images, which is very practical
I. INTRODUCTION for resource-limited scenarios [4], [5].
Colorization is the prediction of what colors should be A. Contributions
applied to a grayscale image and converting them into three-
dimensional RGB images. Traditionally, colorization involved Some significant contributions in the paper include • Em-
the long and arduous process of doing it manually. But with ploying CGANs to devise colorizations of lowresolution
the advancements in machine learning and computer vision, grayscale images with a dimension of (3232) as their results
techniques for automatic colorization have developed as an can be transferred for small scale datasets which differ be-
efficient and accurate alternative. Applications range from tween 1, 000 and 10, 000 samples • Developed architecture
restoring images, entertainment, to even medical imaging. solely for generator as well as discriminator constructed with
There are basically two categories based on which col- a greater general adaptation from Pix2Pix framework enabling
orization approaches are made: user-guided and automatic. much lesser sizes in contrast with standard small sized images.
Basically, the user-guided method uses manual input when • The paper makes an elaborate comparison with state-of-the-
the users provide color hints or reference images to make art colorization techniques and benchmarks the results using
the algorithm propagate color across that grayscale image. PSNR and SSIM scores across different methods in Table IV.
Although resource-intensive and requiring much labor for an • The model, when trained with a limited amount of data, still
image with intensive details, these methods become effective. achieves competitive results in terms of Structural Similarity
The automatic method never requires any manual input. The Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR). It
dominance in approach using large datasets of colored images established how far model performance depends on the size
to learn the mapping from grayscale to color is dominated by of the dataset in the paper. The model performed comparable
Convolutional Neural Networks as deep learning rose. up to just 5, 000 images and improves with dataset size. Thus,
Generative Adversarial Networks are the invention of the methods based on CGANs can indeed be applied to low-
Good- fellow et al. that have changed the face of image resolution images, making them applicable to domains such
generation tasks such as colorizing images [1]. Conditional as medical images where resolution and possibly also dataset
GANs in size are sometimes limited.
Remainder of the paper: Section II gives an overview of ing, and it is thus applicable in scenarios with limited both
previous research in image colorization using GAN-based data and computational resources.
methods and other related works. It elucidates the More recent work has attempted to push training efficiency
experimental methodology to be used with the architectures of with the help of self-supervised learning. Suarez et al. [4] and
CGANs used along with the methodology for analyzing the Vitoria et al. [10] discussed the utilization of self-supervised
results. Results analysis and discussion is presented on the for colorization in which GANs are effective even when data
findings obtained in this paper in Section V while potential availability is small. They degrade a bit in the colorization
improvements and future research directions are proposed in quality if training is done on very minimalistic data since
Section VI and a conclusion drawn with that of the entire these still need some level of supervision.
paper and its findings in Section VII. Other studies, for instance, Kumar et al. [11], used CGANs
for specialized tasks such as medical image colorization;
II. RELATED WORK hence its capability across different domains. For the portrait
colorization, Serrano et al. [12] demonstrated the applicability
Image colorization is one area of research that has made
of CGANs in coloring a low-resolution grayscale portrait.
remarkable progress with advancement through deep learning
Although these works affirm the suitability of CGANs in
methods, particularly generative models. Zhang et al. [19]
a variety of applications, they are typically more interested
first presented a CNN-based architecture that was utilized
in high-resolution images and large datasets that cannot be
to automatically colorize the grayscale images. It forecasted
feasible in resource-limited settings.
color distributions for each pixel within a grayscale image and
gained good success on large-scale datasets. However, this Apart from GAN-based methods, the performance assess-
method, although effective, was computationally expensive ment of colorized images has usually been very dependent
and had to depend on large training data, making it less prac- on SSIM [13] and PSNR [14] as well. To be more specific,
tical for a resource-scarce scenario or where the availability these two metrics have served as default benchmark metrics
of the data is a concern. for determining the quality of colorized images in a perceptual
sense. Therefore, this paper leverages the two as an
Generative Adversarial Networks (GANs) [1] offered much
established ground for comparison of model performance.
stronger framework than the former one for image coloriza-
tion, allowing for much more realistic synthesis of images by
A. Limitations of Previous Approaches
exploiting the adversarial process between the generator and
discriminator. Mirza and Osindero have introduced While GANs and CGANs have revolutionized the process
conditional GANs, an extension to these ideas: instead of of image colorization significantly, a lot of the weaknesses
using any additional information for both generators and remain. In the first place, most of the approaches work based
discriminators to get conditioned. Isola, et al. have made on a large set of datasets that are sufficient for adequate
significant contri- butions on this and it is extended in Pix2Pix performance. Most models, including Pix2Pix in general, fail
as widely used architecture by the model for performing all to make strong generalizations when few train data are avail-
image-to-image translation tasks which include colorization. able. This makes them more non-applicable in domains where
Although these methods based on the CGAN-approach like these high-resolution images may not be available, especially
Pix2Pix are quite successful, they are often based on a large when dealing with medical image scanning. Besides, transfer
amount of training data to ensure generalization, and thus learning methods, as proposed in [9], introduce additional
are not possible in environments with data constraint. overhead and reliance on external pre-trained models, which
Cao et al have further enhanced this work by using GANs is not always possible. Lastly, self-supervised approaches, as
as a technique for automatic image colorization, focusing described in [4], typically sacrifice quality when the data is
more on the bright colors and realistic colors. Nazeri et al highly sparse.
improved this by proposing perceptual losses that improve the
perceptual quality of the colored image. These methods are B. Paper’s Approach
producing really high-quality results, but they depend on To overcome such limitations, we propose in this work a
really large- scale datasets for best performance. CGAN-based model specific to colorization of low-resolution
Transfer learning has also been explored to be applied for grayscale images at a dimensionality of (32 ×32) size in
improvement of performance. Iizuka et al. Introduced a model smaller datasets. Transfer learning can now be eliminated
using both local and global context features to colorize but due to a limitation of constrained image resolutions and
with the advantage of using pretrained networks to bring out constrained size of datasets. Experimental evaluations with
high-quality results. Such transfer learning-based approaches this model demonstrate its competitive ability in obtaining
do bring overhead in terms of computation and rely on pre- performance metrics on PSNR and SSIM even within very
trained models that may not always be available or resource-constrained environments. This would allow the pro-
appropriate, especially in fields such as medical imaging. This posed approach to apply well in specialized fields such as
work is quite different from the above approaches because it medical imaging, where high resolution is extremely difficult
emphasizes a lightweight CGAN architecture independent of to obtain and data is often scarce.
transfer learn-
III. METHODOLOGY • Conv2DTranspose (64 filters, kernel size = 4, strides =
A. Dataset and Preprocessing 2) + Batch Normalization + ReLU
In this work, the authors used CIFAR-10 composed of • Concatenate with input image
32 ×32 RGB images to test how such a model performs • Conv2D (3 filters, kernel size = 4, strides = 1, activation
at data sizes. For the analysis, the paper experimented on = tanh)
six subsets of CIFAR-10: 1000, 2000, 3000, 4000, 5000, 10000
samples, respectively.
The dataset is preprocessed such that the pixel values of
the image are normalized between [ − 1, 1] range and RGB
images are converted to grayscale input to the generator. The
normalizing function used for preprocessing normalize the
images in the following ways:
x
x′ = 1

127.5
where x represents the original pixel values of the image.
Each grayscale image is paired with its corresponding color
image from the dataset. This grayscale image is passed to the
generator, and the discriminator receives both grayscale
(input) and color (target) images.

Fig. 1: Overall GAN Pipeline

B. Generator Architecture
The generator in this model is designed to convert 32 × 32
grayscale images into 32 ×32 RGB images. It consists of
an encoder-decoder structure with skip connections similar to
the Pix2Pix framework. The generator network begins with
an input grayscale image and progressively downsamples the
image to a low-dimensional latent space and then upsamples
it back to the original resolution while adding the necessary
color information. Fig. 2: Transformation from Original to Grayscale to Gener-
The architecture of the generator is as follows: ated Color Image
Encoder:
• Conv2D (64 filters, kernel size = 4, strides = 2) + Given a grayscale input image, the generator will output
LeakyReLU a color 32
× 32 3 image. During training, the loss function
• Conv2D (128 filters, kernel size = 4, strides = 2) + Batch consists of adversarial loss combined with an L1 loss in
Normalization + LeakyReLU order to force the generator to produce images close to the
target color images:
Bridge:
The generator will output a color image × 32 32 3,
• Conv2D (256 filters, kernel size = 4, strides = 1) + Batch
generated from a given grayscale input image. The used
Normalization + LeakyReLU loss function for training the generator is an adversarial
Decoder: loss combined with an L1 loss in order to drive the
• Conv2DTranspose (128 filters, kernel size = 4, strides = generator to output images close to target color images:
2) + Batch Normalization + ReLU
• Concatenate with encoder output LG = Ex,y[log(D(x, G(x)))] + λ · Ex,y[∥y − G(x)∥1]
where D(x, G(x)) is the discriminator’s output for the where D(x, y) is the discriminator’s output for a real image,
generated image, and λ = 100 controls the weight of the L1 and D(x, G(x)) is its output for a generated image.
loss.

Fig. 3: Generator Architecture

TABLE I: Generator Architecture


Fig. 4: Discriminator Architecture
Layer Output Size Kernel Activation
Size/Stride
Input 32x32x1 - -
Conv2D 16x16x64 4x4 / 2 LeakyReLU(0.2) TABLE II: Discriminator Architecture
Conv2D + Batch- 8x8x128 4x4 / 2 LeakyReLU(0.2)
Norm Layer Output Size Kernel Activation
Conv2D + Batch- 8x8x256 4x4 / 1 LeakyReLU(0.2) Size/Stride
Norm (Bridge) Input (Image + 32x32x4 - -
Conv2DTranspose 16x16x128 4x4 / 2 ReLU Target)
+ BatchNorm Conv2D 16x16x64 4x4 / 2 LeakyReLU(0.2)
Concatenate 16x16x192 - - Conv2D + Batch- 8x8x128 4x4 / 2 LeakyReLU(0.2)
Conv2D + Batch- 16x16x128 3x3 / 1 ReLU Norm
Norm Conv2D + Batch- 4x4x256 4x4 / 2 LeakyReLU(0.2)
Conv2DTranspose 32x32x64 4x4 / 2 ReLU Norm
+ BatchNorm Conv2D (Output 4x4x1 4x4 / 1 -
Concatenate 32x32x65 - - Layer)
Conv2D + Batch- 32x32x64 3x3 / 1 ReLU
Norm
Conv2D (Output 32x32x3 4x4 / 1 Tanh D. Training Procedure
Layer)
The model is trained using Adam optimizers with a learning
rate of 2 × 10−4 and β1 = 0.5. The batch size is set to
C. Discriminator Architecture 32, and the models are trained for 100 epochs for each of
The discriminator is designed to distinguish between real the following dataset sizes: 1000, 2000, 3000, 4000, 5000, and
color images and those generated by the generator. It takes 10000 samples.
both the grayscale input image and the corresponding color Reason for Choosing 100 Epochs: Based on preliminary
image (real or generated) as input and predicts whether the experiments and prior studies in the field, it was observed
color image is real or fake. The architecture consists of several that training the model for 100 epochs provided an optimal
convolutional layers, which reduce the resolution of the input balance between training time and performance. This allowed
progressively while increasing the number of feature maps. the model to converge effectively while minimizing the risk of
The architecture of the discriminator is as follows: overfitting. Similar findings have been observed in the works
of Goodfellow et al. [1], Radford et al. [18], and Zhang et al.
• Input: 32x32 grayscale image concatenated with a 32x32 [19], where the authors used a comparable number of epochs
RGB image (either real or generated) in GAN-based architectures to achieve stable results without
• Conv2D (64 filters, kernel size = 4, strides = 2) + excessive training.
LeakyReLU
• Conv2D (128 filters, kernel size = 4, strides = 2) + Batch E. Dataset and Preprocessing
Normalization + LeakyReLU
This study uses the CIFAR-10 dataset, which consists of
• Conv2D (256 filters, kernel size = 4, strides = 2) + Batch
32 × 32 RGB images. To evaluate the performance of the
Normalization + LeakyReLU
model under different data sizes, the paper experiments with
• Conv2D (1 filter, kernel size = 4, strides = 1, no activa-
six subsets of CIFAR-10: 1000, 2000, 3000, 4000, 5000, and
tion)
10000samples.
The discriminator’s output is a patch-based prediction, The dataset is preprocessed by normalizing pixel values
which evaluates the realism of different patches of the image. to the range of [−1, 1] and converting the RGB images to
The discriminator loss is defined as: grayscale as input to the generator. The preprocessing
function normalizes the images as follows:

LD = x
−E x′ = 1
x,y [log D(x, y)] − x,z [log(1− D(x, −
E G(x)))] 127.5
where x represents the original pixel values of the image.
Each grayscale image is paired with its corresponding color
image from the dataset. This grayscale image is passed to the
generator, and the discriminator receives both grayscale
(input) and color (target) images.

IV. RESULTS AND ANALYSIS

Within this section, the model performed results on the dif-


ferent parts of the CIFAR-10 dataset based on two measures;
these are SSIM as well as PSNR. Those are the reported
measures using different sample sizes such that include 1000,
2000, 3000, 4000, 5000, as well as 10000 sample sizes, with
related epochs from 1 up to 100. The plot includes graphical
visualizations for both averaged SSIM as well as PSNR for Fig. 5: SSIM across sample sizes and epochs.
better visualization.

A. Performance across Different Sample Sizes and Epochs

Table III gives detailed SSIM and PSNR results for each
sample size. SSIM and PSNR values are reported for epochs
from 1 to 100 for each sample size. SSIM ranges between
0.36 and 0.89, and PSNR ranges between 14.43 and 24.35. It
is also found that with an increase in the number of epochs,
the values of both metrics improve and larger sample sizes
provide better overall performance.

B. Visual Representation of Results

The SSIM and PSNR values for the different sample sizes
are visualized in Figures 5 and 6, respectively. These plots il- Fig. 6: PSNR across sample sizes and epochs.
lustrate the trends in model performance as training
progresses.
V. FINDINGS AND DISCUSSION
• Figure 5 demonstrates how the SSIM values increase
Analysis of Conditional Generative Adversarial Networks
significantly after the first epoch and then stabilize across
applied to the CIFAR-10 dataset as a test set for colorization
different sample sizes. For all sample sizes, SSIM con-
gives several important insights into the relationship between
verges to values above 0.8 after approximately 10
dataset size and number of epochs and model performance.
epochs, indicating that the generated images become
The key takeaways extracted from the results of SSIM and
increasingly similar to the ground truth images over
PSNR across sample sizes and epochs are:.
time.
• Figure 6 shows a similar trend for the PSNR values, A. Impact of Dataset Size on Model Performance
where performance improves rapidly in the early stages
of training and stabilizes as the number of epochs There’s a positive trend in SSIM as well as PSNR results
increases. Larger sample sizes achieve slightly higher by increasing the sample size from 1000 up to 10,000 and
PSNR values, with 5000 and 10000 samples reaching especially at larger epochs. Results are also proved to be
PSNR values above 23 by the 100th epoch. This suggests robust at smaller-size datasets, but larger sizes of data have
that the reconstruction quality of the generated images constantly given high performance:
improves with more training data. • For 1000 samples, the SSIM improves from 0.3618 at
epoch 1 to 0.8773 at epoch 100, a relative improvement
Both figures emphasize the effectiveness of increasing the of 142.4%, while the PSNR improves from 14.43 dB to
dataset size and training duration. As the figures show, larger 22.66 dB, a 57.1% increase.
datasets and more epochs result in higher similarity (SSIM)
and better reconstruction quality (PSNR).
TABLE III: SSIM and PSNR results for different sample sizes across epochs.
Epoch 1000 2000 3000 4000 5000 10000
SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR
1 0.3618 14.43 0.4690 15.29 0.5413 16.05 0.6631 17.70 0.7380 19.27 0.7447 19.47
10 0.7915 20.33 0.8365 21.63 0.8128 20.51 0.8554 21.91 0.8226 20.70 0.8701 22.17
20 0.8135 21.24 0.8395 21.19 0.7937 19.22 0.8616 22.33 0.8625 22.01 0.8564 21.70
30 0.8182 21.14 0.8595 22.17 0.7976 20.17 0.8151 21.17 0.8693 22.31 0.8326 21.75
40 0.8539 22.24 0.8326 21.62 0.8408 21.19 0.8631 22.26 0.8669 22.27 0.8747 22.28
50 0.8253 20.92 0.8502 22.09 0.8629 22.21 0.8235 21.09 0.8683 22.50 0.8756 22.64
60 0.8334 21.45 0.8743 22.88 0.8691 22.49 0.8912 23.17 0.8768 22.64 0.8635 22.09
70 0.8639 22.40 0.8679 22.39 0.8675 22.29 0.8888 23.02 0.8889 23.12 0.8892 23.08
80 0.8535 22.35 0.8595 22.14 0.8832 22.95 0.8864 23.20 0.8899 23.52 0.8540 23.17
90 0.8766 22.61 0.8855 22.90 0.8862 23.14 0.8926 23.33 0.8912 23.88 0.8219 23.98
100 0.8773 22.66 0.8885 22.98 0.8977 23.63 0.8931 23.50 0.8975 24.31 0.7878 24.35

TABLE IV: PSNR and SSIM results for different image colorization methods.
Citation / Source Dataset Size (Images) Image Resolution Total Pixels (approx.) Epochs PSNR (dB) SSIM
Zhang et al., 2016 [19] 1,000 256x256 65,536,000 100 24.8 0.86
Iizuka et al., 2016 [9] 3,000 224x224 150,528,000 50 24.5 0.85
Isola et al., 2017 [3] 400 256x256 26,214,400 200 22.9 0.81
Nazeri et al., 2018 [8] 2,000 128x128 32,768,000 50 23.1 0.79
Vitoria et al., 2020 [15] 1,500 256x256 98,304,000 100 24.3 0.83
Su et al., 2019 [16] 600 256x256 39,321,600 150 23.8 0.82
Sartaj et al., 2021 [17] 1,200 128x128 19,660,800 80 23.5 0.84
Bhattacharjee et al., 2022 [5] 800 128x128 13,107,200 120 24.1 0.81
Proposed Methodology 5,000 32x32 5,120,000 100 24.31 0.8975

• With 2000 samples, the SSIM increases from 0.4690 to C. Comparison Between Small and Large Datasets
0.8885 (89.4%) over 100 epochs, and the PSNR from
15.29 dB to 22.98 dB, a 50.3% increase. The final SSIM and PSNR value varies consistently with
• For 3000 samples, the SSIM rises from 0.5413 to 0.8977 larger datasets at the same number of epochs. For instance,
(65.8%), and the PSNR improves from 16.05 dB to 23.63 at epoch 100, the SSIM for 10,000 samples is 0.7878 versus
dB, a 47.2% gain. 0.8773 for 1000 samples. Similarly, PSNR for 10,000 samples
• When the dataset size is 4000, the SSIM improves from is 24.31 dB versus 22.66 dB for 1000 samples. However, it is
0.6631 to 0.8931 (34.7%), while the PSNR increases interesting that, even when the dataset is small, the model has
from 17.70 dB to 23.50 dB, a 32.8% increase. reasonably competitive performance, which shows that, even
• At 5000 samples, the SSIM starts at 0.7380 and reaches on a small dataset, CGANs could perform reasonably well.
0.8923 (20.9%), while the PSNR increases from 19.27
D. Overall Findings
dB to 24.26 dB, a 25.9% rise.
• For the largest dataset size (10,000 samples), SSIM The conducted experiments demonstrate that in general,
improves from 0.7447 to 0.7878 (5.8%), while the PSNR a boost in size for a given dataset improves performance,
rises from 19.47 dB to 24.31 dB, a 24.9% increase. yet important gains can be also achieved by longer training
on smaller datasets. As the example shows the comparison
between the number of samples 5000 and 10,000 at epoch
B. Impact of Epochs on Model Performance 100 is barely differentiable with SSIM enhanced by only 0.6
percent, and PSNR increased by 0.2 dB. These results show
Across all dataset sizes, increasing the number of training
that even with relatively small datasets, the colorization based
epochs results in a steady improvement in both SSIM and
on CGAN can be quite efficient, and such approach appears
PSNR. The model shows rapid improvement within the first
viable when data are scarce.
10 epochs, with diminishing returns at later epochs:
• For 1000 samples, SSIM increases by 118.8% from E. Comparison with Related Work
epoch 1 to 10, while PSNR improves by 40.9%. When compared with the results of previous research stud-
However, the change from epoch 10 to epoch 100 is ies, some major differences can be noticed. Zhang et al. uses
more modest, with SSIM improving by 10.8% and a dataset size of 1,000 images at 256 x 256 resolution and has
PSNR by 11.5%. achieved 24.8 dB PSNR and 0.86 SSIM. Whereas, the PSNR
• Similar trends are observed for other dataset sizes, with value becomes slightly low as 24.31 dB, but with a higher
rapid initial improvements followed by slower gains. For SSIM of 0.8975, which is higher compared with Zhang et
instance, for 5000 samples, SSIM improves by 11.8% al.’s technique, although implemented on much smaller 32x32
between epochs 10 and 100, while PSNR improves by images and sample size is considered to be 5,000. It depicts
17.1%.
how the approach can produce the perceptually better images, skip connections, such as in U-Net architecture, might
even at the lower resolutions. im- prove image restoration as they preserve high-
Along the same lines, Iizuka et al. also achieve PSNR of resolution details from earlier layers.
24.5 dB and SSIM of 0.85 from a dataset size of 3,000 images • Deeper Generator Network: Increasing the depth of the
at a higher resolution of 224x224. Because this approach uses generator by adding more convolutional and transpose
fewer pixels but a comparable number of epochs, it is convolutional layers might help capture more complex
excellent beyond this concerning SSIM. This implies that the features from the input images.
CGAN- based approach can attain very competitive results • Residual Blocks: Introducing residual blocks into the
relatively easily even from lower-resolution images. generator could help improve image reconstruction.
Apart from that, in comparison with Vitoria Residual connections allow better gradient flow and may
accelerate training convergence.
textitet al.
C. Loss Functions and Optimization
citevitoria2020chromagan, which trained the model on • Perceptual Loss: Instead of relying solely on L1 loss,
1,500 images at 256x256 resolution for 100 epochs, the consider adding perceptual loss (VGG-based loss), which
proposed method has PSNR and SSIM higher respectively compares high-level features of generated and real im-
(24.31 dB vs. 24.3 dB) and 0.8975 vs. 0.83 using a much ages. This could improve the visual quality of generated
larger dataset but significantly fewer pixels. That further images.
proves the scalability of the proposed method when training • Gradient Penalty: In the discriminator, you could intro-
on lower-resolution data. duce a gradient penalty, especially if the GAN suffers
Moreover, Nazeri et al. Nazeri2018Image and Isola et al. from instability during training. This could be imple-
Isola2017Image, using 128 × 128 and 256 × 256 images, mented using Wasserstein GAN with Gradient Penalty
achieved comparable results at a comparable amount of epoch (WGAN-GP) for more stable training.
numbers. However, the efficiency of that model with 32 × 32
images demonstrates that CGANs are also capable of creating D. Training Process
high-quality outputs even if the image resolution is lower, so • Dynamic Learning Rate: The learning rate could be
it gives an opportunity to further develop that work in more adjusted dynamically using learning rate schedulers. A
extended ranges of the sizes of images. As an example, Sartaj decreasing learning rate over time could help the models
converge more efficiently.
textitet al. • Discriminator Training Frequency: Try updating the
discriminator more or less frequently relative to the
citesartaj2021cgan uses images of size 128x128 and 80 generator, which might improve training stability.
epochs but it is still superior to their model’s performance at
epoch 100 as in TableIV. E. Metrics and Evaluation
• FID Score: In addition to SSIM and PSNR, consider
VI. POTENTIAL IMPROVEMENTS using the Fre´chet Inception Distance (FID) score,
A. Data Preprocessing which compares distributions of real and generated
images using a pre-trained Inception network, and
• Normalization Range: Currently, the preprocessing step provides a more comprehensive measure of image
normalizes images to the range of [−1, 1]. A potential im- quality.
provement could be trying different normalization • Visual Quality of Results: Visual inspection of the
ranges, such as [0, 1], depending on the activation generated results at intermediate epochs can provide
functions used in the generator and discriminator more intuition about the quality of generated images over
models. time. Include more frequent visual checkpoints.
• Data Augmentation: Introduce data augmentation tech-
niques such as random flips, rotations, and crops. This VII. CONCLUSION
would increase the diversity of the training data and
This paper introduced the lightweight Conditional GAN
might improve the generalization of the model.
model based on CIFAR-10 for the task of colorizing grayscale
• Larger Dataset: The number of samples for both
images. Based on the compact architecture with the capacity
training and testing is restricted (3,000 for training and
to work regardless of whether large-scale or high-resolution
100 for testing). A larger dataset could improve
inputs are presented, this model is extremely useful for fields
performance, given that models like GANs typically
such as medical imaging, where one usually deals with scarce
benefit from more data.
amounts of data and high-quality labeled datasets are expen-
B. Model Architecture sive.
From start to end during the experimentation process,
• Skip Connections in Generator: The skip connections performance was promising for the developed model at such
currently only link certain layers. Using more detailed low PSNR and SSIM scores although working at low image
resolution with a tiny dataset as compared to almost all
prior work. Here, considering this research focuses on images
using a less extensive size dataset and resolutions, where data
availability indeed is critical or can be the constraint during
their realistic application, with the resultant superiority in
images’ quality as compared with the colourized images - the
PSNR and SSIM being remarkably impressive.
Future improvements may include data augmentation, per-
ceptual loss, and deeper network architectures, which could
make the model even more photorealistic and high-quality.
REFERENCES
[1] I. Goodfellow et al., ”Generative Adversarial Nets,” NIPS, 2014.
[2] M. Mirza and S. Osindero, ”Conditional Generative Adversarial Nets,”
arXiv preprint arXiv:1411.1784, 2014.
[3] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, ”Image-to-image translation
with conditional adversarial networks,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), 2017, pp. 1125–1134.
[4] J. Suarez et al., ”Self-Supervised Learning for Image Colorization,”
IEEE Access, 2022.
[5] S. Bhattacharjee, V. Singh, and D. S. Kushwaha, ”Efficient image
colorization using conditional GANs and perceptual loss,” in Proc.
IEEE Conf. Image Process. (ICIP), 2022.
[6] R. Zhang, P. Isola, and A. A. Efros, ”Colorful image colorization,” in
Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 649–666.
[7] Y. Cao, Z. Zhu, and Z. Zhang, ”Image Colorization Using Generative
Adversarial Networks,” International Journal of Advanced Computer
Science and Applications, 2020.
[8] K. Nazeri, E. Ng, T. Joseph, F. Qureshi, and M. Ebrahimi, ”Image
colorization using generative adversarial networks,” in Proc. Int. Conf.
Artif. Intell. Appl. (AAIA), 2018.
[9] S. Iizuka, E. Simo-Serra, and H. Ishikawa, ”Let there be color!: Joint
end-to-end learning of global and local image priors for automatic
image colorization with simultaneous classification,” ACM Trans.
Graph., vol. 35, no. 4, pp. 110:1–110:11, 2016.
[10] P. Vitoria, L. Zhang, ”ChromaGAN: Colorization with a GAN using
chrominance and luminance color spaces,” Pattern Recognition, 2022.
[11] S. Kumar, R. Srivastava, and S. Yadav, ”Image Colorization Using
Generative Adversarial Networks and Transfer Learning,” International
Journal of Innovative Technology and Exploring Engineering (IJITEE),
2021.
[12] P. Serrano, S. Bharadwaj, and R. Diaz, ”Portrait Image Colorization
Using Conditional GANs,” WACV, 2017.
[13] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, ”Image Quality
Assessment: From Error Visibility to Structural Similarity,” IEEE
Trans- actions on Image Processing, 2004.
[14] D. Huynh-Thu and M. Ghanbari, ”Scope of Validity of PSNR in
Image/Video Quality Assessment,” Electronics Letters, 2008.
[15] P. Vitoria, L. Sousa, and P. Quelhas, ”ChromaGAN: Adversarial picture
colorization with semantic class distribution,” in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2020, pp. 0–0.
[16] Z. Su, J. Wang, and C. Hu, ”Lightweight image colorization with gen-
erative adversarial networks,” IEEE Access, vol. 7, pp. 170804–170816,
2019.
[17] S. N. Ali, P. Kumar, and S. Jain, ”cGAN-based image colorization using
semantic segmentation,” in Proc. Int. Conf. Pattern Recognit. Mach.
Intell. (PRMI), 2021, pp. 334–342.
[18] A. Radford, L. Metz, and S. Chintala, ”Unsupervised Representation
Learning with Deep Convolutional Generative Adversarial Networks,”
ICLR, 2016.
[19] R. Zhang, P. Isola, and A. Efros, ”Colorful Image Colorization,” ECCV,
2016.

You might also like