Efficient Image Colorization
B. Generator Architecture
The generator in this model is designed to convert 32 × 32 grayscale images into 32 × 32 RGB images. It consists of an encoder-decoder structure with skip connections similar to the Pix2Pix framework. The generator network begins with an input grayscale image and progressively downsamples the image to a low-dimensional latent space and then upsamples it back to the original resolution while adding the necessary color information.
Fig. 2: Transformation from Original to Grayscale to Generated Color Image
The architecture of the generator is as follows:
Encoder:
• Conv2D (64 filters, kernel size = 4, strides = 2) + LeakyReLU
• Conv2D (128 filters, kernel size = 4, strides = 2) + Batch Normalization + LeakyReLU
Bridge:
• Conv2D (256 filters, kernel size = 4, strides = 1) + Batch Normalization + LeakyReLU
Decoder:
• Conv2DTranspose (128 filters, kernel size = 4, strides = 2) + Batch Normalization + ReLU
• Concatenate with encoder output
Given a grayscale input image, the generator outputs a 32 × 32 × 3 color image. During training, the generator loss combines an adversarial loss with an L1 loss that drives the generator to produce images close to the target color images:
$$L_G = -\mathbb{E}_{x,y}[\log D(x, G(x))] + \lambda \cdot \mathbb{E}_{x,y}[\lVert y - G(x) \rVert_1]$$
where D(x, G(x)) is the discriminator's output for the generated image, and λ = 100 controls the weight of the L1 loss. In the discriminator loss, D(x, y) is the discriminator's output for a real image, and D(x, G(x)) is its output for a generated image.
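As a rough check of the encoder, bridge, and decoder layers listed above, the following sketch traces tensor shapes through them for a 32 × 32 grayscale input. It assumes "same" padding (so a stride-2 convolution halves the spatial size and a stride-2 transposed convolution doubles it) and assumes the concatenation uses the output of the first encoder layer, since that is the only encoder output with a matching spatial size. Neither detail is stated in the text, and the function names are illustrative.

```python
import math

def conv_out(size, stride):
    """Spatial size after a convolution with 'same' padding (assumed)."""
    return math.ceil(size / stride)

def deconv_out(size, stride):
    """Spatial size after a transposed convolution with 'same' padding (assumed)."""
    return size * stride

def trace_generator(height=32, width=32):
    """Return a list of (stage name, (H, W, C)) for the listed layers."""
    shapes = [("input", (height, width, 1))]
    # Encoder: two stride-2 convolutions.
    h, w = conv_out(height, 2), conv_out(width, 2)
    enc1 = (h, w, 64)
    shapes.append(("encoder conv 64", enc1))
    h, w = conv_out(h, 2), conv_out(w, 2)
    shapes.append(("encoder conv 128", (h, w, 128)))
    # Bridge: a stride-1 convolution keeps the spatial size.
    h, w = conv_out(h, 1), conv_out(w, 1)
    shapes.append(("bridge conv 256", (h, w, 256)))
    # Decoder: a stride-2 transposed convolution doubles the spatial size.
    h, w = deconv_out(h, 2), deconv_out(w, 2)
    dec1 = (h, w, 128)
    shapes.append(("decoder deconv 128", dec1))
    # Skip connection: channel-wise concatenation needs matching H and W.
    assert dec1[:2] == enc1[:2]
    shapes.append(("concat with encoder", (h, w, dec1[2] + enc1[2])))
    return shapes

for name, shape in trace_generator():
    print(f"{name:22s} {shape}")
```

Under these assumptions the bottleneck is 8 × 8 and the concatenated feature map is 16 × 16, so at least one further upsampling step would be needed to return to 32 × 32; the layer list in the text does not show it.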
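The discriminator is trained with the loss
$$L_D = -\mathbb{E}_{x,y}[\log D(x, y)] - \mathbb{E}_{x,z}[\log(1 - D(x, G(x)))]$$
Pixel values are rescaled as
$$x' = \frac{x}{127.5} - 1$$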
where x represents the original pixel values of the image.
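A minimal sketch of this rescaling, assuming 8-bit pixel values in [0, 255]. The inverse helper `denormalize` is not described in the text and is added only to illustrate how outputs in [−1, 1] could be mapped back to displayable values.

```python
def normalize(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1] via x' = x / 127.5 - 1."""
    return pixel / 127.5 - 1.0

def denormalize(value):
    """Hypothetical inverse: map [-1, 1] back to an integer in [0, 255]."""
    return min(255, max(0, round((value + 1.0) * 127.5)))

for p in (0, 64, 128, 255):
    print(p, normalize(p), denormalize(normalize(p)))
```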
Each grayscale image is paired with its corresponding color
image from the dataset. This grayscale image is passed to the
generator, and the discriminator receives both grayscale
(input) and color (target) images.
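The two training objectives can be sketched for a single grayscale/color pair as follows. This is an illustration under stated assumptions, not the authors' implementation: the discriminator outputs are taken to be probabilities in (0, 1), the generator is assumed to minimize −log D(x, G(x)) as in the usual Pix2Pix convention, and the L1 term is averaged over pixels (a common implementation choice) rather than summed. All function names are hypothetical.

```python
import math

EPS = 1e-7  # keeps log() finite when a probability reaches 0 or 1

def _safe_log(p):
    return math.log(min(max(p, EPS), 1.0 - EPS))

def l1_distance(target, generated):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)

def generator_loss(d_fake, target, generated, lam=100.0):
    """Adversarial term -log D(x, G(x)) plus lam times the L1 term."""
    return -_safe_log(d_fake) + lam * l1_distance(target, generated)

def discriminator_loss(d_real, d_fake):
    """-log D(x, y) - log(1 - D(x, G(x))) for one real/generated pair."""
    return -_safe_log(d_real) - _safe_log(1.0 - d_fake)

# d_real = D(x, y) and d_fake = D(x, G(x)) would come from the discriminator,
# which sees the grayscale input together with a color image.
print(generator_loss(0.5, [1.0, -1.0], [0.5, -0.5]))
print(discriminator_loss(0.9, 0.1))
```

With λ = 100, even a modest per-pixel error dominates the adversarial term, which is why the L1 term strongly anchors the output to the target colors.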
Table III gives detailed SSIM and PSNR results for each sample size, reported at epoch 1 and then every 10 epochs up to epoch 100. SSIM ranges between 0.36 and 0.90, and PSNR ranges between 14.43 dB and 24.35 dB. Both metrics generally improve as the number of epochs increases, and larger sample sizes tend to provide better overall performance.
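For reference, PSNR follows from the mean squared error between the generated and target images. The sketch below uses the standard definition and assumes 8-bit pixel values (peak value 255); the text does not state which value range was used for the reported numbers, so `max_value` is left as a parameter. SSIM is omitted because it requires windowed local statistics.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform error of 10 gray levels on every pixel:
print(round(psnr([100] * 4, [110] * 4), 2))
```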
The SSIM and PSNR values for the different sample sizes are visualized in Figures 5 and 6, respectively. These plots illustrate the trends in model performance as training progresses.
Fig. 6: PSNR across sample sizes and epochs.
• Figure 5 demonstrates how the SSIM values increase significantly after the first epoch and then stabilize across different sample sizes. For all sample sizes, SSIM reaches roughly 0.8 or higher after approximately 10 epochs (0.79 to 0.87 in Table III), indicating that the generated images become increasingly similar to the ground truth images over time.
• Figure 6 shows a similar trend for the PSNR values, where performance improves rapidly in the early stages of training and stabilizes as the number of epochs increases. Larger sample sizes achieve slightly higher PSNR values, with 5000 and 10000 samples reaching PSNR values above 24 dB by the 100th epoch. This suggests that the reconstruction quality of the generated images improves with more training data.
Both figures emphasize the effectiveness of increasing the dataset size and training duration. As the figures show, larger datasets and more epochs generally result in higher similarity (SSIM) and better reconstruction quality (PSNR).
V. FINDINGS AND DISCUSSION
Analysis of Conditional Generative Adversarial Networks applied to colorization, with the CIFAR-10 dataset as a test set, gives several important insights into how dataset size and number of epochs relate to model performance. The key takeaways from the SSIM and PSNR results across sample sizes and epochs are as follows.
A. Impact of Dataset Size on Model Performance
Both SSIM and PSNR tend to improve as the sample size increases from 1000 to 10,000, especially at later epochs. The results are also robust for smaller datasets, but larger datasets have consistently given high performance:
• For 1000 samples, the SSIM improves from 0.3618 at epoch 1 to 0.8773 at epoch 100, a relative improvement of 142.4%, while the PSNR improves from 14.43 dB to 22.66 dB, a 57.1% increase.
TABLE III: SSIM and PSNR results for different sample sizes across epochs.
Sample size: 1000, 2000, 3000, 4000, 5000, 10000 (one SSIM/PSNR column pair per sample size, in that order; PSNR in dB)
Epoch SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR
1 0.3618 14.43 0.4690 15.29 0.5413 16.05 0.6631 17.70 0.7380 19.27 0.7447 19.47
10 0.7915 20.33 0.8365 21.63 0.8128 20.51 0.8554 21.91 0.8226 20.70 0.8701 22.17
20 0.8135 21.24 0.8395 21.19 0.7937 19.22 0.8616 22.33 0.8625 22.01 0.8564 21.70
30 0.8182 21.14 0.8595 22.17 0.7976 20.17 0.8151 21.17 0.8693 22.31 0.8326 21.75
40 0.8539 22.24 0.8326 21.62 0.8408 21.19 0.8631 22.26 0.8669 22.27 0.8747 22.28
50 0.8253 20.92 0.8502 22.09 0.8629 22.21 0.8235 21.09 0.8683 22.50 0.8756 22.64
60 0.8334 21.45 0.8743 22.88 0.8691 22.49 0.8912 23.17 0.8768 22.64 0.8635 22.09
70 0.8639 22.40 0.8679 22.39 0.8675 22.29 0.8888 23.02 0.8889 23.12 0.8892 23.08
80 0.8535 22.35 0.8595 22.14 0.8832 22.95 0.8864 23.20 0.8899 23.52 0.8540 23.17
90 0.8766 22.61 0.8855 22.90 0.8862 23.14 0.8926 23.33 0.8912 23.88 0.8219 23.98
100 0.8773 22.66 0.8885 22.98 0.8977 23.63 0.8931 23.50 0.8975 24.31 0.7878 24.35
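The relative improvements discussed in Section V can be recomputed from Table III. The sketch below does this for the 1000-sample column and reproduces the quoted percentages to within rounding of the tabulated values; the helper name is illustrative. Note that PSNR is on a logarithmic (dB) scale, so these percentages describe changes in the dB value itself.

```python
def relative_improvement(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

# 1000-sample column of Table III: (SSIM, PSNR in dB) at epochs 1, 10 and 100.
table_1000 = {1: (0.3618, 14.43), 10: (0.7915, 20.33), 100: (0.8773, 22.66)}

for first, last in [(1, 10), (10, 100), (1, 100)]:
    ssim_gain = relative_improvement(table_1000[first][0], table_1000[last][0])
    psnr_gain = relative_improvement(table_1000[first][1], table_1000[last][1])
    print(f"epoch {first:>3} -> {last:>3}: SSIM {ssim_gain:+.1f}%, PSNR {psnr_gain:+.1f}%")
```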
TABLE IV: PSNR and SSIM results for different image colorization methods.
Citation / Source Dataset Size (Images) Image Resolution Total Pixels (approx.) Epochs PSNR (dB) SSIM
Zhang et al., 2016 [19] 1,000 256x256 65,536,000 100 24.8 0.86
Iizuka et al., 2016 [9] 3,000 224x224 150,528,000 50 24.5 0.85
Isola et al., 2017 [3] 400 256x256 26,214,400 200 22.9 0.81
Nazeri et al., 2018 [8] 2,000 128x128 32,768,000 50 23.1 0.79
Vitoria et al., 2020 [15] 1,500 256x256 98,304,000 100 24.3 0.83
Su et al., 2019 [16] 600 256x256 39,321,600 150 23.8 0.82
Sartaj et al., 2021 [17] 1,200 128x128 19,660,800 80 23.5 0.84
Bhattacharjee et al., 2022 [5] 800 128x128 13,107,200 120 24.1 0.81
Proposed Methodology 5,000 32x32 5,120,000 100 24.31 0.8975
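The "Total Pixels" column of Table IV appears to be the dataset size multiplied by the number of pixels per image. This reading is inferred from the numbers rather than stated in the text; the short check below confirms it for every row and shows how much smaller the proposed setup is in raw pixel count.

```python
# (source, number of images, image side length, total pixels listed in Table IV)
rows = [
    ("Zhang et al. [19]", 1000, 256, 65_536_000),
    ("Iizuka et al. [9]", 3000, 224, 150_528_000),
    ("Isola et al. [3]", 400, 256, 26_214_400),
    ("Nazeri et al. [8]", 2000, 128, 32_768_000),
    ("Vitoria et al. [15]", 1500, 256, 98_304_000),
    ("Su et al. [16]", 600, 256, 39_321_600),
    ("Sartaj et al. [17]", 1200, 128, 19_660_800),
    ("Bhattacharjee et al. [5]", 800, 128, 13_107_200),
    ("Proposed", 5000, 32, 5_120_000),
]

def total_pixels(num_images, side):
    """Images times pixels per (square) image."""
    return num_images * side * side

proposed = total_pixels(5000, 32)
for source, n, side, listed in rows:
    computed = total_pixels(n, side)
    status = "matches" if computed == listed else "differs"
    print(f"{source:26s} {computed:>12,d} ({status}), {computed / proposed:.1f}x proposed")
```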
• With 2000 samples, the SSIM increases from 0.4690 to 0.8885 (89.4%) over 100 epochs, and the PSNR from 15.29 dB to 22.98 dB, a 50.3% increase.
• For 3000 samples, the SSIM rises from 0.5413 to 0.8977 (65.8%), and the PSNR improves from 16.05 dB to 23.63 dB, a 47.2% gain.
• When the dataset size is 4000, the SSIM improves from 0.6631 to 0.8931 (34.7%), while the PSNR increases from 17.70 dB to 23.50 dB, a 32.8% increase.
• At 5000 samples, the SSIM starts at 0.7380 and reaches 0.8923 (20.9%), while the PSNR increases from 19.27 dB to 24.26 dB, a 25.9% rise.
• For the largest dataset size (10,000 samples), SSIM improves from 0.7447 to 0.7878 (5.8%), while the PSNR rises from 19.47 dB to 24.31 dB, a 24.9% increase.
B. Impact of Epochs on Model Performance
Across all dataset sizes, increasing the number of training epochs generally improves both SSIM and PSNR. The model shows rapid improvement within the first 10 epochs, with diminishing returns at later epochs:
• For 1000 samples, SSIM increases by 118.8% from epoch 1 to 10, while PSNR improves by 40.9%. However, the change from epoch 10 to epoch 100 is more modest, with SSIM improving by 10.8% and PSNR by 11.5%.
• Similar trends are observed for other dataset sizes, with rapid initial improvements followed by slower gains. For instance, for 5000 samples, SSIM improves by 11.8% between epochs 10 and 100, while PSNR improves by 17.1%.
C. Comparison Between Small and Large Datasets
The final SSIM and PSNR values vary with dataset size at the same number of epochs. For instance, at epoch 100, the SSIM for 10,000 samples is 0.7878 versus 0.8773 for 1000 samples, while the PSNR for 10,000 samples is 24.31 dB versus 22.66 dB for 1000 samples. However, even when the dataset is small, the model remains reasonably competitive, which shows that CGANs can perform reasonably well on small datasets.
D. Overall Findings
The experiments demonstrate that, in general, increasing the dataset size improves performance, yet important gains can also be achieved by training longer on smaller datasets. For example, the results for 5000 and 10,000 samples at epoch 100 are barely distinguishable, with SSIM differing by only 0.6 percent and PSNR by 0.2 dB. These results show that CGAN-based colorization can be quite efficient even with relatively small datasets, and that such an approach appears viable when data are scarce.
E. Comparison with Related Work
Comparison with previous studies reveals some major differences. Zhang et al. [19] use a dataset of 1,000 images at 256 x 256 resolution and achieve 24.8 dB PSNR and 0.86 SSIM. The proposed method obtains a slightly lower PSNR of 24.31 dB but a higher SSIM of 0.8975, although it is applied to much smaller 32x32 images with a sample size of 5,000. This shows
how the approach can produce perceptually better images, even at lower resolutions.
Along the same lines, Iizuka et al. [9] achieve a PSNR of 24.5 dB and an SSIM of 0.85 with a dataset of 3,000 images at a higher resolution of 224x224. Although the proposed approach uses fewer pixels and a comparable number of epochs, it exceeds this SSIM. This implies that the CGAN-based approach can attain very competitive results even from lower-resolution images.
In comparison with Vitoria et al. [15], who trained their model on 1,500 images at 256x256 resolution for 100 epochs, the proposed method has higher PSNR (24.31 dB vs. 24.3 dB) and SSIM (0.8975 vs. 0.83), using a larger dataset but significantly fewer pixels. This further supports the scalability of the proposed method when training on lower-resolution data.
Moreover, Nazeri et al. [8] and Isola et al. [3], using 128 × 128 and 256 × 256 images, achieved comparable results with a comparable number of epochs. However, the efficiency of the proposed model with 32 × 32 images demonstrates that CGANs can create high-quality outputs even at lower image resolution, which offers an opportunity to extend this work to a wider range of image sizes. As an example, Sartaj et al. [17] use 128x128 images and 80 epochs, yet the proposed model at epoch 100 still outperforms theirs, as shown in Table IV.
VI. POTENTIAL IMPROVEMENTS
A. Data Preprocessing
• Normalization Range: Currently, the preprocessing step normalizes images to the range [−1, 1]. A potential improvement would be to try different normalization ranges, such as [0, 1], depending on the activation functions used in the generator and discriminator models.
• Data Augmentation: Introduce data augmentation techniques such as random flips, rotations, and crops. This would increase the diversity of the training data and might improve the generalization of the model.
• Larger Dataset: The number of samples for both training and testing is restricted (3,000 for training and 100 for testing). A larger dataset could improve performance, given that models like GANs typically benefit from more data.
B. Model Architecture
• Skip Connections in Generator: The skip connections currently only link certain layers. Using more detailed skip connections, such as in the U-Net architecture, might improve image restoration, as they preserve high-resolution details from earlier layers.
• Deeper Generator Network: Increasing the depth of the generator by adding more convolutional and transposed convolutional layers might help capture more complex features from the input images.
• Residual Blocks: Introducing residual blocks into the generator could help improve image reconstruction. Residual connections allow better gradient flow and may accelerate training convergence.
C. Loss Functions and Optimization
• Perceptual Loss: Instead of relying solely on L1 loss, consider adding a perceptual loss (VGG-based loss), which compares high-level features of generated and real images. This could improve the visual quality of generated images.
• Gradient Penalty: A gradient penalty could be introduced in the discriminator, especially if the GAN suffers from instability during training. This could be implemented using Wasserstein GAN with Gradient Penalty (WGAN-GP) for more stable training.
D. Training Process
• Dynamic Learning Rate: The learning rate could be adjusted dynamically using learning rate schedulers. A learning rate that decreases over time could help the models converge more efficiently.
• Discriminator Training Frequency: Try updating the discriminator more or less frequently relative to the generator, which might improve training stability.
E. Metrics and Evaluation
• FID Score: In addition to SSIM and PSNR, consider using the Fréchet Inception Distance (FID) score, which compares distributions of real and generated images using a pre-trained Inception network and provides a more comprehensive measure of image quality.
• Visual Quality of Results: Visual inspection of the generated results at intermediate epochs can provide more intuition about how the quality of generated images evolves over time. Include more frequent visual checkpoints.
VII. CONCLUSION
This paper introduced a lightweight Conditional GAN model, evaluated on CIFAR-10, for the task of colorizing grayscale images. Because its compact architecture does not depend on large-scale or high-resolution inputs, the model is potentially useful for fields such as medical imaging, where data are usually scarce and high-quality labeled datasets are expensive.
Throughout the experiments, the model performed promisingly, achieving competitive PSNR and SSIM scores despite working at low image resolution with a small dataset compared with almost all prior work. Because this research focuses on small datasets and low resolutions, it is relevant to realistic applications in which data availability is a critical constraint, and the PSNR and SSIM obtained indicate good quality of the colorized images.
Future improvements may include data augmentation, perceptual loss, and deeper network architectures, which could make the model's outputs even more photorealistic and of higher quality.
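As one illustration of the data augmentation mentioned above, the sketch below applies a random horizontal flip to a grayscale/color training pair. The key point is that both images must receive the same transformation so that they stay pixel-aligned. Images are represented as plain lists of rows, and the function names and `flip_prob` parameter are hypothetical, not taken from the paper.

```python
import random

def horizontal_flip(image):
    """Mirror an image stored as a list of rows (each row a list of pixels)."""
    return [row[::-1] for row in image]

def augment_pair(gray, color, rng, flip_prob=0.5):
    """Apply the same random horizontal flip to a grayscale/color pair."""
    if rng.random() < flip_prob:
        return horizontal_flip(gray), horizontal_flip(color)
    return gray, color

rng = random.Random(0)  # seeded for repeatability
gray = [[1, 2], [3, 4]]
color = [[(1, 1, 1), (2, 2, 2)], [(3, 3, 3), (4, 4, 4)]]
print(augment_pair(gray, color, rng))
```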
REFERENCES
[1] I. Goodfellow et al., “Generative Adversarial Nets,” NIPS, 2014.
[2] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” arXiv preprint arXiv:1411.1784, 2014.
[3] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1125–1134.
[4] J. Suarez et al., “Self-Supervised Learning for Image Colorization,” IEEE Access, 2022.
[5] S. Bhattacharjee, V. Singh, and D. S. Kushwaha, “Efficient image colorization using conditional GANs and perceptual loss,” in Proc. IEEE Conf. Image Process. (ICIP), 2022.
[6] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 649–666.
[7] Y. Cao, Z. Zhu, and Z. Zhang, “Image Colorization Using Generative Adversarial Networks,” International Journal of Advanced Computer Science and Applications, 2020.
[8] K. Nazeri, E. Ng, T. Joseph, F. Qureshi, and M. Ebrahimi, “Image colorization using generative adversarial networks,” in Proc. Int. Conf. Artif. Intell. Appl. (AAIA), 2018.
[9] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Trans. Graph., vol. 35, no. 4, pp. 110:1–110:11, 2016.
[10] P. Vitoria and L. Zhang, “ChromaGAN: Colorization with a GAN using chrominance and luminance color spaces,” Pattern Recognition, 2022.
[11] S. Kumar, R. Srivastava, and S. Yadav, “Image Colorization Using Generative Adversarial Networks and Transfer Learning,” International Journal of Innovative Technology and Exploring Engineering (IJITEE), 2021.
[12] P. Serrano, S. Bharadwaj, and R. Diaz, “Portrait Image Colorization Using Conditional GANs,” WACV, 2017.
[13] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, 2004.
[14] D. Huynh-Thu and M. Ghanbari, “Scope of Validity of PSNR in Image/Video Quality Assessment,” Electronics Letters, 2008.
[15] P. Vitoria, L. Sousa, and P. Quelhas, “ChromaGAN: Adversarial picture colorization with semantic class distribution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2020, pp. 0–0.
[16] Z. Su, J. Wang, and C. Hu, “Lightweight image colorization with generative adversarial networks,” IEEE Access, vol. 7, pp. 170804–170816, 2019.
[17] S. N. Ali, P. Kumar, and S. Jain, “cGAN-based image colorization using semantic segmentation,” in Proc. Int. Conf. Pattern Recognit. Mach. Intell. (PRMI), 2021, pp. 334–342.
[18] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” ICLR, 2016.
[19] R. Zhang, P. Isola, and A. Efros, “Colorful Image Colorization,” ECCV, 2016.