Image Colorization Using Color-Features and Adversarial Learning
ABSTRACT In this study, we introduce an innovative image colorization method that not only improves
color accuracy and realism but also addresses common issues found in existing methods, such as desaturation
and color bleeding. Our proposed method features a novel component called the Color Encoder, which
extracts intrinsic color features. Moreover, the proposed Color Encoder systematically aligns essential color features, drawn from a random normal distribution, with real colors. These aligned features are fused
at the bottleneck and serve as the foundation for subsequent colorization. Complementing the Color Encoder
is our Color Loss mechanism, which aims to align the extracted features from the Color Encoder with the
ground-truth color features, enhancing the overall accuracy of the color representation. We also employ a Conditional Wasserstein Generative Adversarial Network (CWGAN) architecture to improve adversarial training and colorization accuracy. To enhance feature
representation, we incorporate an attention mechanism at the bottleneck of each encoder layer, further
refining our model’s ability to capture essential image details. Experimental results show that our approach
significantly outperforms other state-of-the-art methods in terms of both realism and precision, striking a
well-balanced performance.
to color involves converting a single-channel image into a three-channel image using the RGB color model. The mapping process is likely to be non-deterministic, indicating that a grayscale image can be colorized in numerous ways. Human perception is vital for this process: the perception of a color labeled as "realistic" can vary among individuals [9]. Another significant obstacle to image colorization is color bleeding, which can be a severe problem because the assigned colors extend beyond object borders, resulting in unnatural effects. Color bleeding mainly occurs when colors from one object bleed into nearby areas, with a loss of object differentiation. In addition, the phenomenon of "hallucinated" details in image colorization also remains a crucial problem. This phenomenon involves neural networks generating colors or intricate details that were not present in the original image. While such additions can enhance visual appeal, they may deviate from historical accuracy [10].

These issues have made it necessary to perform intensive research and develop novel methods for image colorization. Hence, we introduce a novel image colorization model that tackles the aforementioned issues and improves the overall quality and accuracy using the proposed Color Encoder and GAN architecture. The Color Encoder plays a crucial role in addressing the issue of undersaturation by successfully incorporating color features, resulting in more realistic and vivid colorization results. Furthermore, the integration of the Convolutional Block Attention Module (CBAM) aims to mitigate color bleeding by improving the network's ability to focus on crucial image components, leading to more accurate and localized color deployment. Finally, the incorporation of the Conditional Wasserstein Generative Adversarial Network (CWGAN) increases the overall visual appeal by guiding the colorization process to align with natural color distributions and by enhancing finer image details. These elements collectively empower our method to offer a comprehensive solution to the challenges encountered in image colorization, providing both accuracy and visual appeal. The key contributions of this paper are given below.

• Introduction of a novel Color Encoder, capable of learning and generating essential color features to tackle undersaturation issues and produce colorizations that closely resemble real-world colors.
• Integration of a novel Color Loss to measure the disparity between the color features generated by the Color Encoder and the actual color features obtained from the ground truth (GT). The Color Loss plays a crucial role in training the Color Encoder.
• Incorporation of the CBAM within the architecture of the CWGAN. CBAM enhances the visual appeal of the colorization results by focusing attention on important image features, contributing to more accurate and visually pleasing colorizations.

II. RELATED WORKS

The process of colorizing images extends back to the early twentieth century, when black-and-white photographs were hand-colored with water colors, oils, crayons, or other dyes [11]. Although this manual technique was characterized by its subjectivity and reliance on the artist's expertise and understanding, it nonetheless marked the early desire to infuse life into monochrome imagery. With the introduction of digital image processing, semi-automated image colorization techniques became available. Levin et al. [12] proposed a semiautomatic image colorization technique in which an interactive, optimization-based strategy with minimal user input was introduced. The user is required to provide a few color scribbles, and the framework subsequently distributes these colors across the entire image by considering the similarities between pixels. Luan et al. [13] introduced an optimization-based method for colorization by extending [12] and enhanced the propagation of colors by reducing the required user scribbles. Qu et al. [14] further improved the technique in [13] by integrating spatial and range regularization to manage color propagation effectively.

The emergence of machine learning has had a profound influence on image colorization. Irony et al. introduced automated techniques for colorization in which colors are transferred from a reference image to a grayscale target image using feature similarities [3]. Welsh et al. also used color transfer but matched the source and target images using texture descriptors [4]. In particular, deep-learning algorithms allow image colorization to be more automated, accurate, and consistent. Cheng et al. proposed an early deep-learning model that cast colorization as a regression task using a deep CNN to predict the chrominance value of each pixel [15]. Zhang et al. made a notable contribution with an end-to-end model that employed a classification objective function rather than a regression loss, leading to the production of more vivid and realistic colorizations [6]. Taking advantage of the nature of human color perception, this work used the CIELAB color space to predict the 'a' and 'b' channels from the input 'L' channel. The instance-aware image colorization method proposed by Su et al. focuses on achieving precise colorization using object instances [16].
GANs have also been investigated for image-colorization tasks. The Pix2Pix framework [17] employs a conditional GAN (cGAN) to learn the mapping between input and output images, and it has also been used effectively for colorization [17]. Moreover, there has been a surge in the popularity of open-source projects such as "DeOldify" that focus on adding color to historical images [18]. A further noteworthy contribution in the field is the research conducted by Vitoria et al., titled "ChromaGAN." This study uses adversarial learning to achieve colorization, with a specific emphasis on the distribution of semantic classes [19]. Furthermore, CBAM [20] has been integrated into the ChromaGAN architecture to improve colorization [21]. Wang et al. [22] proposed DualGAN, a dual-path generative adversarial network for image colorization, which consists of two generator streams, one for low-frequency and one for high-frequency colorization. The two streams are trained jointly with a discriminator to generate photorealistic colorizations. Subsequently, Kim et al. presented "BigColor," a technique that utilizes a generative color prior to enhance the colorization of natural images [23]. Moreover, Treneska et al. proposed a GAN-based image colorization method for self-supervised visual feature learning [38]. This method is based on the cGAN architecture, with a new loss function, a multi-scale discriminator, and a channel and spatial attention mechanism. Furthermore, Liu and Tu [39] proposed a PatchGAN-based image colorization model incorporating a CBAM and showed that CBAM can help the model focus on important regions of the image, leading to improved performance on benchmark datasets.

Recently, the popularity of transformers [24] in vision-based applications has increased. Transformers are a class of neural networks designed to effectively process and manage long-range relationships. The Colorization Transformer by Kumar et al. emerged as an early example of transformers employed in image colorization [25]. The grayscale image is coarsely colored using a conditional autoregressive colorization transformer, and two fully parallel networks then upsample the coarse colors, resulting in a finely colored high-resolution image.

In addition to the Colorization Transformer, subsequent studies have employed transformers for image colorization. CT2 [26] is another image colorization technique that employs a transformer-based approach. In this method, colors are encoded as tokens, and the interaction between grayscale image patches and color tokens is guided by color attention and query modules. Furthermore, DDColor [27] proposes a dual-decoder architecture for image colorization: the first decoder generates a coarse colorization, while the second decoder refines the colorization and adds semantic details.

Despite considerable advancements in image colorization, the existing techniques have several limitations. Manual and semiautomated techniques are associated with significant time consumption, require a certain level of artistic expertise, and may yield inconsistent outcomes [28]. Although powerful, deep-learning models are computationally expensive, require large amounts of data, and often produce desaturated outputs owing to the conservative nature of their loss functions [8]. Furthermore, they struggle with complicated images containing multiple objects because of the difficulty in learning high-level semantics [29]. The issue of color ambiguity remains unresolved in most existing models. Models also tend to "hallucinate" details, which can be problematic for historical and archival image colorization, where color authenticity is crucial [30].

To address these problems, we propose a novel image colorization method based on a CWGAN, which uses attention and a Color Encoder in the generator. In our proposed method, the novel colorization element, the Color Encoder, significantly amplifies the colorization process by incorporating a comprehensive array of color features. Furthermore, a CWGAN architecture is utilized during training to enhance the colorization performance of the proposed architecture. The CWGAN extends the conventional GAN framework by integrating the Wasserstein distance as the training criterion in the cGAN [17], which facilitates stable and interpretable gradients during training. The conditional approach enables the generator to learn colorization patterns specific to the input grayscale images, thereby ensuring that the colorization process is contextually relevant. Unlike conventional techniques, the proposed Color Encoder enhances the colorization process by providing the network with deeper contextual color information, resulting in significantly improved outcomes. Moreover, our novel Color Loss mechanism plays a crucial role in training the Color Encoder to generate color features that closely resemble the original color features. The focus on the Color Encoder and the specific Color Loss significantly improves the effectiveness and quality of image colorization, setting our model apart from conventional approaches.

III. PROPOSED METHODOLOGY

The proposed image colorization model leverages the power of deep learning, specifically a GAN, to achieve accurate and visually appealing colorization results. The proposed architecture consists of a generator network that learns to map the luminance (L) channel of the input grayscale image to the chrominance (a and b) channels in the CIELAB color space [31], and a discriminator network that learns to distinguish real GT images from the fake images generated by the generator. The CIELAB color space was chosen in this study because it closely approximates human vision, making it ideal for perceptually meaningful color transformations [31]. With the L* component for luminance and the a* and b* components for color along the green-red and blue-yellow axes, respectively, CIELAB successfully isolates color information from intensity information. Moreover, the CIELAB color space provides perceptually uniform representations of colors, ensuring that the Euclidean distance between colors aligns more precisely with human perception of color distinctions.

Let x_L be the input grayscale image with a luminance channel (L) and x_ab be the GT image with chrominance channels (ab). x_L is the input to the generator, and the output chrominance image y_ab is generated using (1):

y_{ab} = G(x_{L})   (1)

where G represents the generator function trained to produce an image containing the chrominance channels y_ab. These chrominance channels produce a colorized image, referred to as y_Lab, when combined with the original input luminance channel x_L. In other words, y_Lab is the final colorized image produced by the generator G, which combines the luminance information x_L with the chrominance channels y_ab. The output colorized image is characterized by its coherency and visual appeal. Subsequently, it is converted from the CIELAB color space to the RGB color space for the final output. Figure 1 shows the overall architecture of the proposed network. As shown in Figure 1, the luminance image is extracted from the input image and fed to the generator. The generator contains an encoder, a decoder, and a Color Encoder. The generator produces a y_ab image containing the chrominance channels. The generated image is fed to the discriminator along with the input image as a condition, and the discriminator generates scores for the real and fake images.

Colorization is improved using a novel Color Encoder that creates color characteristics by sampling from a normal distribution, effectively modeling the color distribution. The Color Encoder consists of several convolutional layers that extract intrinsic color features. The convolutional blocks in the Color Encoder correspond to 64, 128, 256, and 512 channels, sequentially, which helps capture and represent color-related information effectively. The color features of the Color Encoder are integrated with the output of the encoder at the network bottleneck. The integration of the Color Encoder and encoder features ensures that the colorization process combines both local context and global color information, allowing the model to generate colors that are coherent and realistic. The Color Encoder, a key component of our model, not only generates rich color features but also undergoes a crucial training process to ensure their accuracy. This training is facilitated by a specialized metric known as the Color Loss. The Color Loss serves as a guiding mechanism, quantifying the disparity between the features produced by the Color Encoder and those derived from the GT chrominance images. Subsequently, the decoder network utilizes the fused features from the encoder and Color Encoder and gradually upsamples them to the original resolution. Skip connections from the encoder layers enable the decoder to access detailed spatial information and preserve fine textures in the colorized output. Finally, the output of the decoder is a chrominance image.
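For concreteness, the chrominance prediction in (1) and the Lab-to-RGB conversion described above can be outlined as follows. This is an illustrative sketch rather than the authors' implementation: `generator` stands in for the trained network G, the [-1, 1] scaling mirrors the normalization described later in Section IV-A, and the color-space conversions rely on scikit-image.

```python
import numpy as np
from skimage import io, color

def colorize(image_path, generator):
    """Colorize an image following Eq. (1): y_ab = G(x_L), then merge with x_L."""
    rgb = io.imread(image_path) / 255.0           # H x W x 3 in [0, 1]
    lab = color.rgb2lab(rgb)                      # L in [0, 100], a/b roughly in [-128, 127]
    x_L = lab[..., 0:1] / 50.0 - 1.0              # scale luminance to [-1, 1]

    y_ab = generator(x_L)                         # assumed trained G; output assumed in [-1, 1]
    y_ab = np.clip(y_ab, -1.0, 1.0) * 128.0       # back to the ab value range

    y_Lab = np.concatenate([lab[..., 0:1], y_ab], axis=-1)
    return color.lab2rgb(y_Lab)                   # final colorized RGB image
```

Only the chrominance channels are predicted; the original luminance channel is reused, which is why the CIELAB separation of intensity and color matters here.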
A. GENERATOR ARCHITECTURE

Figure 2 shows the architecture of the generator of the proposed image-colorization model. The generator is specifically designed to capture complex color relationships and accurately maintain spatial coherence. It comprises an encoder-decoder structure that effectively utilizes both content and color information. The encoder begins with the input grayscale image x_L and gradually reduces its spatial dimensions over the layers while simultaneously extracting increasingly detailed and intricate features from the image. The encoder consists of six convolutional blocks, each followed by batch normalization and ReLU activation. The convolutional blocks sequentially reduce the size of the image and increase the number of channels to 64, 128, 256, and 512, respectively. The decoder, a mirror of the encoder, upsamples the low-resolution feature maps obtained from the encoder. Like the encoder, the decoder consists of six convolutional blocks, utilizing batch normalization and ReLU activation; the block sizes in reverse order are 512, 256, 128, and 64 channels, respectively. Moreover, as shown in Figure 2, CBAM [20] is used at each encoding stage, where the attention mechanism enhances the feature representations by selectively focusing on salient regions while disregarding unnecessary information. The encoder outputs are subsequently transmitted to the decoder through skip connections, as in U-Net [32], which retain fine-grained details and structural information to facilitate the colorization procedure.
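The encoder-CBAM-decoder layout described above can be summarized with a compact PyTorch sketch. It is a simplified reading of Figure 2 under our own assumptions (kernel sizes, strides, a reduced number of blocks, a lightweight re-implementation of CBAM, and the noise input z of the Color Encoder are ours), not the published code.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Lightweight channel + spatial attention in the spirit of Woo et al. [20]."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)              # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                      # spatial attention

def conv_block(c_in, c_out, down=True):
    conv = (nn.Conv2d(c_in, c_out, 4, 2, 1) if down
            else nn.ConvTranspose2d(c_in, c_out, 4, 2, 1))
    return nn.Sequential(conv, nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Encoder with CBAM at each stage, Color Encoder fused at the bottleneck, U-Net decoder."""
    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512]
        self.enc = nn.ModuleList([conv_block(ci, co) for ci, co in zip([1] + chs[:-1], chs)])
        self.att = nn.ModuleList([CBAM(c) for c in chs])
        self.color_enc = nn.Sequential(*[conv_block(ci, co)
                                         for ci, co in zip([1] + chs[:-1], chs)])
        self.dec = nn.ModuleList([conv_block(1024, 256, down=False),   # fused bottleneck input
                                  conv_block(512, 128, down=False),
                                  conv_block(256, 64, down=False),
                                  conv_block(128, 64, down=False)])
        self.out = nn.Conv2d(64, 2, 3, padding=1)                      # two chrominance channels

    def forward(self, x_L, z):
        skips, h = [], x_L
        for enc, att in zip(self.enc, self.att):
            h = att(enc(h))
            skips.append(h)
        color_feat = self.color_enc(z)                 # z: noise map, same spatial size as x_L
        h = torch.cat([h, color_feat], dim=1)          # fuse Color Encoder features at the bottleneck
        for i, dec in enumerate(self.dec):
            h = dec(h)
            if i < len(self.dec) - 1:
                h = torch.cat([h, skips[-(i + 2)]], dim=1)   # U-Net skip connections
        return torch.tanh(self.out(h))                 # y_ab in [-1, 1]
```

With a 256 x 256 input, `Generator()(x_L, z)` returns a two-channel tensor in [-1, 1] that plays the role of y_ab; z would be drawn from the random normal distribution used by the Color Loss in (5).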
1) COLOR ENCODER

In our image colorization model, we propose a novel Color Encoder module that enhances colorization performance by producing detailed color characteristics. The Color Encoder module, with a normally distributed input and convolution blocks, is shown in Figure 2. The functionality of the Color Encoder is based on the integration of learned color features into the colorization process, which is a crucial step in attaining precise and realistic colorization.

Colorization is initiated using a random normal distribution that serves as a representation of the possible colors. The random normal distribution goes through a sequence of convolutional layers designed to extract and enhance color features. The process of transitioning over these layers helps remove randomness because the initial state of randomness gradually becomes an organized color feature. An essential step is required to guarantee that these created features genuinely correlate with real-world color features. The color features generated by the Color Encoder are compared with those obtained from a pretrained VGG network applied to GT color images, from which a set of reference features that capture the essence of authentic color compositions is extracted. The primary goal of the comparison is to ensure that the features generated by the Color Encoder are in accordance with the reference features derived from the GT images. The similarity between the two sets of features is evaluated using a loss function that quantifies the degree of similarity between the features of the Color Encoder and the reference features. The Color Encoder can then generate features that resemble the colors of the GT images by minimizing this loss. This approach converts a random normal distribution into color features that are not only semantically meaningful but also inherently aligned with real color patterns. Moreover, the Color Encoder is trained jointly with the GAN. In the proposed colorization model, the Color Encoder plays a crucial role in generating colorized outputs that are visually appealing and realistic.
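A rough sketch of this feature-alignment step is given below. It assumes PyTorch and torchvision; the choice of VGG layer, the 1x1 projection `proj` used to match channel counts, and the way the two-channel chrominance image is padded to VGG's three-channel input are all our assumptions, since these details are not specified in the text.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen feature extractor; the ReLU-layer index below is an assumed choice.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_features(img, layer=15):
    """Activations of one ReLU layer of the pretrained VGG for a 3-channel input."""
    h = img
    for i, m in enumerate(vgg):
        h = m(h)
        if i == layer:
            return h

def color_loss(color_encoder, x_ab, proj, sigma2=0.1):
    """L1 distance between Color Encoder features (from noise) and VGG features of the GT."""
    z = (sigma2 ** 0.5) * torch.randn_like(x_ab[:, :1])   # noise input drawn from N(0, 0.1)
    gen_feat = proj(color_encoder(z))                     # proj: assumed 1x1 conv to match channels
    gt_3ch = torch.cat([x_ab, x_ab[:, :1]], dim=1)        # assumption: pad ab to 3 channels for VGG
    ref_feat = vgg_features(gt_3ch)
    ref_feat = F.interpolate(ref_feat, size=gen_feat.shape[-2:])
    return F.l1_loss(gen_feat, ref_feat)
```

Minimizing this quantity is what pulls the noise-derived features toward the reference color statistics while the Color Encoder is trained jointly with the GAN.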
value or mean, and λ is a hyperparameter denoting the weight or coefficient for the GP term. In our context, the Wasserstein loss is a crucial part of our model's training, guiding the generator to create more realistic colorizations by minimizing the difference between the distributions of generated and real chrominance images.

In (2), L_c refers to the "Color Loss," a metric that measures the difference between the features produced by the Color Encoder and those from a pretrained VGG network. The difference is calculated using the L1 distance. The Color Loss based on the L1 distance is distinct from the L_{L1} loss in the overall loss function of the model, which evaluates the model's accuracy in replicating real colors; here, the L1 distance is only used to quantify the difference between the actual color features (from the pretrained VGG network) and the generated ones (from the Color Encoder). The Color Loss measures the alignment between the learned color features and the representations obtained from the GT color images. The Color Loss is defined by (5):

L_c = \| \mathbb{E}[ G_f(\mathcal{N}(\mu, \sigma^2)) ] - \mathrm{VGG}(x_{ab}) \|_1   (5)

where L_c is the Color Loss, \mathcal{N}(\mu, \sigma^2) is the random normal distribution with mean \mu = 0 and variance \sigma^2 = 0.1, and G_f and VGG are the functions for the Color Encoder, through which the random normal distribution is passed, and the pretrained VGG network, respectively. We use a two-step process, initially extracting features from the GT chrominance image using a pretrained VGG network and then comparing these features with those generated by the Color Encoder, which receives a random normal distribution as input, ensuring that the obtained color features closely resemble the GT.

L_{L1} is the conventional L1 loss defined as (6):

L_{L1} = \| x_{ab} - y_{ab} \|_1   (6)

The L1 loss measures the accuracy of the colorization model in replicating real colors. We also use a perceptual loss to evaluate the perceptual similarity between the colorized and GT images. It leverages high-level features from a pretrained VGG network to ensure that the colorized output captures the overall structure, textures, and patterns of the real image. The perceptual (VGG) loss is defined on the rectified linear unit activation layers of the pretrained VGG network, as in (7):

L_p = \| \varphi_k(x_{ab}) - \varphi_k(y_{ab}) \|_2^2   (7)

where k denotes the layer index, with k = 0, 1, 2, and 3 indicating the layers from which features are extracted in the VGG network, and \varphi_k represents the features of the k-th layer of the pretrained VGG network.
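Equations (6) and (7) translate directly into code. The sketch below shows the intended computation under stated assumptions: `feature_fn(img, k)` is a hypothetical helper returning the activations φ_k of a pretrained VGG for an already-adapted three-channel input, the layer indices are illustrative, and the mean-reduced forms differ from (6) and (7) only by constant factors.

```python
import torch.nn.functional as F

def l1_loss(x_ab, y_ab):
    """Eq. (6): pixel-wise L1 between GT and generated chrominance."""
    return F.l1_loss(y_ab, x_ab)

def perceptual_loss(x_ab, y_ab, feature_fn, layers=(3, 8, 15, 22)):
    """Eq. (7): squared distance between VGG features of GT and generated chrominance,
    accumulated over an assumed set of ReLU layers (k = 0..3)."""
    loss = 0.0
    for k in layers:
        loss = loss + F.mse_loss(feature_fn(y_ab, k), feature_fn(x_ab, k))
    return loss
```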
IV. EXPERIMENTAL RESULTS

A. IMPLEMENTATION DETAILS

The experiments were conducted using the PASCAL-VOC dataset [33], which consists of 17,125 publicly available images. The images were resized to a resolution of 256 × 256 pixels using bilinear interpolation; our experiments showed that scaling outperformed random cropping, owing to the potential adverse impact of cropping on color learning. The input images were normalized so that their values fall within the range of −1 to 1. During training, the input images were split into L and ab channels, with the L channel serving as the input and the ab channels as the GT. We used 80% training, 10% validation, and 10% testing splits, taken from different image categories. The other methods were trained on the same training set for a fair comparison. The number of training epochs was set to 800, and the entire training took approximately 2.5 days on a GeForce RTX 3090 GPU.

The ADAM optimizer [34] was used in our experiments, with learning rates of 1 × 10^{-4} and 2 × 10^{-4} for the generator and discriminator, respectively. The exponential decay rates β1 and β2 of the ADAM optimizer were set to 0.5 and 0.999, respectively. The training process involved iteratively optimizing the generator and discriminator until the network converged.
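The preprocessing and optimizer settings above correspond to the following sketch. The exact L and ab scaling factors and the module names `generator` and `discriminator` are assumptions; the learning rates and β values follow the text.

```python
import torch
from skimage import color

def preprocess(rgb):
    """Split a float RGB image in [0, 1] into the network input (L) and target (ab), scaled to [-1, 1]."""
    lab = color.rgb2lab(rgb)
    x_L = torch.from_numpy(lab[..., :1] / 50.0 - 1.0).permute(2, 0, 1).float()
    x_ab = torch.from_numpy(lab[..., 1:] / 128.0).permute(2, 0, 1).float()
    return x_L, x_ab

# Optimizer settings as described above; generator/discriminator are the assumed trained modules.
opt_G = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
```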
B. RESULTS AND DISCUSSION

We used a variety of well-established assessment metrics, including the Peak Signal-to-Noise Ratio (PSNR) [35], the Structural Similarity Index (SSIM) [36], and colorfulness [37], to thoroughly analyze the performance of the proposed image colorization model. The PSNR is a commonly used metric in image colorization that measures the accuracy of colorized images by calculating the ratio between the maximum possible pixel value and the mean-squared error between the generated chrominance image and the corresponding GT chrominance image. A higher PSNR signifies a higher degree of similarity in pixel values, indicating improved image quality. The SSIM is a perceptual metric used to assess the degree of structural similarity between colorized images and their corresponding GT. This assessment considers luminance, contrast, and structure, thereby providing a more human-centered evaluation of visual fidelity. SSIM quantifies the similarity between two images in the range of −1 to 1, where a value of 1 indicates a perfect match. We computed the PSNR and SSIM values between the resulting chrominance images (ab channels) and their corresponding GTs (ab channels). This approach evaluates the quality of colorization with a focus on the chrominance aspect, which is a key component of the color information, and provides an assessment that fully incorporates color information when comparing the performance of different image colorization methods. PSNR and SSIM are valuable in this context because they are well-established metrics for quantifying image quality and allow for a quantitative comparison of our method with existing techniques. Although PSNR and SSIM were originally designed for grayscale images, they effectively capture key aspects of image quality, providing valuable insights into the fidelity of colorized outputs. In addition, the colorfulness metric [37] is used to measure the color diversity and saturation of an image: it quantifies the degree of saturation and vividness exhibited by the colors present in the colorized images. Higher values indicate that the colorized images demonstrate a greater abundance and intensity of color. The colorfulness of an image is computed by evaluating the standard deviation of its color channels; a higher standard deviation implies more color diversity and, hence, higher colorfulness.

The ΔColorfulness metric is also proposed to alleviate a drawback of the colorfulness metric, which can favor vibrancy over realism and accuracy. ΔColorfulness is obtained by calculating the absolute difference between the colorfulness values of the output and GT images and quantitatively measures how much the perceived colorfulness changes between these images. A lower ΔColorfulness value indicates a higher level of color accuracy.

The performance of the proposed image colorization model was compared with that of a comprehensive set of nine state-of-the-art colorization models: CIC [8], DeOldify [18], image colorization with CBAM (ICCBAM) [39], Pix2Pix [17], BigColor [23], InstColor [16], ChromaGAN [19], ColTran [25], and CT2 [26]. Table 1 presents a quantitative comparison between the proposed model and the other models. As shown in Table 1, our proposed method demonstrated significantly improved performance compared with the other methods in terms of PSNR values, with higher similarity to the original color images. The capacity of the model to replicate the structural attributes of real images with perceptual accuracy was further highlighted by the SSIM results.

Desaturation, which causes muted or less vibrant colors, can be mitigated by integrating the proposed Color Encoder. The Color Encoder effectively incorporates the learned color features into the network at the bottleneck, enhancing the intensity and vibrancy of the generated colors. Furthermore, color bleeding, where colors can inadvertently spread beyond object boundaries, can be resolved by integrating the CBAM. The CBAM attention mechanism operates at the bottleneck of each layer in the encoder part, allowing the model to focus on local and global features. The selective attention of CBAM helps confine colorization to appropriate regions, significantly lowering the bleed effect and yielding accurate and realistic colors. With the help of these strategic integrations, our model demonstrates improved color accuracy and realism while effectively reducing desaturation and color bleeding, as shown in Figure 4.

Our proposed method has a lower ΔColorfulness value than existing state-of-the-art models, indicating that the outputs of the proposed method are more consistent with the GTs. The obtained value indicates that our colorizations achieve a better balance between color intensity (vibrancy) and realism.
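The evaluation protocol can be summarized as follows. The PSNR and SSIM calls use scikit-image, the colorfulness function follows our reading of the Hasler-Süsstrunk metric [37], and the data ranges passed to the metrics are assumptions, since the paper does not state them.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def colorfulness(rgb):
    """Hasler-Susstrunk colorfulness [37]; rgb is assumed to be an 8-bit image."""
    r, g, b = rgb[..., 0].astype(float), rgb[..., 1].astype(float), rgb[..., 2].astype(float)
    rg, yb = r - g, 0.5 * (r + g) - b                     # opponent color components
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std + 0.3 * mean

def evaluate(pred_ab, gt_ab, pred_rgb, gt_rgb):
    """PSNR/SSIM on the chrominance channels, plus colorfulness and delta-colorfulness."""
    rng = gt_ab.max() - gt_ab.min()
    psnr = peak_signal_noise_ratio(gt_ab, pred_ab, data_range=rng)
    ssim = structural_similarity(gt_ab, pred_ab, channel_axis=-1, data_range=rng)
    delta_cf = abs(colorfulness(pred_rgb) - colorfulness(gt_rgb))
    return {"PSNR": psnr, "SSIM": ssim,
            "Colorfulness": colorfulness(pred_rgb), "dColorfulness": delta_cf}
```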
Although some models, such as ICCBAM, obtained very high colorfulness scores with rare and vivid colors, their output images appeared overly saturated and unrealistic, as shown in Figure 4. In contrast, our method achieves a lower ΔColorfulness value, as shown in Table 1, indicating that our proposed model can produce outputs that closely resemble the color perception of the real-world GT images. This result highlights that our model tends to make colorized images appear both realistic and visually appealing: it strikes a good balance between the two, so the colors look real and the images are pleasing to the eye.

Figure 4 presents the visual results and comparisons of the proposed colorization model with other state-of-the-art methods. As shown in Figure 4, the outputs of our proposed network achieve more naturalness and realism, which is noticeable in the fourth row of Figure 4, where our method shows natural colors for the numbers on a t-shirt. In contrast, the other models struggle with the task and do not achieve the same level of precision. BigColor shows oversaturation in the output color images, resulting in an unnatural appearance.

C. ABLATION STUDIES

Ablation studies were performed to investigate the impact of two key elements (the Color Encoder and CBAM modules) in our image colorization model. To evaluate their effect, the model was trained and tested with the two modules turned on or off. Table 2 presents the results of these ablation studies.

The "Base" model, with both CBAM and the Color Encoder turned off, shows relatively lower performance on multiple metrics. The model with only the Color Encoder (model "A"), obtained by turning off CBAM, shows improvements in PSNR and SSIM with worse Colorfulness. The model with only CBAM, obtained by turning off the Color Encoder, shows a similar improvement to model A. However, our proposed method surpasses both variants, indicating its capacity for superior image colorization quality. This observation indicates that the inclusion of CBAM has a notable impact on enhancing the attention mechanism of the model, leading to improved precision in the generated outcomes.
In addition, the Color Encoder significantly enhances the realism and colorfulness of images by capturing and incorporating essential color features. This performance improvement is attributed to the ability of the Color Encoder to extract and integrate crucial color information, which enriches the colorization process and results in more realistic and vibrant colorized images.

To investigate the efficacy of the Color Encoder, we performed experiments in which the Color Encoder was added to the Pix2Pix and ChromaGAN frameworks. Table 3 shows the effect of the Color Encoder on these colorization methods. As shown in Table 3, the Color Encoder leads to improvements in PSNR, SSIM, and Colorfulness when it is integrated into both methods. The improvement is most noticeable in Colorfulness and ΔColorfulness, highlighting the effect of the Color Encoder in creating more vibrant and visually appealing colorized images. These results indicate that the Color Encoder can improve colorization performance for other methods as well as for our proposed method.

Despite the promising results, our proposed colorization model shows failure cases under specific conditions. Figure 5 shows examples. The proposed model shows color bleeding and incoherent color allocations at object boundaries in complex or cluttered scenes, and it shows hallucinated details and unsaturated colorization for low-light images with faded content or little contextual information. These failure cases underline the necessity for further enhancements, particularly in discerning complex scenes and refining the adaptation of colors in faded images.

V. CONCLUSION

This study introduces a novel image colorization model that utilizes a deep learning approach to generate accurate and realistic colorized images. The proposed model incorporates the proposed Color Encoder and CBAM module into a CWGAN architecture, yielding compelling results in both quantitative and qualitative evaluations. The effectiveness of the proposed model is demonstrated through experimentation and evaluation. The incorporation of the Color Loss into the comprehensive objective function allowed the generation of colorizations that demonstrate both a strong resemblance to GT images and a perceptual appeal aligned with human visual perception. The significance of the proposed Color Encoder and CBAM module in producing precise color details and emphasizing important features was demonstrated through ablation studies. In conclusion, the proposed image colorization model outperformed state-of-the-art methods by producing colorized images that were both visually appealing and realistic.

REFERENCES
[1] A. R. Smith, "Color gamut transform pairs," in Proc. 5th Annu. Conf. Comput. Graph. Interact. Techn., Aug. 1978, pp. 12–19.
[2] L. Yatziv and G. Sapiro, "Fast image and video colorization using chrominance blending," IEEE Trans. Image Process., vol. 15, no. 5, pp. 1120–1129, May 2006.
[3] R. Irony, D. Cohen-Or, and D. Lischinski, "Colorization by example," in Proc. Eurographics Symp. Rendering Techn., 2005, pp. 201–210.
[4] T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring color to greyscale images," ACM Trans. Graph., vol. 21, no. 3, pp. 277–280, Jul. 2002.
[5] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. NeurIPS, 2014, pp. 2672–2680.
[7] G. Larsson, M. Maire, and G. Shakhnarovich, "Learning representations for automatic colorization," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 577–593.
[8] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 649–666.
[9] A. Deshpande, J. Rock, and D. Forsyth, "Learning large-scale automatic image colorization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 567–575.
[10] L. Chen, C. Doersch, A. Kornblith, G. Hinton, and T. D. Bui, "BigGAN: Large scale adversarial representation learning," in Proc. NeurIPS, 2018, pp. 10824–10839.
[11] J. Hill and A. Zakia, The New Manual of Photography. London, U.K.: Dorling Kindersley Limited, 2008.
[12] A. Levin, D. Lischinski, and Y. Weiss, "Colorization using optimization," ACM Trans. Graph., vol. 23, no. 3, pp. 689–694, Aug. 2004.
[13] Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y.-Q. Xu, and H.-Y. Shum, "Natural image colorization," in Proc. 18th Eurographics Conf. Rendering Techn., 2007, pp. 309–320.
[14] Y. Qu, T.-T. Wong, and P.-A. Heng, "Manga colorization," ACM Trans. Graph., vol. 25, no. 3, pp. 1214–1220, Jul. 2006.
[15] Z. Cheng, Q. Yang, and B. Sheng, "Deep colorization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 415–423.
[16] J.-W. Su, H.-K. Chu, and J.-B. Huang, "Instance-aware image colorization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 7965–7974.
[17] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5967–5976.
[18] DeOldify. Accessed: Aug. 2023. [Online]. Available: https://github.com/jantic/DeOldify
[19] P. Vitoria, L. Raad, and C. Ballester, "ChromaGAN: Adversarial picture colorization with semantic class distribution," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2020, pp. 2434–2443.
[20] S. Woo, J. Park, J. Lee, and M. Yang, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 427–436, doi: 10.1007/978-3-030-01234-2_29.
[21] Y. Lu, X. Huang, Y. Zhai, L. Yang, and Y. Wang, "ColorGAN: Automatic image colorization with GAN," in Proc. IEEE 3rd Int. Conf. Inf. Technol., Big Data Artif. Intell. (ICIBA), vol. 3, Chongqing, China, May 2023, pp. 212–218, doi: 10.1109/ICIBA56860.2023.10164924.
[22] X. Wang, Y. Wu, Y. Li, H. Zhang, X. Zhao, and Y. Shan, "DualGAN: Dual-path generative adversarial network for image colorization," in Proc. AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 15679–15686.
[23] G. Kim, K. Kang, S. Kim, H. Lee, S. Kim, J. Kim, S.-H. Baek, and S. Cho, "BigColor: Colorization using a generative color prior for natural images," in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2022, pp. 350–366.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, 2017, pp. 6000–6010.
[25] M. Kumar, D. Weissenborn, and N. Kalchbrenner, "Colorization transformer," in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
[26] S. Weng, J. Sun, Y. Li, S. Li, and B. Shi, "CT2: Colorization transformer via color tokens," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2022, pp. 1–16.
[27] X. Kang, T. Yang, W. Ouyang, P. Ren, L. Li, and X. Xie, "DDColor: Towards photo-realistic and semantic-aware image colorization via dual decoders," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Dec. 2023, pp. 1–10.
[28] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 47–57, Mar. 2017.
[29] X. Liu, M. Gao, Y. Wang, and Q. Chen, "A local–global L1-norm based variational model for image colorization," Neurocomputing, vol. 212, pp. 86–98, Sep. 2016.
[30] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with generative adversarial networks," in Proc. 34th Int. Conf. Mach. Learn., vol. 70, 2017, pp. 1857–1865.
[31] Colorimetry—Part 1: CIE Standard Colorimetric Observers, document CIE Publication 15.2, International Commission on Illumination (CIE), 2004.
[32] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervent (MICCAI), 2015, pp. 234–241.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[34] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–13.
[35] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. IT-19, no. 4, pp. 471–480, Jul. 1973.
[36] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[37] D. Hasler and S. Susstrunk, "Measuring colourfulness in natural images," in Proc. IS&T/SPIE Electron. Imag., vol. 12, Jun. 2003, pp. 87–95.
[38] S. Treneska, E. Zdravevski, I. M. Pires, P. Lameski, and S. Gievska, "GAN-based image colorization for self-supervised visual feature learning," Sensors, vol. 22, no. 4, p. 1599, Feb. 2022, doi: 10.3390/s22041599.
[39] C. Liu and Y. Tu. (2021). Image Colorization With Convolution Block Attention Modules. GitHub repository. [Online]. Available: https://github.com/kliu513/Image-Colorization

HAMZA SHAFIQ received the B.S. degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2020. He is currently pursuing the M.S. degree with the Department of Information and Communications Engineering, Chosun University, South Korea. From 2020 to 2021, he was a Research Assistant with the University of Engineering and Technology, Lahore. He is a Graduate Research Assistant with the Multimedia Information Processing Laboratory, Chosun University. His research interests include image restoration and enhancement, image-to-image translation, and medical image processing.

BUMSHIK LEE (Member, IEEE) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2000, and the M.S. and Ph.D. degrees in information and communications engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2006 and 2012, respectively. He was a Research Professor with KAIST, in 2014, and a Postdoctoral Scholar with the University of California at San Diego, San Diego, CA, USA, from 2012 to 2013. He was a Principal Engineer with the Advanced Standard Research and Development Laboratory, LG Electronics, Seoul, from 2015 to 2016. In 2016, he joined the Department of Information and Communications Engineering, Chosun University, South Korea. His research interests include video processing, video security, and medical image processing.