Image Colorization Using Color-Features and Adversarial Learning
ABSTRACT In this study, we introduce an innovative image colorization method that not only improves
color accuracy and realism but also addresses common issues found in existing methods, such as desaturation
and color bleeding. Our proposed method features a novel component called the Color Encoder, which
extracts intrinsic color features. Moreover, the proposed Color Encoder systematically aligns essential color features, drawn from a random normal distribution, with real colors. These aligned features are fused
at the bottleneck and serve as the foundation for subsequent colorization. Complementing the Color Encoder
is our Color Loss mechanism, which aims to align the extracted features from the Color Encoder with the
ground-truth color features, enhancing the overall accuracy of the color representation. We also employ a Conditional Wasserstein Generative Adversarial Network (CWGAN) architecture to improve adversarial training and colorization accuracy. To enhance feature
representation, we incorporate an attention mechanism at the bottleneck of each encoder layer, further
refining our model’s ability to capture essential image details. Experimental results show that our approach
significantly outperforms other state-of-the-art methods in terms of both realism and precision, striking a
well-balanced performance.
to color involves converting a single-channel image into a three-channel image using the RGB color model. The mapping process is likely to be non-deterministic, indicating that a grayscale image can be colorized in numerous ways. Human perception is vital for this process: the perception of a color labeled as "realistic" can vary among individuals [9]. Another significant obstacle to image colorization is color bleeding, which can be a severe problem because the assigned colors extend beyond object borders, resulting in unnatural effects. Color bleeding mainly occurs when colors from one object bleed into nearby areas, with a loss of object differentiation. In addition, the phenomenon of "hallucinated" details in image colorization also remains a crucial problem. This phenomenon involves neural networks generating colors or intricate details that were not present in the original image. While such additions can enhance visual appeal, they may deviate from historical accuracy [10].

These issues have made it necessary to perform intensive research and develop novel methods for image colorization. Hence, we introduce a novel image colorization model that tackles the aforementioned issues and improves the overall quality and accuracy using the proposed Color Encoder and GAN architecture. The Color Encoder plays a crucial role in addressing the issue of undersaturation by successfully incorporating color features, resulting in more realistic and vivid colorization results. Furthermore, the integration of the Convolutional Block Attention Module (CBAM) aims to mitigate color bleeding by improving the network's ability to focus on crucial image components, leading to more accurate and localized color deployment. Finally, the incorporation of the Conditional Wasserstein Generative Adversarial Network (CWGAN) increases the overall visual appeal by guiding the colorization process to align with natural color distributions and by enhancing finer image details. These elements collectively empower our method to offer a comprehensive solution to the challenges encountered in image colorization, providing both accuracy and visual appeal. The key contributions of this paper are given below.

• Introduction of a novel Color Encoder, capable of learning and generating essential color features to tackle undersaturation issues and produce colorizations that closely resemble real-world colors.
• Integration of a novel Color Loss to measure the disparity between the color features generated by the Color Encoder and the actual color features obtained from the ground truth (GT). The Color Loss plays a crucial role in training the Color Encoder.
• Incorporation of the CBAM within the architecture of the CWGAN. CBAM enhances the visual appeal of the colorization results by focusing attention on important image features, contributing to more accurate and visually pleasing colorizations.

II. RELATED WORKS

The process of colorizing images extends back to the early twentieth century, when black-and-white photographs were hand-colored with water colors, oils, crayons, or other dyes [11]. Although this manual technique was characterized by its subjectivity and reliance on the artist's expertise and understanding, it nonetheless marked the early desire to infuse life into monochrome imagery. With the introduction of digital image processing, semi-automated image colorization techniques became available. Levin et al. [12] proposed a semiautomatic image colorization technique in which an interactive, optimization-based strategy with minimal user input was introduced. The user is required to provide a few color scribbles, and the framework subsequently distributes these colors across the entire image by considering the similarities between pixels. Luan et al. [13] introduced an optimization-based method for colorization by extending [12] and enhanced the propagation of colors by reducing the required user scribbles. Qu et al. [14] further improved the technique in [13] by integrating spatial and range regularization to manage color propagation effectively.

The emergence of machine learning has had a profound influence on image colorization. Irony et al. introduced automated techniques for colorization in which colors are transferred from a reference image to a grayscale target image using feature similarities [3]. Welsh et al. also used color transfer but matched the source and target images using texture descriptors [4]. In particular, deep-learning algorithms allow image colorization to be more automated, accurate, and consistent. Cheng et al. proposed an early deep-learning model that cast colorization as a regression task using a deep CNN to predict the chrominance value of each pixel [15]. Zhang et al. made a notable contribution with an end-to-end model that employed a classification objective function rather than a regression loss, leading to the production of more vivid and realistic colorizations [6]. Taking advantage of the nature of human color perception, this work used the CIELAB color space to predict the 'a' and 'b' channels from the input 'L' channel. The instance-aware image colorization method proposed by Su et al. focuses on achieving precise colorization using object instances [16].
GANs have also been investigated for image-colorization tasks. The Pix2Pix framework [17] employs a conditional GAN (cGAN) to learn the mapping between input and output images, and it has also been used effectively for colorization [17]. Moreover, there has been a surge in the popularity of open-source projects such as "DeOldify" that focus on adding color to historical images [18]. A further noteworthy contribution in the field is the research conducted by Vitoria et al., titled "ChromaGAN." This study uses adversarial learning to achieve colorization, with a specific emphasis on the distribution of semantic classes [19]. Furthermore, CBAM [20] has been integrated into the ChromaGAN architecture to improve colorization [21]. Wang et al. [22] proposed DualGAN, a dual-path generative adversarial network for image colorization, which consists of two generator streams, one for low-frequency and one for high-frequency colorization. The two streams are trained jointly with a discriminator to generate photorealistic colorizations. Subsequently, Kim et al. presented "BigColor," a technique that utilizes a generative color prior to enhance the colorization of natural images [23]. Moreover, Treneska et al. proposed a GAN-based image colorization method for self-supervised visual feature learning [38]. This method is based on the cGAN architecture, with a new loss function, a multi-scale discriminator, and a channel and spatial attention mechanism. Furthermore, Liu and Tu [39] proposed a PatchGAN-based image colorization model incorporating a CBAM and showed that CBAM can help the model focus on important regions of the image, leading to improved performance on benchmark datasets.

Recently, the popularity of transformers [24] in vision-based applications has increased. Transformers are a class of neural networks designed to effectively process and manage long-range relationships. The Colorization Transformer by Kumar et al. emerged as an early example of transformers employed in image colorization [25]. The grayscale image is coarsely colored using a conditional autoregressive colorization transformer, and two fully parallel networks then upsample the coarse colors, resulting in a finely colored high-resolution image.

In addition to the Colorization Transformer, subsequent studies have employed transformers for image colorization. CT2 [26] is another image colorization technique that employs a transformer-based approach. In this method, colors are encoded as tokens, and the interaction between grayscale image patches and color tokens is guided by color attention and query modules. Furthermore, DDColor [27] proposes a dual-decoder architecture for image colorization: the first decoder generates a coarse colorization, while the second decoder refines the colorization and adds semantic details.

Despite considerable advancements in image colorization, the existing techniques have several limitations. Manual and semiautomated techniques are associated with significant time consumption, require a certain level of artistic expertise, and may yield inconsistent outcomes [28]. Although powerful, deep-learning models are computationally expensive, require large amounts of data, and often produce desaturated outputs owing to the conservative nature of their loss functions [8]. Furthermore, they struggle with complicated images containing multiple objects because of the difficulty in learning high-level semantics [29]. The issue of color ambiguity remains unresolved in most existing models. Models also tend to "hallucinate" details, which can be problematic for historical and archival image colorization, where color authenticity is crucial [30].

To address these problems, we propose a novel image colorization method based on a CWGAN, which uses attention and a Color Encoder in the generator. In our proposed method, the novel colorization element, the Color Encoder, significantly amplifies the colorization process by incorporating a comprehensive array of color features. Furthermore, a CWGAN architecture is utilized during training to enhance the colorization performance of the proposed architecture. The CWGAN extends the conventional GAN framework by integrating the Wasserstein distance as the training criterion in the cGAN [17], which facilitates stable and interpretable gradients during training. The conditional approach enables the generator to learn colorization patterns specific to the input grayscale images, thereby ensuring that the colorization process is contextually relevant. Unlike conventional techniques, the proposed Color Encoder enhances the colorization process by providing the network with deeper contextual color information, resulting in significantly improved outcomes. Moreover, our novel Color Loss mechanism plays a crucial role in training the Color Encoder to generate color features that closely resemble the original color features. The focus on the Color Encoder and the specific Color Loss significantly improves the effectiveness and quality of image colorization, setting our model apart from conventional approaches.

III. PROPOSED METHODOLOGY

The proposed image colorization model leverages the power of deep learning, specifically a GAN, to achieve accurate and visually appealing colorization results. The proposed architecture consists of a generator network that learns to map the luminance (L) channel of the input grayscale image to the chrominance (a and b) channels in the CIELAB color space [31], and a discriminator network that learns to distinguish real GT images from the fake images generated by the generator. The CIELAB color space was chosen in this study because it closely approximates human vision, making it ideal for perceptually meaningful color transformations [31]. With the L* component for luminance and the a* and b* components for color along the green-red and blue-yellow axes, respectively, CIELAB successfully isolates color information from intensity information. Moreover, the CIELAB color space provides perceptually uniform representations of colors, ensuring that the Euclidean distance between colors aligns more precisely with human perception of color distinctions.

Let x_L be the input grayscale image with a luminance channel (L) and x_ab be the GT image with chrominance channels (ab). x_L is the input to the generator, and the output chrominance image y_ab is generated using (1):

y_{ab} = G(x_{L})   (1)

where G represents the generator function trained to produce an image containing the chrominance channels y_ab. These chrominance channels produce a colorized image, referred to as y_Lab, when combined with the original input luminance channel x_L. In other words, y_Lab is the final colorized image produced by the generator G, which combines the luminance information x_L with the chrominance channels y_ab. The output colorized image is characterized by its coherency and visual appeal. Subsequently, it is converted from the CIELAB color space to the RGB color space for the final output. Figure 1 shows the overall architecture of the proposed network. As shown in Figure 1, the luminance image is extracted from the input image and fed to the generator. The generator contains an encoder, a decoder, and a Color Encoder. The generator produces a y_ab image containing the chrominance channels. The generated image is fed to the discriminator along with the input image as a condition, and the discriminator generates scores for the real and fake images.

Colorization is improved using a novel Color Encoder that creates color characteristics by sampling from a normal distribution, effectively modeling the color distribution. The Color Encoder consists of several convolutional layers that extract intrinsic color features. The convolutional blocks in the Color Encoder correspond to 64, 128, 256, and 512 channels, sequentially, which helps capture and represent color-related information effectively. The color features of the Color Encoder are integrated with the output of the encoder at the network bottleneck. The integration of the Color Encoder and encoder features ensures that the colorization process combines both local context and global color information, allowing the model to generate colors that are coherent and realistic. The Color Encoder, a key component of our model, not only generates rich color features but also undergoes a crucial training process to ensure their accuracy. This training is facilitated by a specialized metric known as the Color Loss. The Color Loss serves as a guiding mechanism, quantifying the disparity between the features produced by the Color Encoder and those derived from the GT chrominance images. Subsequently, the decoder network utilizes the fused features from the encoder and Color Encoder and gradually upsamples them to the original resolution. Skip connections from the encoder layers enable the decoder to access detailed spatial information and preserve fine textures in the colorized output. Finally, the output of the decoder is a chrominance image.
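For concreteness, the chrominance prediction in (1) and the Lab-to-RGB conversion described above can be outlined as follows. This is an illustrative sketch rather than the authors' implementation: `generator` stands in for the trained network G, the [-1, 1] scaling mirrors the normalization described later in Section IV-A, and the color-space conversions rely on scikit-image.

```python
import numpy as np
from skimage import io, color

def colorize(image_path, generator):
    """Colorize an image following Eq. (1): y_ab = G(x_L), then merge with x_L."""
    rgb = io.imread(image_path) / 255.0           # H x W x 3 in [0, 1]
    lab = color.rgb2lab(rgb)                      # L in [0, 100], a/b roughly in [-128, 127]
    x_L = lab[..., 0:1] / 50.0 - 1.0              # scale luminance to [-1, 1]

    y_ab = generator(x_L)                         # assumed trained G; output assumed in [-1, 1]
    y_ab = np.clip(y_ab, -1.0, 1.0) * 128.0       # back to the ab value range

    y_Lab = np.concatenate([lab[..., 0:1], y_ab], axis=-1)
    return color.lab2rgb(y_Lab)                   # final colorized RGB image
```

Only the chrominance channels are predicted; the original luminance channel is reused, which is why the CIELAB separation of intensity and color matters here.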
A. GENERATOR ARCHITECTURE

Figure 2 shows the architecture of the generator of the proposed image-colorization model. The generator is specifically designed to capture complex color relationships and accurately maintain spatial coherence. It comprises an encoder-decoder structure that effectively utilizes both content and color information. The encoder begins with the input grayscale image x_L and gradually reduces its spatial dimensions over the layers while simultaneously extracting increasingly detailed and intricate features from the image. The encoder consists of six convolutional blocks, each followed by batch normalization and ReLU activation. The convolutional blocks sequentially reduce the size of the image and increase the number of channels to 64, 128, 256, and 512, respectively. The decoder, a mirror of the encoder, upsamples the low-resolution feature maps obtained from the encoder. Like the encoder, the decoder consists of six convolutional blocks, utilizing batch normalization and ReLU activation; the block sizes in reverse order are 512, 256, 128, and 64 channels, respectively. Moreover, as shown in Figure 2, CBAM [20] is used at each encoding stage, where the attention mechanism enhances the feature representations by selectively focusing on salient regions while disregarding unnecessary information. The encoder outputs are subsequently transmitted to the decoder through skip connections, as in U-Net [32], which retain fine-grained details and structural information to facilitate the colorization procedure.
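The encoder-CBAM-decoder layout described above can be summarized with a compact PyTorch sketch. It is a simplified reading of Figure 2 under our own assumptions (kernel sizes, strides, a reduced number of blocks, a lightweight re-implementation of CBAM, and the noise input z of the Color Encoder are ours), not the published code.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Lightweight channel + spatial attention in the spirit of Woo et al. [20]."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)              # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                      # spatial attention

def conv_block(c_in, c_out, down=True):
    conv = (nn.Conv2d(c_in, c_out, 4, 2, 1) if down
            else nn.ConvTranspose2d(c_in, c_out, 4, 2, 1))
    return nn.Sequential(conv, nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Encoder with CBAM at each stage, Color Encoder fused at the bottleneck, U-Net decoder."""
    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512]
        self.enc = nn.ModuleList([conv_block(ci, co) for ci, co in zip([1] + chs[:-1], chs)])
        self.att = nn.ModuleList([CBAM(c) for c in chs])
        self.color_enc = nn.Sequential(*[conv_block(ci, co)
                                         for ci, co in zip([1] + chs[:-1], chs)])
        self.dec = nn.ModuleList([conv_block(1024, 256, down=False),   # fused bottleneck input
                                  conv_block(512, 128, down=False),
                                  conv_block(256, 64, down=False),
                                  conv_block(128, 64, down=False)])
        self.out = nn.Conv2d(64, 2, 3, padding=1)                      # two chrominance channels

    def forward(self, x_L, z):
        skips, h = [], x_L
        for enc, att in zip(self.enc, self.att):
            h = att(enc(h))
            skips.append(h)
        color_feat = self.color_enc(z)                 # z: noise map, same spatial size as x_L
        h = torch.cat([h, color_feat], dim=1)          # fuse Color Encoder features at the bottleneck
        for i, dec in enumerate(self.dec):
            h = dec(h)
            if i < len(self.dec) - 1:
                h = torch.cat([h, skips[-(i + 2)]], dim=1)   # U-Net skip connections
        return torch.tanh(self.out(h))                 # y_ab in [-1, 1]
```

With a 256 x 256 input, `Generator()(x_L, z)` returns a two-channel tensor in [-1, 1] that plays the role of y_ab; z would be drawn from the random normal distribution used by the Color Loss in (5).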
1) COLOR ENCODER

In our image colorization model, we propose a novel Color Encoder module that enhances colorization performance by producing detailed color characteristics. The Color Encoder module, with a normally distributed input and convolution blocks, is shown in Figure 2. The functionality of the Color Encoder is based on the integration of learned color features into the colorization process, which is a crucial step in attaining precise and realistic colorization.

Colorization is initiated using a random normal distribution that serves as a representation of the possible colors. The random normal distribution goes through a sequence of convolutional layers designed to extract and enhance color features. The process of transitioning over these layers helps remove randomness because the initial state of randomness gradually becomes an organized color feature. An essential step is required to guarantee that these created features genuinely correlate with real-world color features. The color features generated by the Color Encoder are compared with those obtained from a pretrained VGG network applied to GT color images, from which a set of reference features that capture the essence of authentic color compositions is extracted. The primary goal of the comparison is to ensure that the features generated by the Color Encoder are in accordance with the reference features derived from the GT images. The similarity between the two sets of features is evaluated using a loss function that quantifies the degree of similarity between the features of the Color Encoder and the reference features. The Color Encoder can then generate features that resemble the colors of the GT images by minimizing this loss. This approach converts a random normal distribution into color features that are not only semantically meaningful but also inherently aligned with real color patterns. Moreover, the Color Encoder is trained jointly with the GAN. In the proposed colorization model, the Color Encoder plays a crucial role in generating colorized outputs that are visually appealing and realistic.
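A rough sketch of this feature-alignment step is given below. It assumes PyTorch and torchvision; the choice of VGG layer, the 1x1 projection `proj` used to match channel counts, and the way the two-channel chrominance image is padded to VGG's three-channel input are all our assumptions, since these details are not specified in the text.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen feature extractor; the ReLU-layer index below is an assumed choice.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_features(img, layer=15):
    """Activations of one ReLU layer of the pretrained VGG for a 3-channel input."""
    h = img
    for i, m in enumerate(vgg):
        h = m(h)
        if i == layer:
            return h

def color_loss(color_encoder, x_ab, proj, sigma2=0.1):
    """L1 distance between Color Encoder features (from noise) and VGG features of the GT."""
    z = (sigma2 ** 0.5) * torch.randn_like(x_ab[:, :1])   # noise input drawn from N(0, 0.1)
    gen_feat = proj(color_encoder(z))                     # proj: assumed 1x1 conv to match channels
    gt_3ch = torch.cat([x_ab, x_ab[:, :1]], dim=1)        # assumption: pad ab to 3 channels for VGG
    ref_feat = vgg_features(gt_3ch)
    ref_feat = F.interpolate(ref_feat, size=gen_feat.shape[-2:])
    return F.l1_loss(gen_feat, ref_feat)
```

Minimizing this quantity is what pulls the noise-derived features toward the reference color statistics while the Color Encoder is trained jointly with the GAN.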
value or mean, and λ is a hyperparameter denoting the weight or coefficient for the GP term. In our context, the Wasserstein loss is a crucial part of our model's training, guiding the generator to create more realistic colorizations by minimizing the difference between the distributions of generated and real chrominance images.

In (2), L_c refers to the "Color Loss," a metric that measures the difference between the features produced by the Color Encoder and those from a pretrained VGG network. The difference is calculated using the L1 distance. The Color Loss based on the L1 distance is distinct from the L_{L1} loss in the overall loss function of the model, which evaluates the model's accuracy in replicating real colors; here, the L1 distance is only used to quantify the difference between the actual color features (from the pretrained VGG network) and the generated ones (from the Color Encoder). The Color Loss measures the alignment between the learned color features and the representations obtained from the GT color images. The Color Loss is defined by (5):

L_c = \| \mathbb{E}[ G_f(\mathcal{N}(\mu, \sigma^2)) ] - \mathrm{VGG}(x_{ab}) \|_1   (5)

where L_c is the Color Loss, \mathcal{N}(\mu, \sigma^2) is the random normal distribution with mean \mu = 0 and variance \sigma^2 = 0.1, and G_f and VGG are the functions for the Color Encoder, through which the random normal distribution is passed, and the pretrained VGG network, respectively. We use a two-step process, initially extracting features from the GT chrominance image using a pretrained VGG network and then comparing these features with those generated by the Color Encoder, which receives a random normal distribution as input, ensuring that the obtained color features closely resemble the GT.

L_{L1} is the conventional L1 loss defined as (6):

L_{L1} = \| x_{ab} - y_{ab} \|_1   (6)

The L1 loss measures the accuracy of the colorization model in replicating real colors. We also use a perceptual loss to evaluate the perceptual similarity between the colorized and GT images. It leverages high-level features from a pretrained VGG network to ensure that the colorized output captures the overall structure, textures, and patterns of the real image. The perceptual (VGG) loss is defined on the rectified linear unit activation layers of the pretrained VGG network, as in (7):

L_p = \| \varphi_k(x_{ab}) - \varphi_k(y_{ab}) \|_2^2   (7)

where k denotes the layer index, with k = 0, 1, 2, and 3 indicating the layers from which features are extracted in the VGG network, and \varphi_k represents the features of the k-th layer of the pretrained VGG network.
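Equations (6) and (7) translate directly into code. The sketch below shows the intended computation under stated assumptions: `feature_fn(img, k)` is a hypothetical helper returning the activations φ_k of a pretrained VGG for an already-adapted three-channel input, the layer indices are illustrative, and the mean-reduced forms differ from (6) and (7) only by constant factors.

```python
import torch.nn.functional as F

def l1_loss(x_ab, y_ab):
    """Eq. (6): pixel-wise L1 between GT and generated chrominance."""
    return F.l1_loss(y_ab, x_ab)

def perceptual_loss(x_ab, y_ab, feature_fn, layers=(3, 8, 15, 22)):
    """Eq. (7): squared distance between VGG features of GT and generated chrominance,
    accumulated over an assumed set of ReLU layers (k = 0..3)."""
    loss = 0.0
    for k in layers:
        loss = loss + F.mse_loss(feature_fn(y_ab, k), feature_fn(x_ab, k))
    return loss
```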
IV. EXPERIMENTAL RESULTS

A. IMPLEMENTATION DETAILS

The experiments were conducted using the PASCAL-VOC dataset [33], which consists of 17,125 publicly available images. The images were resized to a resolution of 256 × 256 pixels using bilinear interpolation; our experiments showed that scaling outperformed random cropping, owing to the potential adverse impact of cropping on color learning. The input images were normalized so that their values fall within the range of −1 to 1. During training, the input images were split into L and ab channels, with the L channel serving as the input and the ab channels as the GT. We used 80% training, 10% validation, and 10% testing splits, taken from different image categories. The other methods were trained on the same training set for a fair comparison. The number of training epochs was set to 800, and the entire training took approximately 2.5 days on a GeForce RTX 3090 GPU.

The ADAM optimizer [34] was used in our experiments, with learning rates of 1 × 10^{-4} and 2 × 10^{-4} for the generator and discriminator, respectively. The exponential decay rates β1 and β2 of the ADAM optimizer were set to 0.5 and 0.999, respectively. The training process involved iteratively optimizing the generator and discriminator until the network converged.
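The preprocessing and optimizer settings above correspond to the following sketch. The exact L and ab scaling factors and the module names `generator` and `discriminator` are assumptions; the learning rates and β values follow the text.

```python
import torch
from skimage import color

def preprocess(rgb):
    """Split a float RGB image in [0, 1] into the network input (L) and target (ab), scaled to [-1, 1]."""
    lab = color.rgb2lab(rgb)
    x_L = torch.from_numpy(lab[..., :1] / 50.0 - 1.0).permute(2, 0, 1).float()
    x_ab = torch.from_numpy(lab[..., 1:] / 128.0).permute(2, 0, 1).float()
    return x_L, x_ab

# Optimizer settings as described above; generator/discriminator are the assumed trained modules.
opt_G = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
```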
B. RESULTS AND DISCUSSION

We used a variety of well-established assessment metrics, including the Peak Signal-to-Noise Ratio (PSNR) [35], the Structural Similarity Index (SSIM) [36], and colorfulness [37], to thoroughly analyze the performance of the proposed image colorization model. The PSNR is a commonly used metric in image colorization that measures the accuracy of colorized images by calculating the ratio between the maximum possible pixel value and the mean-squared error between the generated chrominance image and the corresponding GT chrominance image. A higher PSNR signifies a higher degree of similarity in pixel values, indicating improved image quality. The SSIM is a perceptual metric used to assess the degree of structural similarity between colorized images and their corresponding GT. This assessment considers luminance, contrast, and structure, thereby providing a more human-centered evaluation of visual fidelity. SSIM quantifies the similarity between two images in the range of −1 to 1, where a value of 1 indicates a perfect match. We computed the PSNR and SSIM values between the resulting chrominance images (ab channels) and their corresponding GTs (ab channels). This approach evaluates the quality of colorization with a focus on the chrominance aspect, which is a key component of the color information, and provides an assessment that fully incorporates color information when comparing the performance of different image colorization methods. PSNR and SSIM are valuable in this context because they are well-established metrics for quantifying image quality and allow for a quantitative comparison of our method with existing techniques. Although PSNR and SSIM were originally designed for grayscale images, they effectively capture key aspects of image quality, providing valuable insights into the fidelity of colorized outputs. In addition, the colorfulness metric [37] is used to measure the color diversity and saturation of an image: it quantifies the degree of saturation and vividness exhibited by the colors present in the colorized images. Higher values indicate that the colorized images demonstrate a greater abundance and intensity of color. The colorfulness of an image is computed by evaluating the standard deviation of its color channels; a higher standard deviation implies more color diversity and, hence, higher colorfulness.

The ΔColorfulness metric is also proposed to alleviate a drawback of the colorfulness metric, which can favor vibrancy over realism and accuracy. ΔColorfulness is obtained by calculating the absolute difference between the colorfulness values of the output and GT images and quantitatively measures how much the perceived colorfulness changes between these images. A lower ΔColorfulness value indicates a higher level of color accuracy.

The performance of the proposed image colorization model was compared with that of a comprehensive set of nine state-of-the-art colorization models: CIC [8], DeOldify [18], image colorization with CBAM (ICCBAM) [39], Pix2Pix [17], BigColor [23], InstColor [16], ChromaGAN [19], ColTran [25], and CT2 [26]. Table 1 presents a quantitative comparison between the proposed model and the other models. As shown in Table 1, our proposed method demonstrated significantly improved performance compared with the other methods in terms of PSNR values, with higher similarity to the original color images. The capacity of the model to replicate the structural attributes of real images with perceptual accuracy was further highlighted by the SSIM results.

Desaturation, which causes muted or less vibrant colors, can be mitigated by integrating the proposed Color Encoder. The Color Encoder effectively incorporates the learned color features into the network at the bottleneck, enhancing the intensity and vibrancy of the generated colors. Furthermore, color bleeding, where colors can inadvertently spread beyond object boundaries, can be resolved by integrating the CBAM. The CBAM attention mechanism operates at the bottleneck of each layer in the encoder part, allowing the model to focus on local and global features. The selective attention of CBAM helps confine colorization to appropriate regions, significantly lowering the bleed effect and yielding accurate and realistic colors. With the help of these strategic integrations, our model demonstrates improved color accuracy and realism while effectively reducing desaturation and color bleeding, as shown in Figure 4.

Our proposed method has a lower ΔColorfulness value than existing state-of-the-art models, indicating that the outputs of the proposed method are more consistent with the GTs. The obtained value indicates that our colorizations achieve a better balance between color intensity (vibrancy) and realism.
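The evaluation protocol can be summarized as follows. The PSNR and SSIM calls use scikit-image, the colorfulness function follows our reading of the Hasler-Süsstrunk metric [37], and the data ranges passed to the metrics are assumptions, since the paper does not state them.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def colorfulness(rgb):
    """Hasler-Susstrunk colorfulness [37]; rgb is assumed to be an 8-bit image."""
    r, g, b = rgb[..., 0].astype(float), rgb[..., 1].astype(float), rgb[..., 2].astype(float)
    rg, yb = r - g, 0.5 * (r + g) - b                     # opponent color components
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std + 0.3 * mean

def evaluate(pred_ab, gt_ab, pred_rgb, gt_rgb):
    """PSNR/SSIM on the chrominance channels, plus colorfulness and delta-colorfulness."""
    rng = gt_ab.max() - gt_ab.min()
    psnr = peak_signal_noise_ratio(gt_ab, pred_ab, data_range=rng)
    ssim = structural_similarity(gt_ab, pred_ab, channel_axis=-1, data_range=rng)
    delta_cf = abs(colorfulness(pred_rgb) - colorfulness(gt_rgb))
    return {"PSNR": psnr, "SSIM": ssim,
            "Colorfulness": colorfulness(pred_rgb), "dColorfulness": delta_cf}
```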
Although some models, such as ICCBAM, obtained very high colorfulness scores with rare and vivid colors, their output images appeared overly saturated and unrealistic, as shown in Figure 4. In contrast, our method achieves a lower ΔColorfulness value, as shown in Table 1, indicating that our proposed model can produce outputs that closely resemble the color perception of the real-world GT images. This result highlights that our model tends to make colorized images appear both realistic and visually appealing: it strikes a good balance between the two, so the colors look real and the images are pleasing to the eye.

Figure 4 presents the visual results and comparisons of the proposed colorization model with other state-of-the-art methods. As shown in Figure 4, the outputs of our proposed network achieve more naturalness and realism, which is noticeable in the fourth row of Figure 4, where our method shows natural colors for the numbers on a t-shirt. In contrast, the other models struggle with the task and do not achieve the same level of precision. BigColor shows oversaturation in the output color images, resulting in an unnatural appearance.

C. ABLATION STUDIES

Ablation studies were performed to investigate the impact of two key elements (the Color Encoder and CBAM modules) in our image colorization model. To evaluate their effect, the model was trained and tested with the two modules turned on or off. Table 2 presents the results of these ablation studies.

The "Base" model, with both CBAM and the Color Encoder turned off, shows relatively lower performance on multiple metrics. The model with only the Color Encoder (model "A"), obtained by turning off CBAM, shows improvements in PSNR and SSIM with worse Colorfulness. The model with only CBAM, obtained by turning off the Color Encoder, shows a similar improvement to model A. However, our proposed method surpasses both variants, indicating its capacity for superior image colorization quality. This observation indicates that the inclusion of CBAM has a notable impact on enhancing the attention mechanism of the model, leading to improved precision in the generated outcomes.
In addition, the Color Encoder significantly enhances the realism and colorfulness of images by capturing and incorporating essential color features. This performance improvement is attributed to the ability of the Color Encoder to extract and integrate crucial color information, which enriches the colorization process and results in more realistic and vibrant colorized images.

To investigate the efficacy of the Color Encoder, we performed experiments in which the Color Encoder was added to the Pix2Pix and ChromaGAN frameworks. Table 3 shows the effect of the Color Encoder on these colorization methods. As shown in Table 3, the Color Encoder leads to improvements in PSNR, SSIM, and Colorfulness when it is integrated into both methods. The improvement is most noticeable in Colorfulness and ΔColorfulness, highlighting the effect of the Color Encoder in creating more vibrant and visually appealing colorized images. These results indicate that the Color Encoder can improve colorization performance for other methods as well as for our proposed method.

Despite the promising results, our proposed colorization model shows failure cases under specific conditions. Figure 5 shows examples. The proposed model shows color bleeding and incoherent color allocations at object boundaries in complex or cluttered scenes, and it shows hallucinated details and unsaturated colorization for low-light images with faded content or little contextual information. These failure cases underline the necessity for further enhancements, particularly in discerning complex scenes and refining the adaptation of colors in faded images.

V. CONCLUSION

This study introduces a novel image colorization model that utilizes a deep learning approach to generate accurate and realistic colorized images. The proposed model incorporates the proposed Color Encoder and CBAM module into a CWGAN architecture, yielding compelling results in both quantitative and qualitative evaluations. The effectiveness of the proposed model is demonstrated through experimentation and evaluation. The incorporation of the Color Loss into the comprehensive objective function allowed the generation of colorizations that demonstrate both a strong resemblance to GT images and a perceptual appeal aligned with human visual perception. The significance of the proposed Color Encoder and CBAM module in producing precise color details and emphasizing important features was demonstrated through ablation studies. In conclusion, the proposed image colorization model outperformed state-of-the-art methods by producing colorized images that were both visually appealing and realistic.

REFERENCES
[1] A. R. Smith, "Color gamut transform pairs," in Proc. 5th Annu. Conf. Comput. Graph. Interact. Techn., Aug. 1978, pp. 12–19.
[2] L. Yatziv and G. Sapiro, "Fast image and video colorization using chrominance blending," IEEE Trans. Image Process., vol. 15, no. 5, pp. 1120–1129, May 2006.
[3] R. Irony, D. Cohen-Or, and D. Lischinski, "Colorization by example," in Proc. Eurographics Symp. Rendering Techn., 2005, pp. 201–210.
[4] T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring color to greyscale images," ACM Trans. Graph., vol. 21, no. 3, pp. 277–280, Jul. 2002.
[5] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. NeurIPS, 2014, pp. 2672–2680.
[7] G. Larsson, M. Maire, and G. Shakhnarovich, "Learning representations for automatic colorization," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 577–593.
[8] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 649–666.
[9] A. Deshpande, J. Rock, and D. Forsyth, "Learning large-scale automatic image colorization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 567–575.
[10] L. Chen, C. Doersch, A. Kornblith, G. Hinton, and T. D. Bui, "BigGAN: Large scale adversarial representation learning," in Proc. NeurIPS, 2018, pp. 10824–10839.
[11] J. Hill and A. Zakia, The New Manual of Photography. London, U.K.: Dorling Kindersley Limited, 2008.
[12] A. Levin, D. Lischinski, and Y. Weiss, "Colorization using optimization," ACM Trans. Graph., vol. 23, no. 3, pp. 689–694, Aug. 2004.
[13] Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y.-Q. Xu, and H.-Y. Shum, "Natural image colorization," in Proc. 18th Eurographics Conf. Rendering Techn., 2007, pp. 309–320.
[14] Y. Qu, T.-T. Wong, and P.-A. Heng, "Manga colorization," ACM Trans. Graph., vol. 25, no. 3, pp. 1214–1220, Jul. 2006.
[15] Z. Cheng, Q. Yang, and B. Sheng, "Deep colorization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 415–423.
[16] J.-W. Su, H.-K. Chu, and J.-B. Huang, "Instance-aware image colorization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 7965–7974.
[17] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5967–5976.
[18] DeOldify. Accessed: Aug. 2023. [Online]. Available: https://github.com/jantic/DeOldify
[19] P. Vitoria, L. Raad, and C. Ballester, "ChromaGAN: Adversarial picture colorization with semantic class distribution," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2020, pp. 2434–2443.
[20] S. Woo, J. Park, J. Lee, and M. Yang, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 427–436, doi: 10.1007/978-3-030-01234-2_29.
[21] Y. Lu, X. Huang, Y. Zhai, L. Yang, and Y. Wang, "ColorGAN: Automatic image colorization with GAN," in Proc. IEEE 3rd Int. Conf. Inf. Technol., Big Data Artif. Intell. (ICIBA), vol. 3, Chongqing, China, May 2023, pp. 212–218, doi: 10.1109/ICIBA56860.2023.10164924.
[22] X. Wang, Y. Wu, Y. Li, H. Zhang, X. Zhao, and Y. Shan, "DualGAN: Dual-path generative adversarial network for image colorization," in Proc. AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 15679–15686.
[23] G. Kim, K. Kang, S. Kim, H. Lee, S. Kim, J. Kim, S.-H. Baek, and S. Cho, "BigColor: Colorization using a generative color prior for natural images," in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2022, pp. 350–366.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, 2017, pp. 6000–6010.
[25] M. Kumar, D. Weissenborn, and N. Kalchbrenner, "Colorization transformer," in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
[26] S. Weng, J. Sun, Y. Li, S. Li, and B. Shi, "CT2: Colorization transformer via color tokens," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2022, pp. 1–16.
[27] X. Kang, T. Yang, W. Ouyang, P. Ren, L. Li, and X. Xie, "DDColor: Towards photo-realistic and semantic-aware image colorization via dual decoders," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Dec. 2023, pp. 1–10.
[28] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 47–57, Mar. 2017.
[29] X. Liu, M. Gao, Y. Wang, and Q. Chen, "A local–global L1-norm based variational model for image colorization," Neurocomputing, vol. 212, pp. 86–98, Sep. 2016.
[30] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with generative adversarial networks," in Proc. 34th Int. Conf. Mach. Learn., vol. 70, 2017, pp. 1857–1865.
[31] Colorimetry—Part 1: CIE Standard Colorimetric Observers, document CIE Publication 15.2, International Commission on Illumination (CIE), 2004.
[32] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervent (MICCAI), 2015, pp. 234–241.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[34] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–13.
[35] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. IT-19, no. 4, pp. 471–480, Jul. 1973.
[36] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[37] D. Hasler and S. Susstrunk, "Measuring colourfulness in natural images," in Proc. IS&T/SPIE Electron. Imag., vol. 12, Jun. 2003, pp. 87–95.
[38] S. Treneska, E. Zdravevski, I. M. Pires, P. Lameski, and S. Gievska, "GAN-based image colorization for self-supervised visual feature learning," Sensors, vol. 22, no. 4, p. 1599, Feb. 2022, doi: 10.3390/s22041599.
[39] C. Liu and Y. Tu. (2021). Image Colorization With Convolution Block Attention Modules. GitHub repository. [Online]. Available: https://github.com/kliu513/Image-Colorization

HAMZA SHAFIQ received the B.S. degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2020. He is currently pursuing the M.S. degree with the Department of Information and Communications Engineering, Chosun University, South Korea. From 2020 to 2021, he was a Research Assistant with the University of Engineering and Technology, Lahore. He is a Graduate Research Assistant with the Multimedia Information Processing Laboratory, Chosun University. His research interests include image restoration and enhancement, image-to-image translation, and medical image processing.

BUMSHIK LEE (Member, IEEE) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2000, and the M.S. and Ph.D. degrees in information and communications engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2006 and 2012, respectively. He was a Research Professor with KAIST, in 2014, and a Postdoctoral Scholar with the University of California at San Diego, San Diego, CA, USA, from 2012 to 2013. He was a Principal Engineer with the Advanced Standard Research and Development Laboratory, LG Electronics, Seoul, from 2015 to 2016. In 2016, he joined the Department of Information and Communications Engineering, Chosun University, South Korea. His research interests include video processing, video security, and medical image processing.