Colorization Report


Image Colorization Deep Learning

Convolutional Neural Network

Introduction

Image colorization is the process of assigning colors to a grayscale image to make it more
aesthetically appealing and perceptually meaningful. It is recognized as a sophisticated
task that often requires prior knowledge of the image content and manual adjustments to achieve
artifact-free quality. Also, since objects can have different colors, there are many possible
ways to assign colors to the pixels in an image, which means there is no unique solution to this
problem.
Nowadays, image colorization is usually done by hand in Photoshop. Many institutions use
image colorization services to assign colors to grayscale historic images. Colorization is also
used for documentation images. However, using Photoshop for this
purpose requires considerable time and effort. One solution to this problem is to use machine
learning / deep learning techniques.
Recently, deep learning has gained increasing attention among researchers in the field of
computer vision and image processing. As a typical technique, convolutional neural
networks (CNNs) have been well studied and successfully applied to several tasks such as
image recognition, image reconstruction, and image generation (Nguyen et al., 2016).

A CNN consists of multiple layers of small computational units that only process portions of
the input image in a feed-forward fashion. Each layer is the result of applying various image
filters, each of which extracts a certain feature of the input image, to the previous layer. Thus,
each layer may contain useful information about the input image at different levels of
abstraction.

Color Representation

So, how do we render an image? This section covers the basics of digital colors and the main
logic for our neural network. A grayscale image can be represented as a grid of pixels.
Each pixel has a value that corresponds to its brightness. The values span from 0 to 255, from
black to white. A color image, by contrast, consists of three layers: a red, a green, and a blue (RGB) layer.
Let’s imagine splitting a green leaf on a white background into the three channels. We might
expect the color of the leaf to be present only in the green layer, but the leaf is actually present in all
three layers. The layers determine not only color, but also brightness.

Just like grayscale images, each layer in a color image has values from 0 to 255. A value of 0 means that the
pixel has no color in that layer. If the value is 0 for all color channels, then the image pixel is black. A neural
network creates a relationship between an input value and an output value. In this project, the network
needs to find the traits that link grayscale images with colored ones. So, we should search for the
features that link a grid of grayscale values to the three color grids.
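
As a small illustration of the channel layout described above, the following sketch builds a tiny synthetic image with NumPy and splits it into its three layers. The 4×4 size, the square "leaf", and the exact green value are assumptions made only for this example.

```python
import numpy as np

# Hypothetical 4x4 RGB image: white background (255 in every channel).
image = np.full((4, 4, 3), 255, dtype=np.uint8)

# A 2x2 "leaf" in the middle; the color (34, 139, 34) is an assumed green.
image[1:3, 1:3] = (34, 139, 34)

# Split into the three color layers; each one is a grid of values in 0-255.
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

# The leaf is visible in all three layers, because it differs from the
# white background in each of them, not only in the green layer.
leaf_visible = [bool((layer != 255).any()) for layer in (red, green, blue)]
print(leaf_visible)

# A simple grayscale version: the mean of the three layers per pixel.
gray = image.mean(axis=2)
print(gray.shape)
```

The mean over the channels is only one simple way to get a brightness grid; it is used here to keep the sketch short.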

Defining the Colorization Problem

Our final output is a colored image. We have a grayscale image as the input, and we want to
predict two color layers, the a and b channels of Lab. To create the final color image, we combine
the predicted layers with the L (grayscale) image we used as the input. The result is a Lab image.

How do we turn one layer into two layers? We use convolutional filters. Think of them as the
red and blue filters in 3D glasses. They can highlight or remove something to extract
information from the picture. The network can either create a new image from a filter or
combine several filters into one image.
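
To make the idea of a filter concrete, here is a minimal sketch that applies two hand-written 3×3 filters to one grayscale layer and stacks the results into two output layers. The helper name apply_filter, the test image, and both kernels are assumptions for illustration; in a real network the filter values are learned during training.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide a small kernel over a 2D grayscale image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is a weighted sum of one image patch.
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

# Hypothetical 5x5 image: dark left columns, bright right columns.
image = np.array([[0, 0, 255, 255, 255]] * 5, dtype=float)

# Assumed hand-written filter that highlights vertical edges.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

# Assumed averaging filter that smooths the image.
blur = np.full((3, 3), 1 / 9)

# Two filters applied to one layer give two output layers.
feature_maps = np.stack([apply_filter(image, vertical_edge),
                         apply_filter(image, blur)], axis=-1)
print(feature_maps.shape)
```

The edge filter responds strongly where the dark and bright regions meet and gives 0 in the uniformly bright area, which is the "highlight or remove" behavior described above.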

Convert the image from RGB to Lab

This step matters because in an RGB image every channel carries both color and brightness
information, so the network would need to predict the value in all three channels. Instead of
doing that, in this project we take the easier route of converting the images from RGB to Lab.
Before we jump into the code, we should get to know the CIELAB color space and its color
space diagram.

The CIELAB, or CIE L*a*b*, color system represents the quantitative relationship of colors on
three axes: the L* value indicates lightness, and a* and b* are chromaticity coordinates. On the
color space diagram, L* is represented on a vertical axis with values from 0 (black) to 100
(white). The a* value indicates the red-green component of a color, where +a* (positive) and -a*
(negative) indicate red and green values, respectively. The yellow and blue components are
represented on the b* axis as +b* (positive) and -b* (negative) values, respectively. The
center of the plane is neutral, or achromatic. The distance from the central axis represents the
chroma (C*), or saturation, of the color. The angle on the chromaticity axes represents the hue
(h°). The L*, a*, and b* values can be transcribed to dermatological parameters: the L* value
correlates with the level of pigmentation of the skin, the a* value correlates with erythema,
and the b* value correlates with pigmentation and tanning.
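
The chroma and hue described above follow directly from the a* and b* coordinates: chroma is the distance from the neutral axis, and hue is the angle in the a*b* plane. The sketch below uses the standard CIE definitions; the function names chroma and hue_angle are chosen only for this example.

```python
import math

def chroma(a, b):
    """Distance from the neutral central axis in the a*b* plane."""
    return math.hypot(a, b)

def hue_angle(a, b):
    """Hue angle in degrees (0-360), measured from the +a* (red) axis."""
    return math.degrees(math.atan2(b, a)) % 360

# A point on the +b* (yellow) axis and a point on the -a* (green) axis.
print(chroma(0.0, 50.0), hue_angle(0.0, 50.0))
print(chroma(-40.0, 0.0), hue_angle(-40.0, 0.0))
```

A neutral gray has a* = b* = 0, so its chroma is 0 and its hue angle carries no meaning.
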

Now, let’s define the colorization problem in terms of the CIE Lab color space. Like the
RGB color space, it is a 3-channel color space, but unlike the RGB color space, color
information is encoded only in the a (green-red component) and b (blue-yellow component)
channels. The L (lightness) channel encodes intensity information only.

Iterating over the images, we convert each one from RGB to Lab. Think of a Lab image as a grey
image in the L channel, with all color information stored in the a and b channels. The input to the
network will be the L channel, so we assign the L channel to the X vector and the a and b channels to Y.
To convert an RGB image to Lab, we use the rgb2lab() function from the skimage library.
After converting the color space with rgb2lab(), we select the grayscale layer
with [:, :, 0]. This is our input for the neural network. [:, :, 1:] selects the
two color layers, green–red and blue–yellow.

The Lab color space has a different range in comparison to RGB. The ab color values in
Lab range from -128 to 128. By dividing all values in the output layer by 128, we bound the
range between -1 and 1. This matches our neural network, which also returns values
between -1 and 1.

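
The conversion, slicing, and scaling steps above can be sketched as follows. Only rgb2lab() from skimage comes from the text; the helper names rgb_to_lab, split_lab, and merge_lab and the tiny 2×2 Lab array are assumptions for illustration, and the synthetic array stands in for a real converted photo.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an H x W x 3 RGB image to Lab with scikit-image's rgb2lab()."""
    # Imported lazily so the NumPy-only helpers below work without scikit-image.
    from skimage.color import rgb2lab
    return rgb2lab(rgb)

def split_lab(lab):
    """Return (X, Y): the L layer as input, the scaled ab layers as target."""
    X = lab[:, :, 0]          # grayscale (lightness) layer
    Y = lab[:, :, 1:] / 128   # two color layers, scaled to roughly [-1, 1]
    return X, Y

def merge_lab(X, Y):
    """Rebuild an H x W x 3 Lab image from the L layer and scaled ab layers."""
    lab = np.zeros(X.shape + (3,))
    lab[:, :, 0] = X
    lab[:, :, 1:] = Y * 128   # undo the scaling
    return lab

# Hypothetical 2x2 Lab image used instead of a real converted photo.
lab = np.array([[[50.0, 64.0, -32.0], [100.0, 0.0, 0.0]],
                [[0.0, 0.0, 0.0], [75.0, -128.0, 127.0]]])
X, Y = split_lab(lab)
print(X.shape, Y.shape)
print(np.allclose(merge_lab(X, Y), lab))
```

For a real image, one would call split_lab(rgb_to_lab(image)) on an H×W×3 RGB array. merge_lab shows the reverse direction: once the network has predicted Y, the original L layer is put back to form the final Lab image.
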
Model Architecture

CNN Architecture for Colorization

The architecture proposed by Zhang et al. is a VGG-style network with multiple convolutional
blocks. Each block has two or three convolutional layers followed by a Rectified Linear Unit
(ReLU) and terminating in a Batch Normalization layer. Unlike the VGG net, there are no
pooling or fully connected layers.

Figure: CNN architecture for colorization.


Encoder: As shown in the figure above, the input image is rescaled to 224×224. The
input is an H x W x 1 grayscale image (the L component), and the output is an H/8 x
W/8 x 512 feature representation. The encoder uses 8 convolutional layers with 3×3 kernels;
padding preserves the input size in the stride-1 layers, while the stride-2 layers halve it. Each
convolutional layer in the encoder uses a ReLU activation function.
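
The size arithmetic behind the H/8 output can be checked with a short sketch. It uses the standard output-size formula for a convolution; the order of the strides is an assumption, since the text only implies that three of the 8 layers must use stride 2 for the size to drop from H to H/8.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed stride pattern for the 8 encoder layers: three stride-2 layers
# are needed to go from H to H/8. The exact order is an assumption.
strides = [2, 1, 2, 1, 2, 1, 1, 1]

size = 224
sizes = [size]
for stride in strides:
    size = conv_output_size(size, stride=stride)
    sizes.append(size)

print(sizes)  # [224, 112, 112, 56, 56, 28, 28, 28, 28]
```

With a 3×3 kernel and a padding of 1, a stride-1 layer keeps the size unchanged and a stride-2 layer halves it, so a 224×224 input ends at 28×28, which is 224/8.
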
Decoder: The decoder applies a series of convolutional and up-sampling layers to produce the
final H x W x 2 output (the ab components). For the last layer we use tanh instead
of ReLU, because this layer colorizes the image using 2 filters, one for a and one for b. The
normalized a and b values range between -1 and 1, so tanh (the hyperbolic tangent) is used, as it
has the same range. Other activation functions, such as the sigmoid, range from 0 to 1.

From left to right, we have the grayscale input, our filters, and the prediction from our neural
network. We map the predicted values and the real values into the same interval so that
we can compare them. The interval ranges from -1 to 1. To map the predicted values, we
use a tanh activation function: for any input value, tanh returns a value between -1 and 1.
The true color values range between -128 and 128, the default interval in the Lab
color space. Dividing them by 128 brings them into the -1 to 1 interval as well. This
“normalization” enables us to compute the error of our prediction. After calculating the
final error, the network updates the filters to reduce the total error. The network continues in
this loop until the error is as low as possible.
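
The comparison described above can be sketched with NumPy. The text does not name the error measure, so the mean squared error used here is an assumption, as are the helper names and the sample values.

```python
import numpy as np

def normalize_ab(ab):
    """Scale true ab values from about [-128, 128] to about [-1, 1]."""
    return ab / 128.0

def mean_squared_error(predicted, target):
    """Average squared difference; the choice of this loss is an assumption."""
    return float(np.mean((predicted - target) ** 2))

# Hypothetical raw outputs of the last layer for three pixels (a, b).
raw_output = np.array([[2.0, -3.0], [0.0, 0.5], [-10.0, 10.0]])
predicted = np.tanh(raw_output)  # tanh squashes any value into -1 to 1

# Hypothetical true ab values in the Lab color space.
true_ab = np.array([[127.0, -128.0], [0.0, 64.0], [-64.0, 32.0]])
target = normalize_ab(true_ab)

# Both arrays now live in the same interval, so the error is meaningful.
print(predicted.min() >= -1, predicted.max() <= 1)
print(mean_squared_error(predicted, target))
```

During training, this error would be recomputed after every update of the filters, and the updates would continue while it keeps decreasing.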
