BTP Report
PROJECT ON
COLORIZATION OF GRAYSCALE IMAGES
Submitted By
Ashish Rana (236/CO/15)
Deepanshu Chauhan (247/CO/15)
Gaurav Sharma (255/CO/15)
DECLARATION
This is to certify that the work which is being hereby presented by us in this project titled “Colorization of Grayscale Images”, in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering, submitted to the Department of Computer Engineering, Netaji Subhas Institute of Technology, Delhi, is a genuine account of our work carried out during the period from December 2018 to May 2019 under the guidance of Prof. Sangeeta Sabharwal, Department of Computer Engineering, Netaji Subhas Institute of Technology, Delhi. The matter embodied in this project report has, to the best of our knowledge, not been submitted for the award of any other degree elsewhere.
Dated:
This is to certify that the above declaration by the students is true to the best of my
knowledge.
ACKNOWLEDGMENT
Enumerating and listing the individual contributions made to this project is a very difficult task. It took many special people and researchers, and their contributions in this field, to enable and support it. Here we would like to acknowledge their precious co-operation and express our sincere gratitude to them.
We would like to express our deep gratitude towards our mentor Prof. Sangeeta
Sabharwal, Professor, COE Division, Netaji Subhas Institute of Technology, New
Delhi under whose supervision we completed our work. Her invaluable suggestions,
enlightening comments, and constructive criticism always kept our spirits up during
our work.
We are also thankful to our friends who motivated us at each and every step of this project. Without their interest in our project, we could not have gone so far. And most of all, we would like to thank our wonderful parents, who motivated us from day one of the project. You were the lights that led us.
Our experience of working together has been wonderful. We hope that the knowledge,
practical and theoretical, that we have gained through this term B.E. Project will help
us in our future endeavors in the field.
CERTIFICATE
This is to certify that the report entitled “Colorization of Grayscale Images” being
submitted by Ashish Rana, Deepanshu Chauhan, Gaurav Sharma to the Department of
Computer Engineering, NSIT, for the award of the degree of Bachelor of Engineering, is a record of the bona fide work carried out by them under our supervision and
guidance. The results contained in this report have not been submitted either in part or
in full to any other university or institute for the award of any degree or diploma.
Supervisors
ABSTRACT
Colorization is the process of adding plausible color information to a grayscale image. The resultant image should be precise, meaning that the output image should be close to the true natural colors of the scene.
It is important to note that the goal of colorization is not to recover the actual ground-truth color, but rather to produce a plausible colorization that the user finds useful even if it differs from the ground truth.
The software that we have developed builds a deep convolutional neural network, trains it on a large number of color images to extract different features and understand the correlations between them, and uses the knowledge gained to predict colors for new grayscale images.
LIST OF TABLES
LIST OF FIGURES
TABLE OF CONTENTS
Declaration
Acknowledgment
Certificate
Abstract
List of Tables
List of Figures
Chapter-1 Introduction
    1.1 What is Colorization?
    1.3 Motivation
    1.4 Goals
Chapter-2 Review of Literature
Chapter-3 Technical Description
    3.5 Backpropagation
Chapter-4 Methodology
    4.1 Approach
Chapter-5 Results and Discussions
Chapter-6 Conclusions
Chapter-7 Future Scope
References
1
Introduction
1.1 What is Colorization?
It is important to note that the goal of colorization is not to recover the actual ground
truth color, but rather, to produce a plausible colorization that the user finds useful
even if the colorization differs from the ground truth color.
Colorization can seem like an intimidating task because so much information is lost
(two out of three color dimensions) in converting a color image to its underlying
grayscale representation. The semantics of an image scene provides many clues for
sound colorization. Deep learning is a successful tool for colorization because it takes
advantage of scene semantics for image classification and object detection.
Note that if a real color image and an artificially colorized image are compared side by side, the human eye will try to pick out the differences between the two images. Since the goal of colorization is to produce a plausible colorization rather than an exact reproduction, such a side-by-side comparison would be counter-productive.
● The resultant image should be precise, meaning that the output image should be close to its true natural colors.
● It is important to note that the goal of colorization is not to recover the actual ground-truth color, but rather to produce a plausible colorization that the user finds useful even if it differs from the ground truth.
The software that we have developed builds a deep convolutional neural network, trains it on a large number of color images to extract different features and understand the correlations between them, and uses the knowledge gained to predict colors for new grayscale images.
1.3 Motivation
Colorization has several practical applications, such as allowing better interpretation of CCTV camera footage, astronomical photography, and electron microscopy.
Deep learning has the potential to be a successful tool for colorization because it
already takes advantage of scene semantics for image classification and object
detection.
The ideas and methods developed in the field of colorization of grayscale images have
been carried forward to improve the performance of object detection algorithms.
1.4 Goals
Image colorization using convolutional neural networks aims at reducing the efforts
of the designer to manually colorize the grayscale images. Earlier methods used
simple regression and other similar techniques for the same purposes.
Our goal was to create a Web Application that accepts a grayscale image as input and produces a colorized image as output.
In this thesis, we propose a fully automatic process for colorization that produces realistic colors. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class rebalancing at training time to increase the diversity of colors in the result. The colorization results obtained are vibrant and plausible. In addition, there is no need for human intervention, as the process is fully automatic; the user just needs to provide the input on which the colorization is to be performed.
Chapter-1 Introduction
In this chapter, we have introduced the concept of grayscale image colorization and
the various ways in which it can be done.
Chapter-4 Methodology
In this chapter, we will discuss the way our system has been designed. We shall put
emphasis on the various techniques that we have incorporated into our project. The
various traditional and non-traditional techniques in grayscale image colorization
that we have implemented will be discussed in detail.
Chapter-6 Conclusions
In this chapter, we will summarise our approach to building the software and discuss
the conclusions.
2
Review of Literature
Very Deep Convolutional Neural Networks for Large-Scale Image
Recognition
K. Simonyan and A. Zisserman. 2015
In this work, they investigated the effect of the convolutional network depth on its
accuracy in the large-scale image recognition setting. Their main contribution is a
thorough evaluation of networks of increasing depth using an architecture with very
small (3x3) convolution filters, which shows that a significant improvement on the
prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
These findings were the basis of their ImageNet Challenge 2014 submission, where
their team secured the first and the second places in the localization and classification
tracks respectively. They also showed that their representations generalize well to other datasets, where they achieve state-of-the-art results. They have made their two best-
performing ConvNet models publicly available to facilitate further research on the use
of deep visual representations in computer vision.
It was demonstrated that the representation depth is beneficial for the classification
accuracy, and that state-of-the-art performance on the ImageNet challenge dataset can
be achieved using a conventional ConvNet architecture (LeCun et al., 1989;
Krizhevsky et al., 2012) with substantially increased depth. They also showed that
their models generalize well to a wide range of tasks and datasets, matching or
outperforming more complex recognition pipelines built around less deep image
representations. Their results yet again confirm the importance of depth in visual
representations.
Image Colorization with Deep Convolutional Neural Networks
Jeff Hwang and You Zhou. 2016
They presented a convolutional-neural-network-based system that faithfully colorizes
black and white photographic images without direct human assistance. They explored
various network architectures, objectives, color spaces, and problem formulations.
The final classification-based model they built generates colorized images that are
significantly more aesthetically-pleasing than those created by the baseline
regression-based model, demonstrating the viability of their methodology and
revealing promising avenues for future work.
The main challenge their model faces is inconsistency in colors within individual objects. Their current system makes one color prediction for each pixel, in the hope that nearby pixels end up with similar color assignments; however, this is not always the case. Even though local regions of small sizes are examined together, given the nature of convolutional layers, there is no explicit enforcement at the object level. They experimented with applying Gaussian smoothing on the class scores to address this issue. This kind of smoothing performed only slightly better. Unfortunately, it introduced another issue: it significantly increased visual noise along object edges. Accordingly, they left the smoothing out of their final model. To address the issue of color inconsistency, they considered incorporating segmentation to enforce uniformity in color within segments. They could also utilize post-processing schemes such as total variation minimization and conditional random fields to achieve a similar end.
Image Colorization Using a Deep Convolutional Neural Network
Tung Nguyen, Kazuki Mori, Ruck Thawonmas. 2016
In this paper, a novel approach is proposed that uses deep learning techniques to
colorize grayscale images. By utilizing a pre-trained convolutional neural network
designed for image classification, they were able to separate content and style of
different images and recombine them into a single image. They then proposed a
method that can colorize a grayscale image by combining its content with the style of
a color image which is most semantically similar to the grayscale one. As an
application, they used the proposed method to colorize images of ukiyo-e (a genre of Japanese painting) and obtained interesting results, showing the potential of this method in the growing field of computer-assisted art.
There are two main approaches for image colorization: one requires the user to assign
colors to some regions and utilizes the information gained from the previous step to
colorize the whole image, and another one that tries to learn the color of each pixel
from a color image having a semantic similarity. In this paper, they use the latter
approach; they extract the information about color from an image and transfer it to
another image.
In this paper, a reliable method for colorizing grayscale images is presented that uses
CNN to extract color information from an image and transfer the same to another
image. They showed examples of plausible-looking generated images. Their results
indicate that the presented method can be used as a tool for colorization in the future
allowing minimum human intervention.
Colorful Image Colorization
Richard Zhang, Phillip Isola, Alexei A. Efros. 2016
Given a grayscale photograph as input, this paper aims to solve the problem of
producing a plausible color version of the photograph. A fully automatic approach
that produces vibrant and realistic colorizations is proposed in this paper. They
embraced the underlying uncertainty of the problem by posing it as a classification
task and use class-rebalancing at training time to increase the diversity of colors in the
result. The system is implemented as a feed-forward pass in a CNN at test time and is
trained on over a million color images. Their algorithm was evaluated using a
“colorization Turing test,” which involves asking human participants to choose
between a generated and original color image. Their method was able to successfully
fool humans on 32% of the trials, which is significantly higher than previous methods.
Given the lightness channel L which is the grayscale image itself, their system
predicts the corresponding a and b color channels of the image in the Lab colorspace.
To solve this problem, they leverage large-scale data. Predicting color has the nice
property that training data is practically free: any color photograph can be used as a
training example, simply by taking the image’s L channel as an input and its ab
channels as the supervisory signal. Others have noted the easy availability of training
data, and previous works have trained convolutional neural networks (CNN) to
predict color on large datasets. However, the results from these previous attempts tend
to look desaturated. One explanation is that these methods use loss functions that encourage
conservative predictions. These losses are inherited from standard regression
problems, where the goal is to minimize Euclidean error between an estimate and the
ground truth. They trained a CNN to map from a grayscale input to a distribution over
quantized color value outputs.
Image Colorization using Generative Adversarial Networks
Kamyar Nazeri, Eric Ng, Mehran Ebrahimi. 2018
Recent developments in automatic colorization mostly involve images that contain a
common theme or require highly processed data such as semantic maps as input. In
this approach, a fully generalized colorization procedure using a conditional Deep
Convolutional Generative Adversarial Network (DCGAN) is used.
The network is trained over datasets that are publicly available such as CIFAR-10 and
Places365. The results of the generative model and traditional deep neural networks
are compared.
In 2014, Goodfellow et al. [1] proposed a new type of generative model: generative
adversarial networks (GANs). A GAN is composed of two smaller networks called
the generator and the discriminator. The generator’s task is to produce results that are indistinguishable from real data. The discriminator’s task is to classify whether a sample came from the generator’s model distribution or from the original data distribution. Both of these subnetworks are trained simultaneously until the generator is able to consistently produce results that the discriminator cannot reliably distinguish from real data.
The architectures of the generator and discriminator both follow a multilayer
perceptron model. Since colorization is a class of image translation problems, the
generator and discriminator are both convolutional neural networks (CNN). The
generator is represented by the mapping G(z;θG), where z is a noise variable
(uniformly distributed) that acts as the input of the generator. Similarly, the
discriminator is represented by the mapping D(x;θD) to produce a scalar between 0
and 1, where x is a color image. The output of the discriminator can be interpreted as
the probability of the input originating from the training data. These constructions of
G and D enable us to determine the optimization problem for training the generator
and discriminator: G is trained to minimize the probability that the discriminator makes a correct prediction on generated data, while D is trained to maximize the probability of assigning the correct label.
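For reference, the original objective proposed by Goodfellow et al. for the generator and discriminator, on which this conditional formulation builds, can be written as the two-player minimax game

min_G max_D V(D, G) = E_{x ~ p_data(x)} [ log D(x) ] + E_{z ~ p_z(z)} [ log(1 - D(G(z))) ]

where D tries to maximize this value by labeling real and generated samples correctly, while G tries to minimize it by producing samples that D classifies as real.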
3
Technical Description
This chapter gives an extensive background behind the concepts employed in this
project. None of the work in this chapter is original; the ideas from each section have
been cross-referenced to indicate the source of the information presented, whenever
needed. This study was very necessary from the point of view of getting the
background information to help us proceed in designing the proposed applications.
Deep neural networks have proven their capacity in many problems, such as Computer Vision, which are difficult to address by extracting features in a traditional way. This section aims to briefly explain all the main technical concepts behind the method used, in order to make the Deep Learning technique easy to understand.
The power of ANN comes from a set of computationally simple nodes that combine
together, that is, the neurons. These neurons are structured in layers, which are
connected to one another, similarly to the way biological neurons are connected by
axons. These layers are divided into 3 main types: input, hidden and output. The input
layer corresponds to the data that the network receives. It could be understood as the
input vector used by other methods. This layer is connected to the hidden layers, that is, the layers that are not at either end of the network. This is where their name comes from, as they are not “visible” from the outside. Another interesting interpretation is that, contrary to other methods, once the network is trained, looking at them does not provide any insight into what they do. As such, ANNs are sometimes referred to as black boxes, as it is almost impossible to understand their inner workings. There can be multiple hidden layers, each of them connected to the previous one. Every neuron in the hidden and output layers is traditionally connected to all neurons from the previous layer. Each
edge has an associated weight, which indicates how strongly related the two neurons
are, either directly or inversely, similarly to the way biological neurons are connected.
Finally, the last layer is called the output layer, and it delivers the result of the ANN,
with one output per class. This is important, as ANNs are mostly used for classification problems.
FIGURE 3.2: The three layers of an ANN. Notice how each neuron is connected to all neurons from the previous layer.
In order to calculate the activation value of each neuron i, three elements are required: the input values Xi, the weights Wi and the activation function h(z). The input values are the outputs from the previous layer that the neuron receives. As already stated, each neuron is most often connected to all neurons from previous
layers. Additionally, a bias value b is usually passed to each layer, not coming from any neuron. As each edge connecting two neurons has its own weight, the value used by neuron i to calculate its activation, given N inputs X_1, ..., X_N with weights W_1, ..., W_N, can be expressed as:

z_i = W_1·X_1 + W_2·X_2 + ... + W_N·X_N + b,    and the neuron’s output is a_i = h(z_i).
For the activation function h(z) there are many possibilities depending on the problem at hand, such as the logistic sigmoid,

h(z) = 1 / (1 + e^(-z)),

or the hyperbolic tangent,

h(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)).

All these have in common that their range is usually between 0 and 1, or between -1 and 1. There is no definite answer regarding which to choose, but there are some properties that they should fulfill, such as being continuously differentiable.
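As a concrete illustration of this computation, the following minimal sketch (Python with NumPy; the weights, bias and inputs are made-up values) calculates one neuron's activation as the weighted sum of its inputs plus the bias, passed through a sigmoid activation function:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # logistic activation, range (0, 1)

x = np.array([0.5, -1.2, 3.0])        # outputs of the previous layer (inputs to this neuron)
w = np.array([0.8, 0.1, -0.4])        # weights of the edges feeding this neuron
b = 0.2                               # bias term

z = np.dot(w, x) + b                  # weighted sum of inputs plus bias
a = sigmoid(z)                        # activation value passed on to the next layer
print(z, a)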
One of the main requirements for training this kind of algorithm is data. All learning algorithms use data in their training processes, but ANNs require more than most. As will be explained in the following chapters, this became a real issue during the project.
Given the data, there are various learning algorithms, of which gradient descent combined with backpropagation can be considered, given its widespread use, the most successful of them all. In fact, to a certain degree, it could be considered that using it is enough for training most ANNs.
This algorithm starts by initializing all weights in the network, which can be done
following various strategies. Some of the most common ones include drawing them
from a probability distribution, or randomly setting them, although low values are
advisable. The process followed afterward consists of 3 phases that are repeated many
times over. In the first one, an input instance is propagated through the whole network,
and the output values are calculated.
Then, this output is compared, using a loss function, with the correct output, and this is used to calculate how far off the network is. The final phase consists of updating each weight in order to minimize the obtained error. This is done by obtaining the gradient of each neuron, which could be understood as a “step” towards the actual value. When these three phases have been repeated for all input instances, we call this an epoch. The algorithm can run for as many epochs as specified, or as required to find the solution.
Briefly, obtaining the gradient goes as follows. Once the outputs have been calculated for an instance, we obtain the error achieved for each output neuron o, calling it δ_o. This value allows us to find the gradient of each o. For this, we need to find the derivative of the output of o with respect to its input X_o, that is, the derivative of its activation function, h'(X_o). For the logistic (sigmoid) case, this becomes:

h'(X_o) = h(X_o) · (1 - h(X_o)).
The stochastic approach is the one presented, in which weights are updated after each
instance. This introduces a certain amount of randomness, preventing the algorithm
from getting stuck in local optima. The other approach, instead, applies the weight
update only after having processed a set of instances, using the average error. This
usually makes the algorithm converge faster to local minima, which may actually be a
good result. A compromise between the two can be achieved using the mini-batch strategy, which uses small batches of randomly selected samples and combines the benefits of both approaches.
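The following minimal sketch (Python with NumPy, on a made-up linear model with a squared-error loss) illustrates the mini-batch strategy just described: the weights are updated once per small batch of randomly selected samples, rather than after every single instance or only after the whole dataset:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # 1000 instances with 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)                                # initialize weights with low values
lr, batch_size = 0.05, 32

for epoch in range(20):                        # one epoch = one pass over all instances
    order = rng.permutation(len(X))            # shuffle so each batch is a random sample
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ w                          # forward pass: compute the outputs
        grad = 2 * xb.T @ (pred - yb) / len(idx)   # average gradient of the squared error
        w -= lr * grad                         # update the weights against the gradient
print(w)                                       # converges towards true_w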
Therefore, it can be concluded that the learning task for neural networks consists of
finding the right weights. The algorithm explained here is the one most commonly
used, although many other architectures use some variations over this basic algorithm.
FIGURE 3.3: Illustration of the Gradient Descent Algorithm
One of the key aspects in most machine learning methods is the way data is
represented, that is, which features to use. If the features used are badly chosen, the
method will fail regardless of its quality. Even more, this selection limits the knowledge the method can work with: given poorly chosen features, it will not be able to make any sense of the data, no matter its quality. Therefore, it is no surprise that there has been a
historical interest in finding the appropriate features. This becomes especially relevant
in the case of Computer Vision problems. The reason is that, when faced with an
image, there are usually way too many features: a simple 640x480 RGB image has almost 1 million pixel values, and most of them are irrelevant. Because of this, it is
important to find some way of condensing this information in a more compact way.
There are two main ways of obtaining features: manually choosing them (such as physiological values in medical applications) or automatically generating them, an approach known as representation learning. The latter has proven to be more effective in problems such as computer vision, as it is very difficult for us humans to know what makes an image distinguishable. Instead, in many cases machines have been able to determine which features were relevant for them, resulting in state-of-the-art results. The most paradigmatic case of representation learning is the auto-encoder. It performs a 2-step process: first it encodes the information it receives into a compressed representation, and it later tries to decode, or reconstruct, the original input from this reduced representation.
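As an illustration of this two-step encode/decode process, here is a minimal auto-encoder sketch, assuming Keras/TensorFlow; the layer sizes and the 28x28 input are illustrative assumptions only:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    # Encoder: compress the input into a smaller learned representation
    Dense(64, activation='relu', input_shape=(784,)),   # e.g. a flattened 28x28 grayscale image
    Dense(16, activation='relu'),
    # Decoder: reconstruct the original input from the compressed representation
    Dense(64, activation='relu'),
    Dense(784, activation='sigmoid'),
])
autoencoder.compile(optimizer='adam', loss='mse')
# The input itself is used as the training target:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)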
We are going to focus on Computer Vision problems from now on, as it will make it
easier to understand some of the next sections. Regarding the features extracted,
people may have some clear ideas about what makes an object, such as a car, recognizable: having four wheels, doors on the sides, glass at the front, being made of metal, etc. However, these are high-level features that are not easy for a machine to find in an image. To make it even worse, each kind of object in the world has its own particular features, usually with large intra-class variability. Because of this, developing a general object recognition application in this way would be impossible, as we would need manually selected features for each kind of object. Therefore, manual feature selection has not been a successful line of research recently. On the contrary, if machines are capable of determining on their own what is representative of an object, they will have the potential of learning how to represent any object they are trained with.
However, there is an additional difficulty in this kind of problem: the variability depending on the conditions of each picture. We do not only have to deal with intra-class variability, but also with the variability of the same object. The same car can be pictured in almost endless ways, depending on its pose, light conditions, image quality, etc. Humans are capable of getting rid of this variation by extracting what we could consider abstract features. These features can include the ones we mentioned before, such as the number of wheels, but also others we are not aware of, such as the fact that cars are usually on a road, or that their wheels should be in contact with the ground. In order for a representation learning method to be successful, it should be able to extract this kind of high-level feature, regardless of such variation. The problem is that this process can be extremely difficult to build into a machine, which may lead to thinking that it makes no sense to make the effort of doing so. This is, precisely, where Deep Learning has proven to be extremely useful.
Not only are modern computers far more powerful than those from a decade ago, but the appearance of graphics cards has also greatly boosted the speed of these methods. Graphics Processing Units, or GPUs, were first designed to allow computers to run demanding graphical programs, mainly video games. In order to do so, they excel at rapidly performing large amounts of simple operations, exactly as rendering methods require. Seeing this, it became apparent that they could be used for other kinds of applications with similar needs, such as, precisely, DNNs. Nowadays, most DNN researchers and users run their networks on GPUs, as they can reduce the running time by orders of magnitude. This has popularized the use of DNNs, as it is no longer necessary to use expensive supercomputers in order to train networks in a reasonable amount of time.
The other factor that helped DNN training was the new data-oriented culture that arose in the 2000s. As data mining and machine learning made it possible to
analyze all kinds of data in a fast and reliable way, many entities wanted to make use
of it. In order to do so, they started gathering large amounts of data and converting
them into usable datasets. These cover a great range of disciplines, such as health,
economics, social behavior, etc.
Although some of these datasets were of private use, many of them were released to
the public. It became a self-feeding circle, because as more data was available to
study, the better the analyzing techniques became, which lured more people into using
it. This allowed the creation of large datasets, with millions of instances, that could be
used to train ANN with great numbers of parameters to learn, without overfitting. The
final factor that allowed the popularization of DNN was the appearance of new
methods of training them. Although the two previous facts helped, without advanced
training algorithms we could not have made use of them. It is commonly considered
that it was Hinton who established the basis for modern Deep Learning in 2006
[Hinton, Osindero, and Teh, 2006]. In that publication, he proposed a way of training
deep neural networks in a fast and successful way. This was achieved by treating each
layer as a Restricted Boltzmann machine, and training them one at a time, thus pre-
training the network weights. After that, the network was fine-tuned as a whole. This breakthrough made it possible to train deep networks with multiple layers that could not have
been trained previously, as they would have ended up overfitting. After that, many
other methods have been developed in order to train deep networks, such as ReLU
layers or dropout regularization.
Figure 3.7: Illustration of convolution function on an input image
There is a formula used to determine the spatial dimension of the activation maps:
(N + 2P - F) / S + 1, where
● N = Dimension of the input image
● P = Padding
● F = Dimension of the filter
● S = Stride
A small worked example of this formula is given below.
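A small worked sketch of this formula (Python; the example values are illustrative only):

def conv_output_size(n, p, f, s):
    """Side length of the activation map for input size n, padding p, filter size f, stride s."""
    return (n + 2 * p - f) // s + 1

# A 224x224 input with a 3x3 filter, padding 1 and stride 1 keeps its spatial size:
print(conv_output_size(224, 1, 3, 1))   # 224
# The same input with a 7x7 filter, no padding and stride 2 shrinks considerably:
print(conv_output_size(224, 0, 7, 2))   # 109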
In this section, we introduce the 4 most used layer types, together with a fifth not so common, but relevant for our project; a small code sketch combining several of them is given after these descriptions.
• Convolutional Layer: The most iconic layer has already been introduced. It is inspired by traditional MLPs, but has some major differences. The main ones are that each layer has a single set of weights shared by all of its neurons (shared weights) and that each neuron only processes a small part of the input space. It uses all the parameters introduced in the previous section.
• Pooling Layer: These layers are useful in progressively reducing the size of the image representation. They work by taking each channel of their receptive field and resizing it, keeping only the maximum of its values. They are usually used with 2x2 kernels and a stride of 2, which halves each side, reducing the overall size by 75% by picking the largest value of each 2x2 patch. This kind of layer does not have weights that need training, and it only uses the stride and kernel size parameters. Its utility lies in reducing the number of weights to learn, which reduces computational time as well as the probability of overfitting. Unlike the convolutional layer, the pooling layer does not alter the depth of the network; the depth dimension remains unchanged.
• Fully Connected Layer: These layers are basically neural layers connected
to all neurons from the previous layer, like the ones from regular ANNs. In this case,
they do not use any of the introduced parameters, using instead the number of
neurons. The output they produce could be understood as a compact feature vector
representing the input image. They are also used as output layers, with one neuron per
output, as usual.
• Locally Connected Layer: The last presented layer is really similar to the
Convolutional Layer, but it does not use the shared weight strategy. This strategy is
justified in normal Convolutional Layers because the relevant features are usually
independent of their position in the image. However, there are cases in which this may
not hold true. If you know, for example, that all your images will have a face centered at the same position, it makes sense to look for different features in the eye zone than in the mouth zone. This is achieved by giving each neuron its own set of weights, similarly to a regular ANN, but each neuron still only processes its receptive field. These layers are commonly used after some Convolutional and Pooling ones for 2 reasons. The first one is that, in order for the features from, say, eyes and mouth, to be different, they need to be relatively abstract. Basic structures, such as edges or corners, are relevant in both cases. As already explained, this abstraction level is achieved by applying successive neuron layers, which build “complex” features by means of simpler ones; thus the utility of using some Convolutional Layers before. The other
reason is that using Locally Connected Layers introduces a large number of weights
into the
network, making it more prone to overfitting. Then, it is better to use them when the
image has already been reduced by previous layers. All in all, this type of layer is not
very commonly used due to it requiring a fixed spatial distribution. However, if this
condition is fulfilled, and you have enough data to prevent overfitting, they are an
excellent choice.
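To make the roles of these layers concrete, the following is a minimal sketch, assuming Keras/TensorFlow, of a small classification-style network that stacks Convolutional, Pooling and Fully Connected layers; the filter counts, image size and number of classes are illustrative assumptions, and the Locally Connected layer is omitted for simplicity:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Convolutional layer: a single set of 3x3 filters shared across the whole image
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3)),
    # Pooling layer: keeps the maximum of each 2x2 patch, halving each spatial side
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    # Fully connected layers: each neuron connected to all outputs of the previous layer
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),   # output layer, one neuron per class
])
model.summary()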
The activation function is a node that is placed at the end of, or in between, the layers of a neural network. It helps to decide whether the neuron should fire or not.
FIGURE 3.8: Few examples of activation functions
The activation function is the non-linear transformation that is performed over the
input signal. This transformed output is then sent to the next layer as input.
The ReLU function is the most widely used activation function in neural networks today. The greatest advantage that ReLU has over other activation functions is that not all neurons are activated at the same time. As the plot of the ReLU function above shows, it converts all negative inputs to zero, so the corresponding neurons do not get activated. This makes it very computationally efficient, as only a few neurons are activated at a time. It does not saturate in the positive region. It has been observed that ReLU converges about six times faster than the tanh and sigmoid activation functions.
Also, ReLU outputs are not zero-centered. This means that, in order to reach its optimal point, the optimization may have to follow a zig-zag path, which can take longer.
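As a small numerical illustration (Python with NumPy; the sample inputs are made-up values), note how ReLU zeroes out negative inputs so that those neurons stay inactive, while tanh and the sigmoid squash every input into a bounded range:

import numpy as np

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
relu = np.maximum(0.0, z)                  # negative inputs become 0: sparse activations
tanh = np.tanh(z)                          # zero-centered, saturates for large |z|
sigmoid = 1.0 / (1.0 + np.exp(-z))         # range (0, 1), not zero-centered
print(relu)
print(tanh)
print(sigmoid)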
The goal of any supervised learning algorithm is to find a function that best maps a set
of inputs to their correct output. An example would be a classification task, where the
input is an image of an animal, and the correct output is the name of the animal.
Loss Function
Sometimes referred to as the cost function or error function (not to be confused with
the Gauss error function), the loss function is a function that maps values of one or
more variables onto a real number intuitively representing some "cost" associated
with those values. For backpropagation, the loss function calculates the difference between the network output and its expected output, after a training case propagates through the network.
Consider the diagram below -
Figure 3.9: Flowchart representing Backpropagation
● Calculate the error – how far the model output is from the actual output.
● Update the parameters – if the error is large, update the parameters (weights and biases). After that, check the error again. Repeat the process until the error becomes minimal.
The Backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule, or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem.
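In its simplest form, this gradient-descent update of a weight w can be written as (a standard formulation, where η denotes the learning rate and E the error function):

w ← w - η · ∂E/∂w

that is, each weight is moved a small step against the gradient of the error with respect to that weight.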
For example, consider a model that should learn the mapping output = 2 x input, so that for the inputs 1 and 2 the desired outputs are 2 and 4. With the weight W = 3, the model instead produces 3 and 6. Notice the difference between the actual output and the desired output:
TABLE 3.3: Example Model predictions with absolute and square error
Input Desired Output Model output (W=3) Absolute Error Square Error
0 0 0 0 0
1 2 3 1 1
2 4 6 2 4
Let’s change the value of ‘W’. Notice the error when ‘W’ = 4.
TABLE 3.4: Example Model predictions and errors with two different weights
Input Desired Output Model output (W=3) Absolute Error (W=3) Square Error (W=3) Model output (W=4) Square Error (W=4)
0 0 0 0 0 0 0
1 2 3 1 1 4 4
2 4 6 2 4 8 16
Now, if you notice, when we increased the value of ‘W’ the error increased. So, obviously, there is no point in increasing the value of ‘W’ further. But what happens if we decrease the value of ‘W’? Consider the table below:
TABLE 3.5: Comparison of Example Model predictions and errors with a reduced weight
Input Desired Output Model output (W=3) Absolute Error (W=3) Square Error (W=3) Model output (W=2) Square Error (W=2)
0 0 0 0 0 0 0
1 2 3 1 1 2 0
2 4 6 2 4 4 0
● Then, we noticed that there is some error. To reduce that error, we propagated backward and increased the value of ‘W’.
● After that, we noticed that the error had increased, and we learned that we cannot keep increasing the value of ‘W’.
So, we are trying to find the value of the weight for which the error becomes minimal. Basically, we need to figure out whether we need to increase or decrease the weight value. Once we know that, we keep updating the weight value in that direction until the error becomes minimal. You might reach a point where, if you update the weight further, the error will increase. At that point you need to stop, and that is your final weight value.
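The toy example above can be written out directly. The following sketch (Python with NumPy) runs gradient descent on the single weight ‘W’ for the model output = W * input, starting from W = 3 as in the tables, and converges towards W = 2, the value at which the squared error is minimal:

import numpy as np

x = np.array([0.0, 1.0, 2.0])           # inputs from the example tables
y = np.array([0.0, 2.0, 4.0])           # desired outputs (output = 2 * input)

W = 3.0                                  # initial guess, as in Table 3.3
lr = 0.1                                 # learning rate (step size)
for step in range(50):
    pred = W * x                         # model output with the current weight
    error = np.mean((pred - y) ** 2)     # squared error we want to minimize
    grad = np.mean(2 * (pred - y) * x)   # derivative of the error with respect to W
    W -= lr * grad                       # step in the direction that reduces the error
print(W, error)                          # W approaches 2 and the error approaches 0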
We need to reach the ‘Global Loss Minimum’; updating the weights in this way, using the error gradients propagated backward through the network, is nothing but Backpropagation.
Imagine you are high up on a mountain and want to reach the lake at its base, but poor visibility prevents you from seeing the way down. The best way is to check the ground near you and observe where the land tends to descend. This will give an idea of the direction in which you should take your first step. If you follow the descending path, it is very likely you will reach the lake.
To represent this graphically, consider the graph below. Suppose we want to find the best parameters (θ1) and (θ2) for our learning algorithm. Similar to the analogy above, we find similar mountains and valleys when we plot our “cost space”. The cost space is nothing but how our algorithm would perform when we choose particular values for the parameters.
So on the y-axis, we have the cost J(θ) against our parameters θ1 and θ2 on x-axis and
z-axis respectively. Here, hills are represented by the red region, which has a high
cost, and valleys are represented by the blue region, which has a low cost.
Now, there are many types of gradient descent algorithms. One of the main ways to classify them is by how much data is used to compute each gradient update: in full-batch gradient descent you use the whole dataset at once to compute the gradient, whereas in stochastic gradient descent you take a single sample (or a small batch of samples) while computing the gradient.
Gradient Descent is a sound technique which works in most of the cases. But there are
many cases where gradient descent does not work properly or fails to work altogether.
There are three main reasons when this would happen:
1. Data challenges
2. Gradient challenges
3. Implementation challenges
Data Challenges
● There is also the saddle point problem: a point where the gradient is zero but which is not an optimal point. We do not have a specific way to avoid such points, and this is still an active area of research.
Gradient Challenges
● If the execution is not done properly while using gradient descent, it may lead to problems like vanishing gradients or exploding gradients, which occur when the gradient is too small or too large. Because of these problems, the algorithm does not converge.
Implementation Challenges
● Also, it’s important to keep track of things like floating point considerations
and hardware/ software prerequisites.
The size of the steps taken during gradient descent is called the learning rate. With a
high learning rate, we can cover more ground each step, but we risk overshooting the
lowest point since the slope of the hill is constantly changing. With a very low
learning rate, we can confidently move in the direction of the negative gradient since
we are recalculating it so frequently. A low learning rate is more precise, but
calculating the gradient is time-consuming, so it will take us a very long time to get to
the bottom.
● Error rates – You should check the training and testing error after a specific number of iterations and make sure both of them decrease. If that is not the case, there might be a problem!
● Learning rate – which you should check when using adaptive techniques. A
decent trick is to multiply your learning rate by 0.3 and adjust the steps accordingly
till you reach the global minimum.
4
Methodology
4.1 Approach
The approach that we followed to accomplish the task is represented by the flowchart
given below -
● Feature Extraction
Natural images have the property of being “stationary”, meaning that the statistics of
one part of the image are the same as any other part. This suggests that the features
that we learn at one part of the image can also be applied to other parts of the image,
and we can use the same features at all locations.
Feature Extraction consists of preparing a model with the following layers -
● Convolution
● Activation
● Flattening
● Full Connection
The above-shown model in the diagram tries to predict the “a” and “b” color
channels for the grayscale input image in “Lab” color space.
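As an illustration only, the following is a minimal Keras/TensorFlow sketch of a fully convolutional network of this kind, taking a 256x256 lightness (L) channel and producing the two ab channels; the layer counts and sizes here are assumptions and not our exact configuration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, UpSampling2D

model = Sequential([
    # Encoder: stacked convolutions with ReLU activations, downsampling with strides
    Conv2D(64, (3, 3), activation='relu', padding='same', strides=2, input_shape=(256, 256, 1)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same', strides=2),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    # Decoder: convolutions plus upsampling back to the input resolution
    UpSampling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    UpSampling2D((2, 2)),
    Conv2D(2, (3, 3), activation='tanh', padding='same'),  # two output channels: a and b, scaled to [-1, 1]
])
model.compile(optimizer='rmsprop', loss='mse')
model.summary()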
● Generate Output
After model training is finished, we have a grayscale (lightness) layer as input and we expect the model to predict the two color layers, a and b, in the Lab color space.
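A minimal sketch of this output-generation step (Python, assuming scikit-image for the color-space conversions and a trained Keras network like the one sketched above; the function and file names are illustrative):

import numpy as np
from skimage.color import rgb2lab, lab2rgb
from skimage.io import imread, imsave

def colorize(model, in_path, out_path):
    # Assumption: the grayscale photo is stored as a 3-channel RGB file of the size the
    # model expects (e.g. 256x256), and the model maps an L channel scaled to [-1, 1]
    # to two ab channels scaled to [-1, 1].
    img = imread(in_path) / 255.0                        # RGB values in [0, 1]
    lab = rgb2lab(img)                                   # convert to the Lab color space
    L = lab[:, :, 0]                                     # lightness channel, range 0..100
    X = (L / 50.0 - 1.0)[np.newaxis, :, :, np.newaxis]   # scale L and add batch/channel axes
    ab = model.predict(X)[0] * 128.0                     # rescale predicted ab to roughly [-128, 128]
    result = np.zeros(lab.shape)
    result[:, :, 0] = L                                  # keep the original lightness layer
    result[:, :, 1:] = ab                                # insert the predicted color layers
    imsave(out_path, (lab2rgb(result) * 255).astype(np.uint8))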
FIGURE 4.5: Process showing colorization of a grayscale image
5
Results and Discussions
For the following CNN Model, we observed the following results:
Results
Input Image | Output after 1000 epochs | Output after more than 1000 epochs
From the above observations, we can infer that -
● The software produced promising results which can be used in gaining some
useful insight.
● The output images produced are vibrant and plausible though they may not
represent the ground truth colors.
● The software performed well on different types of images, such as scenes of human life, infrastructure, and nature views.
● Overfitting is not present, as is clear from the results on randomly chosen inputs.
6
Conclusions
The conclusions we can draw from the above results are as follows -
● The precision of the images mainly depends upon the architecture of the model used and the training of the model; as we saw, increasing the number of epochs produced more promising results.
● Increasing the number of epochs alone is not sufficient; a large dataset is also required in order to avoid the problem of overfitting.
● Even after adding a significant number of convolution layers, the size of the output is still too large to be manipulated efficiently. In this case, pooling or downsampling proves to be a good candidate for making the output of the convolution layers more compact without significant loss of information.
● Our model can be used efficiently to colorize small images, but if it had to handle large images it would take a significant amount of time under normal operating conditions, and hence would not be a reliable approach, since the cost requirements would increase. Therefore, new techniques such as Deep Convolutional Generative Adversarial Networks (DCGAN) can also be used for further optimization.
7
Future Scope
Though our work gives promising results, there are still a lot of limitations that we
plan to address in the future-
● Color Precision - Though the results provided by the model were satisfactory,
there is still a need for more training of the model to enhance precision.
● We further aim to make our model more precise by adding more convolutional
and pooling layers.
References
1. Tung Nguyen, Kazuki Mori, Ruck Thawonmas. 2016. Image Colorization
Using a Deep Convolutional Neural Network. In ASIAGRAPH 2016
Proceeding.
2. Arshiya Sayyed, Apeksha Rahangdale, Rutuja Hasurkar, and Kshitija Hande.
2017. Automatic Colorization Of Gray-scale Images using Deep Learning. In
International Journal of Science, Engineering, and Technology Research
(IJSETR).
3. Richard Zhang, Phillip Isola, Alexei A. Efros. 2016. Colorful Image Colorization. In European Conference on Computer Vision (ECCV).
4. Kamyar Nazeri, Eric Ng, and Mehran Ebrahimi. 2018. Image Colorization
using Generative Adversarial Networks. In book: Articulated Motion and
Deformable Objects, pp.85-94.
5. K. Simonyan and A. Zisserman, 2015. Very Deep Convolutional Networks for
Large-Scale Image Recognition. International Conference on Learning
Representations 2015 (arXiv:1409.1556v6).
6. Jeff Hwang and You Zhou, 2016. Image Colorization with Deep
Convolutional Neural Networks.
7. Richard Zhang’s interactive demo, performance comparisons, deep dream visualization - https://richzhang.github.io/colorization/
8. A pre-trained neural network model for colorizing Black & White images -
https://blog.floydhub.com/colorizing-b-w-photos-with-neural-networks/
9. Deep Convolutional Generative Adversarial Network for satellite imagery -
https://medium.com/the-downlinq/artificial-colorization-of-grayscale-satellite-
imagery-via-gans-part-1-79c8d137e97b
10. https://towardsdatascience.com/develop-a-nlp-model-in-python-deploy-it-with-flask-step-by-step-744f3bdd7776