
B.E.

PROJECT ON
COLORIZATION OF GRAYSCALE IMAGES

Submitted By
Ashish Rana (236/CO/15)
Deepanshu Chauhan (247/CO/15)
Gaurav Sharma (255/CO/15)

Under the Guidance of


Prof. Sangeeta Sabharwal

A project submitted in partial fulfillment of the requirements for the award of


B.E. in
Computer Engineering

Department of Computer Engineering,


NETAJI SUBHAS INSTITUTE OF TECHNOLOGY
UNIVERSITY OF DELHI, DELHI
NEW DELHI-110078
2019
DECLARATION

This is to certify that the work being hereby presented by us in this project
titled “Colorization of Grayscale Images”, in partial fulfillment of the requirements
for the award of the Bachelor of Engineering degree, submitted to the Department of
Computer Engineering, Netaji Subhas Institute of Technology, Delhi, is a genuine
account of our work carried out during the period from December 2018 to May 2019
under the guidance of Prof. Sangeeta Sabharwal, Department of Computer Engineering,
Netaji Subhas Institute of Technology, Delhi. The matter embodied in the project report,
to the best of our knowledge, has not been submitted for the award of any other degree elsewhere.

Dated:

Ashish Rana Deepanshu Chauhan Gaurav Sharma

This is to certify that the above declaration by the students is true to the best of my
knowledge.

Prof. Sangeeta Sabharwal

ACKNOWLEDGMENT

Enumerating the individual contributions that went into the making of this project is
a very difficult task. It took many special people and researchers, and their contributions
to this field, to enable and support it. Here we would like to acknowledge their
precious co-operation and express our sincere gratitude to them.

We would like to express our deep gratitude towards our mentor Prof. Sangeeta
Sabharwal, Professor, COE Division, Netaji Subhas Institute of Technology, New
Delhi under whose supervision we completed our work. Her invaluable suggestions,
enlightening comments, and constructive criticism always kept our spirits up during
our work.

We are also thankful to our friends who motivated us at each and every step of this
project. Without their interest in our project, we could not have gone so far. And
most of all, we would like to thank our wonderful parents, who motivated us from
day one of the project. You were the lights that led us.

Our experience of working together has been wonderful. We hope that the knowledge,
practical and theoretical, that we have gained through this B.E. project will help
us in our future endeavors in the field.

We regret any inadvertent omissions.

Ashish Rana Deepanshu Chauhan Gaurav Sharma

236CO15 247CO15 255CO15

CERTIFICATE

This is to certify that the report entitled “Colorization of Grayscale Images”, being
submitted by Ashish Rana, Deepanshu Chauhan, and Gaurav Sharma to the Department of
Computer Engineering, NSIT, for the award of the Bachelor of Engineering degree, is
a record of the bona fide work carried out by them under our supervision and
guidance. The results contained in this report have not been submitted either in part or
in full to any other university or institute for the award of any degree or diploma.

Supervisors

Prof. Sangeeta Sabharwal


Department of COE

ABSTRACT

The aim of our dissertation is to develop an efficient method to perform ‘Colorization
of Grayscale Images’ such that the output images are predicted close to their true colors.

The resultant image should be precise, meaning that the output image should be
represented close to its true natural colors.

It is important to note that the goal of colorization is not to recover the actual ground
truth color but rather to produce a plausible colorization that the user finds useful even
if the colorization differs from the ground truth color.

We use “Convolutional Neural Networks” and various “deep learning techniques” to solve this problem.

The software that we have developed can build a deep convolutional neural network,
train it on a number of colored images to extract different features and understand the
correlations between them, and use the knowledge gained to predict the colored images.

LIST OF TABLES

Table Caption Page No.

Table 3.1 Example Dataset 40

Table 3.2 Example Dataset with Model prediction 40

Table 3.3 Example Model predictions with absolute and square error 40

Table 3.4 Example model predictions and errors with two different weights 40

Table 3.5 Comparison of Example Model predictions and errors with reduced weights 41

LIST OF FIGURES

Figure Caption Page No.

Fig. 3.1 Illustration of a biological neuron and artificial neuron 22

Fig. 3.2 The three layers of an ANN 23

Fig. 3.3 Illustration of Gradient Descent Algorithm 27

Fig. 3.4 A typical feedforward network 28

Fig. 3.5 A typical feedback network 28

Fig. 3.6 Deep Neural Network 31

Fig. 3.7 Illustration of convolution function on an input image 34

Fig. 3.8 Few examples of activation functions 37

Fig. 3.9 Flowchart representing Backpropagation 39

Fig. 3.10 Graph showing Square Error vs. Weight 42

Fig. 3.11 Classic Mountain Example for Gradient Descent 42

Fig. 3.12 Graph showing Cost Function vs. Parameters 43

Fig. 4.1 Flowchart of the approach followed 47

Fig. 4.2 Illustration of Convolution function on an image 49

Fig. 4.3 Diagram depicting Layers of Model used 49

Fig. 4.4 An example illustrating Lab Color Space 50

Fig. 4.5 Process showing colorization of a grayscale image 51

Fig. 5.1 Code for Building the CNN 52

Fig. 5.2 Summary of Model Architecture 53

Fig. 5.3 Home page of Web Application 54

Fig. 5.4 Automatic Demo 54

Fig. 5.5 File input for the Specific Demo 55

Fig. 5.6 Specific Demo 55

TABLE OF CONTENTS

S.NO TITLE Page No.

1 Declaration 2

2 Acknowledgment 3

3 Certificate 4

4 Abstract 5

5 List of Tables 6

6 List of Figures 7

7 Chapter-1 Introduction 10

8 1.1 What is Colorization? 10

9 1.2 Problem Statement 11

10 1.3 Motivation 11

11 1.4 Goals 13

12 1.5 Overview of the Thesis 13

13 Chapter-2 Review of Literature 15

14 Chapter-3 Technical Description 21

15 3.1 Artificial Neural Networks 21

16 3.1.1 What are they? 22

17 3.1.2 How do they work? 23

18 3.1.3 How are they trained? 25

19 3.1.4 Types of ANNs 27

20 3.2 Deep Learning 28

21 3.3 Deep Neural Networks 31

22 3.4 Convolutional Neural Networks 33

23 3.4.1 Layer Types 35

24 3.4.2 Activation Function 36

25 3.5 Backpropagation 38

26 3.6 Gradient Descent 42

27 3.6.1 Challenges in Executing Gradient Descent 44

28 3.6.2 Learning Rate 45

29 3.6.3 Practical Tips on Applying Gradient Descent 46

30 Chapter-4 Methodology 47

31 4.1 Approach 47

33 Chapter-5 Results and Discussions 52

34 Conclusions 59

35 Future Scope 60

36 References 61

1
Introduction
1.1 What is Colorization?

Colorization can be defined as a computer-aided process of adding color to a
grayscale image or video. The task of colorizing a grayscale image involves assigning
three-dimensional (RGB) pixel values to an image which varies along only one
dimension (luminance or intensity). Since different colors may have the same
luminance value but vary in hue or saturation, the mapping between intensity and
color is not unique, and colorization is ambiguous in nature, requiring some amount
of human interaction or external information.
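
To make this loss of information concrete, the following small sketch (our own illustration in Python, using the common ITU-R BT.601 luminance weighting, which is an assumption and not necessarily the conversion used elsewhere in this report) shows how the three color channels collapse into a single intensity channel:

    import numpy as np

    # A toy RGB image; each pixel has three values (R, G, B).
    rgb = np.random.randint(0, 256, size=(4, 4, 3)).astype(float)

    # Collapse the three channels into one luminance value per pixel
    # (ITU-R BT.601 weights, assumed here purely for illustration).
    gray = 0.299 * rgb[:, :, 0] + 0.587 * rgb[:, :, 1] + 0.114 * rgb[:, :, 2]

    print(rgb.shape, "->", gray.shape)   # (4, 4, 3) -> (4, 4)

Recovering the two discarded dimensions from the single remaining one is exactly the ill-posed inverse problem that colorization has to solve.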

It is important to note that the goal of colorization is not to recover the actual ground
truth color, but rather, to produce a plausible colorization that the user finds useful
even if the colorization differs from the ground truth color.

Colorization can seem like an intimidating task because so much information is lost
(two out of three color dimensions) in converting a color image to its underlying
grayscale representation. The semantics of an image scene provides many clues for
sound colorization. Deep learning is a successful tool for colorization because it takes
advantage of scene semantics for image classification and object detection.

Note that if a real color image and an artificial image are compared side-by-side, the
human eye would try to extract the differences between the two images.

Since the goal of colorization is to produce a plausible colorization, this visualization
would be counter-productive.

1.2 Problem Statement

The aim of our dissertation is to develop an efficient method to perform “colorization
of grayscale images” such that the output images are predicted close to their true colors.

● The resultant image should be Precise. Precise means that the output image
should be represented close to its true natural colors.

● It is important to note that the goal of colorization is not to recover the actual
ground truth color, but rather, to produce a plausible colorization that the user
finds useful even if the colorization differs from the ground truth color.

● We use ‘Convolutional Neural Networks’ and various ‘deep learning
techniques’ to solve this problem.

The software that we have developed can build a deep convolutional neural network,
train it on a number of colored images to extract different features and understand the
correlations between them, and use the knowledge gained to predict the colored images.

1.3 Motivation

Colorization of a grayscale image is a difficult problem. Given a certain image, there
is often no “correct” attainable color. Colorization can seem like an intimidating task
because so much information is lost (two out of three color dimensions) in converting
a color image to its underlying grayscale representation.


The applications of such a method provide a new way of entertainment by making
colorization of old, black-and-white photographs and cinema feasible, along with
allowing better interpretation of CCTV camera footage, astronomical photography,
and electron microscopy.

Deep learning has the potential to be a successful tool for colorization because it
already takes advantage of scene semantics for image classification and object
detection.

The ideas and methods developed in the field of colorization of grayscale images have
been carried forward to improve the performance of object detection algorithms.

Motivations for Deep Architectures


● Insufficient depth can hurt
❏ With a shallow architecture (SVM, NB, KNN, etc.), the required number
of nodes in the graph (i.e., computations, and also the number of
parameters when we try to learn the function) may grow very large.
❏ Many functions that can be represented efficiently with a deep
architecture cannot be represented efficiently with a shallow one.
● The brain has a deep architecture
❏ The visual cortex shows a sequence of areas, each of which contains a
representation of the input, and signals flow from one to the next.
❏ Note that representations in the brain are in between dense distributed
and purely local: they are sparse, with about 1% of neurons active
simultaneously in the brain.
● Cognitive processes seem deep
❏ The organization of human ideas and concepts is hierarchical.
❏ It is human nature to first learn simpler concepts and then compose
them to represent more abstract ones.


1.4 Goals

Image colorization using convolutional neural networks aims at reducing the effort a
designer needs to manually colorize grayscale images. Earlier methods used
simple regression and other similar techniques for the same purpose.

The second important challenge we address in our dissertation is that of “Precision”.

Our goal was to create a Web Application that accepts a grayscale image as input and
produces a colorful image as output.

The task of colorization of grayscale images is an extremely difficult problem. In
fact, there have been many approaches throughout history that have not given
promising results.

In this thesis, we propose a fully automatic process for colorization that produces
realistic colors. We embrace the underlying uncertainty of the problem by posing it as
a classification task and use class-rebalancing at training time to increase the
diversity of colors in the result. The colorization results obtained are very good and
vibrant. In addition, there is no need for human intervention, as the process is fully
automatic; the user just needs to provide the input on which the colorization is
to be performed.

1.5 Overview of the Thesis

Chapter-1 Introduction

In this chapter, we have introduced the concept of grayscale image colorization and
the various ways in which it can be done.


Chapter-2 Review of Literature


In this chapter, we will discuss the various techniques that can be implemented for the
process of grayscale image colorization. We shall put emphasis on the techniques that will
be used in this project while discussing other techniques that have been used in the past.

Chapter-3 Technical Description


In this chapter, we will discuss the various technologies that have been used in our
project. We shall elaborate on the various techniques that we have used to implement
this project.

Chapter-4 Methodology

In this chapter, we will discuss the way our system has been designed. We shall put
emphasis on the various techniques that we have incorporated into our project. The
various traditional and non-traditional techniques in grayscale image colorization
that we have implemented will be discussed in detail.

Chapter-5 Results and Discussions


In this chapter, we will discuss the output of our models – both traditional and non-
traditional. The behavior of our models shall be elaborated in this chapter.

Chapter-6 Conclusions
In this chapter, we will summarise our approach to building the software and discuss
the conclusions.

Chapter-7 Future Scope


In this chapter, we will discuss the future scope, where we can overcome certain
deficiencies in our approach through suggested workarounds.

2
Review of Literature
Very Deep Convolutional Neural Networks for Large-Scale Image
Recognition
K. Simonyan and A. Zisserman. 2015

In this work, they investigated the effect of the convolutional network depth on its
accuracy in the large-scale image recognition setting. Their main contribution is a
thorough evaluation of networks of increasing depth using an architecture with very
small (3x3) convolution filters, which shows that a significant improvement on the
prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
These findings were the basis of their ImageNet Challenge 2014 submission, where
their team secured the first and the second places in the localization and classification
tracks respectively. They also showed that their representations generalize well to other
datasets, where they achieve state-of-the-art results. They have made their two best-
performing ConvNet models publicly available to facilitate further research on the use
of deep visual representations in computer vision.

It was demonstrated that the representation depth is beneficial for the classification
accuracy, and that state-of-the-art performance on the ImageNet challenge dataset can
be achieved using a conventional ConvNet architecture (LeCun et al., 1989;
Krizhevsky et al., 2012) with substantially increased depth. They also showed that
their models generalize well to a wide range of tasks and datasets, matching or
outperforming more complex recognition pipelines built around less deep image
representations. Their results yet again confirm the importance of depth in visual
representations.


Image Colorization with Deep Convolutional Neural Networks


Jeff Hwang, You Zhou. 2016

They presented a convolutional-neural-network-based system that faithfully colorizes
black and white photographic images without direct human assistance. They explored
various network architectures, objectives, color spaces, and problem formulations.
The final classification-based model they built generates colorized images that are
significantly more aesthetically-pleasing than those created by the baseline
regression-based model, demonstrating the viability of their methodology and
revealing promising avenues for future work.

Here, they took a statistical-learning-driven approach to solve this problem. They
designed and built a convolutional neural network (CNN) that accepts a black-and-white
image as an input and generates a colorized version of the image as its output.
The system generates its output based solely on images it has “learned from” in the
past, with no further human intervention.

The main challenge their model faces is inconsistency in colors within individual
objects. Their current system makes one color prediction for each pixel, in the hope
that nearby pixels receive similar color assignments. However, this is not always the
case. Even though local regions of small sizes are examined together, given the nature
of convolutional layers, there is no explicit enforcement at the object level. They
experimented with applying Gaussian smoothing on the class scores to address this
issue. This kind of smoothing performed only slightly better. Unfortunately, it
introduced another issue: it significantly increased visual noise along object edges.
Accordingly, they left the smoothing out of their final model. To address the issue of
color inconsistency, they considered incorporating segmentation to enforce uniformity
of color within segments. They could also utilize post-processing schemes such as
total variation minimization and conditional random fields to achieve a similar end.


Image Colorization Using a Deep Convolutional Neural Network


Tung Nguyen, Kazuki Mori, Ruck Thawonmas. 2016

In this paper, a novel approach is proposed that uses deep learning techniques to
colorize grayscale images. By utilizing a pre-trained convolutional neural network
designed for image classification, they were able to separate content and style of
different images and recombine them into a single image. They then proposed a
method that can colorize a grayscale image by combining its content with the style of
a color image which is most semantically similar to the grayscale one. As an
application, they used the proposed method to colorize images of ukiyo-e—a genre of
Japanese painting—and obtain interesting results, showing the potential of this
method in the growing field of computer-assisted art.

There are two main approaches to image colorization: one requires the user to assign
colors to some regions and uses that information to colorize the whole image, while
the other tries to learn the color of each pixel from a semantically similar color image.
In this paper, they use the latter approach; they extract the color information from an
image and transfer it to another image.
In this paper, a reliable method for colorizing grayscale images is presented that uses
CNN to extract color information from an image and transfer the same to another
image. They showed examples of plausible-looking generated images. Their results
indicate that the presented method can be used as a tool for colorization in the future,
allowing minimal human intervention.


Automatic Colorization Of Gray-scale Images using Deep Learning


Arshiya Sayyed, Apeksha Rahangdale, Rutuja Hasurkar, and Kshitija Hande.
2017

Automatic colorization of gray-scale images using deep learning is a technique to
colorize grayscale images without human intervention. Conventional techniques
require human intervention, which is much more time-consuming. The project deals
with deep learning techniques to automatically colorize grayscale images. The
proposed technique uses Deep Convolutional Neural Networks (DCNNs) and has a
number of advantages. The technique can reduce manual work, speed up the process
of colorization and improve accuracy. Automatic colorization techniques using
ConvNets find applications in various domains such as colorization of old movies,
better interpretation of CCTV footage, astronomy, electron microscopy, and
archaeology. The conventional approaches to colorization included regression-based
models, graph cut algorithms, etc. The proposed model is a classification-based
technique, but a baseline regression model is also used. The designed system consists
of two phases: training and testing. The system is trained by feature extraction and
pixel mapping from the input colored images. In the testing phase, the system is
provided with grayscale input images to check the accuracy of their colorization. This
technique can prove useful in eliminating the need for expensive image-transfer
equipment for astronomical images and in speeding up the conversion of legacy
images to modern colored images, thus reducing the manual effort needed by
utilizing deep learning techniques.

The proposed technique focuses on reducing human intervention in the colorization
process with the help of deep learning techniques. The grayscale image passes
through different layers in the neural network and is colored automatically. The paper
is divided into four modules, which cover encoding and decoding the input images,
producing intermediate images in the specific color space from the training image
dataset, and eventually providing the colored images.

Colorful Image Colorization
Richard Zhang, Phillip Isola, Alexei A. Efros. 2016

Given a grayscale photograph as input, this paper aims to solve the problem of
producing a plausible color version of the photograph. A fully automatic approach
that produces vibrant and realistic colorizations is proposed in this paper. They
embraced the underlying uncertainty of the problem by posing it as a classification
task and use class-rebalancing at training time to increase the diversity of colors in the
result. The system is implemented as a feed-forward pass in a CNN at test time and is
trained on over a million color images. Their algorithm was evaluated using a
“colorization Turing test,” which involves asking human participants to choose
between a generated and original color image. Their method was able to successfully
fool humans on 32% of the trials, which is significantly higher than previous methods.

Given the lightness channel L which is the grayscale image itself, their system
predicts the corresponding a and b color channels of the image in the Lab colorspace.
To solve this problem, they leverage large-scale data. Predicting color has the nice
property that training data is practically free: any color photograph can be used as a
training example, simply by taking the image’s L channel as an input and its ab
channels as the supervisory signal. Others have noted the easy availability of training
data, and previous works have trained convolutional neural networks (CNN) to
predict color on large datasets. However, the results from these previous attempts tend
to look desaturated. One explanation is that use loss functions that encourage
conservative predictions. These losses are inherited from standard regression
problems, where the goal is to minimize Euclidean error between an estimate and the
ground truth. They trained a CNN to map from a grayscale input to a distribution over
quantized color value outputs.
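
To illustrate this training setup, the sketch below (our own, not the authors' code) splits a color photograph into an L-channel input and ab-channel targets; the scikit-image library, the file name and the array handling are assumptions made only for illustration:

    import numpy as np
    from skimage import io, color

    # Any color photograph is "free" training data: its Lab decomposition gives
    # both the input (L) and the supervisory signal (ab).
    rgb = io.imread("example.jpg")   # hypothetical file, shape (H, W, 3)
    lab = color.rgb2lab(rgb)

    L = lab[:, :, 0]     # lightness channel: the grayscale input to the network
    ab = lab[:, :, 1:]   # a and b channels: the targets the network must predict

    # At test time, the predicted ab channels are recombined with L and
    # converted back to RGB for display.
    lab_pred = np.concatenate([L[:, :, np.newaxis], ab], axis=2)
    rgb_back = color.lab2rgb(lab_pred)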


Image Colorization using Generative Adversarial Networks


Kamyar Nazeri, Eric Ng, and Mehran Ebrahimi. 2018

Recent developments in automatic colorization mostly involve images that contain a
common theme or require highly processed data such as semantic maps as input. In
this approach, a fully generalized colorization procedure using a conditional Deep
Convolutional Generative Adversarial Network (DCGAN) is used.
The network is trained over datasets that are publicly available such as CIFAR-10 and
Places365. The results of the generative model and traditional deep neural networks
are compared.

In 2014, Goodfellow et al. [1] proposed a new type of generative model: generative
adversarial networks (GANs). A GAN is composed of two smaller networks called
the generator and the discriminator. The generator’s task is to produce results that are
indistinguishable from real data. The discriminator’s task is to classify each sample as
coming from the generator’s model
distribution or the original data distribution. Both of these subnetworks are trained
simultaneously until the generator is able to consistently produce results that the
discriminator cannot classify.
The architectures of the generator and discriminator both follow a multilayer
perceptron model. Since colorization is a class of image translation problems, the
generator and discriminator are both convolutional neural networks (CNN). The
generator is represented by the mapping G(z;θG), where z is a noise variable
(uniformly distributed) that acts as the input of the generator. Similarly, the
discriminator is represented by the mapping D(x;θD) to produce a scalar between 0
and 1, where x is a color image. The output of the discriminator can be interpreted as
the probability of the input originating from the training data. These constructions of
G and D enable us to determine the optimization problem for training the generator
and discriminator: G is trained to minimize the probability that the discriminator
makes a correct prediction on generated data, while D is trained to maximize the
probability of assigning the correct label.


3
Technical Description
This chapter gives an extensive background of the concepts employed in this
project. None of the work in this chapter is original; the ideas from each section have
been cross-referenced to indicate the source of the information presented, wherever
needed. This study was necessary to gather the background information that helped us
proceed in designing the proposed application.

3.1 Artificial Neural Networks


Inspired by their biological counterparts, Artificial Neural Networks (ANNs) are sets
of interconnected computational nodes, often arranged in grid-like (square or cubic) structures.
They are a computational approach for problems in which finding the solution,
or a proper representation, is difficult for traditional computer programs. The
way they process information can be understood as receiving external inputs that
may or may not elicit a response in some of the nodes of the system, the neurons. The whole
set of responses determines the final output of the network.

They have proven their capacity in many problems, such as Computer Vision, which
are difficult to address by extracting features in a traditional way. This section aims to
briefly explain all the main technical concepts of the method used, in order to easily
understand the Deep Learning technique.


3.1.1 What are they?

The power of ANNs comes from a set of computationally simple nodes that combine
together, that is, the neurons. These neurons are structured in layers, which are
connected to each other, similarly to the way biological neurons are connected by
axons. These layers are divided into 3 main types: input, hidden and output. The input
layer corresponds to the data that the network receives. It can be understood as the
input vector of other methods. This layer is connected to the hidden layers, that is, the
ones that are not at the extremes. This is where their name comes from, as they are not
“visible” from the outside. Another interesting interpretation is that, contrary
to other methods, once the network is trained, looking at them does not provide any
insight into what they do. As such, ANNs are sometimes referred to as black boxes, as it
is almost impossible to understand their functioning. There can be multiple hidden
layers, each of them connected to the previous one. Every neuron in the hidden and
output layers is traditionally connected to all neurons from the previous layer. Each
edge has an associated weight, which indicates how strongly related the two neurons
are, either directly or inversely, similarly to the way biological neurons are connected.
Finally, the last layer is called the output layer, and it delivers the result of the ANN,
with one output per class. This is important, as ANNs are mostly used for classification
problems.

FIGURE 3.1 Illustration of a biological neuron and artificial neuron


FIGURE 3.2: The three layers of an ANN:
Notice how each neuron is connected to all neurons from previous layers.

3.1.2 How do they work?

ANNs are used to approximate an unknown mathematical function, which can be either
linear or nonlinear. They are capable, theoretically, of approximating any function. Their
basic unit is the neuron, which computes a “simple” activation function given its inputs,
and propagates its value to the following layer. Therefore, the whole function is
composed by gathering activation values from all neurons. Even with only hundreds of
neurons, which is not too many, the number of edges can be orders of magnitude
higher, hence the difficulty in interpreting them.

In order to calculate the activation value of each neuron i there are three elements
required: the input values X, the weights W and the activation function h(z). The
input values are the outputs from the previous layer that the neuron receives. As already
stated, each neuron is most often connected to all neurons from the previous
layer. Additionally, a bias value b is usually passed to each layer, not coming from
any neuron. As each edge connecting two neurons has its own weight, the value used
by neuron i of layer l to compute its activation, given N inputs, can be
expressed as:

z_i = W_i1 * X_1 + W_i2 * X_2 + ... + W_iN * X_N + b

representing a linear (weighted) addition of all of them. The activation function is a nonlinear
function representing the degree of activation of the neuron, and it can be defined as:

a_i = h(z_i)

There are many possibilities depending on the problem at hand, such as the
hyperbolic tangent:

h(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

or the logistic function:

h(z) = 1 / (1 + e^(-z))

All these have in common that they usually have a range between 0 and 1, or -1 and 1.
There is no definite answer regarding which to choose, but there are some properties
that they should fulfill, such as being continuously differentiable.
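
As a minimal numeric sketch of these formulas (our own illustration, with made-up weights and inputs), the activations of a small layer can be computed as a weighted sum followed by the logistic function:

    import numpy as np

    def logistic(z):
        # Logistic activation: h(z) = 1 / (1 + e^(-z))
        return 1.0 / (1.0 + np.exp(-z))

    # Outputs of the previous layer (the inputs X), edge weights W and bias b.
    X = np.array([0.5, -1.0, 2.0])            # N = 3 inputs
    W = np.array([[0.1, 0.4, -0.2],           # one row of weights per neuron
                  [0.7, -0.3, 0.5]])
    b = 0.1

    z = W @ X + b        # weighted sum z_i for each of the two neurons
    a = logistic(z)      # activation values propagated to the next layer
    print(a)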


3.1.3 How are they trained?

One of the main requirements for training this kind of algorithm is data.
All learning algorithms use data in their training processes, but ANNs require
more than most. As will be explained in the following chapters, this became a
real issue during the project.

Given the data, there are various learning algorithms, of which gradient descent
combined with backpropagation can be considered, given its widespread use, the
most successful of them all. In fact, to a certain degree, it could be considered that
using it is enough for training most ANNs.

This algorithm starts by initializing all weights in the network, which can be done
following various strategies. Some of the most common ones include drawing them
from a probability distribution, or randomly setting them, although low values are
advisable. The process followed afterward consists of 3 phases that are repeated many
times over. In the first one, an input instance is propagated through the whole network,
and the output values are calculated.
Then, this output is evaluated, using a loss function, with the correct output, and this
is used to calculate how far off the network is. The final phase consists of updating
each weight in order to minimize the obtained error. This is done by obtaining the
gradient of each neuron, which can be understood as a “step” towards the actual value.
When these three phases are repeated for all input instances we consider this an
epoch. The algorithm can run for as many epochs as specified, or as required to find
the solution.

Briefly, obtaining the gradient works as follows. Once the outputs have been
calculated for an instance, we obtain the error achieved for each output neuron o,
calling it δ_o. This value allows finding the gradient of each o. For this, we need to
find the derivative of the output of o with respect to its input X_o, that is, the partial
derivative of its activation function h. For the logistic function, this becomes:

h'(z) = h(z) * (1 - h(z))

We provide this detailed information to justify the use of a continuously differentiable
activation function. Otherwise, this partial derivative could not be obtained and,
therefore, the network could not be trained. Continuing with the gradient, it is
obtained by combining this partial derivative with the error obtained. This gradient is
then used to adjust the weights of all output neurons so that they get closer to their
optimal values. Intuitively, given a hill you are descending, the gradient would be a
step in the direction with the steepest descent that brings you closer to the bottom.
After adjusting the weights for the output layer, the same process needs to be done for
the remaining layers, except for the input one. In order to do so, each layer needs the
δ values of the next layer to be calculated. This is the reason why it is called
backpropagation: it starts at the output layer, and from there it goes backward. In the
end, all the edge weights have been updated, and a new instance can be processed.
There are two main ways of applying backpropagation that need to be mentioned,
stochastic and batch.

The stochastic approach is the one presented above, in which weights are updated after
each instance. This introduces a certain amount of randomness, preventing the
algorithm from getting stuck in local optima. The batch approach, instead, applies the
weight update only after having processed a set of instances, using the average error.
This usually makes the algorithm converge faster to local minima, which may actually
be a good result. A compromise between the two can be achieved using the mini-batch
strategy, which uses small batches of randomly selected samples and combines the
benefits of both strategies.
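
The difference between the stochastic, batch and mini-batch schedules can be sketched as below for a simple linear model; this is only an illustration of when the weight update is applied (the function name, the linear model and the learning rate are our own assumptions), not the actual training code used in this project:

    import numpy as np

    def run_epoch(X, y, w, lr=0.01, batch_size=1):
        # batch_size=1 gives the stochastic scheme, batch_size=len(X) the batch
        # scheme, and anything in between the mini-batch compromise.
        order = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            pred = X[idx] @ w                             # forward pass (linear model)
            grad = X[idx].T @ (pred - y[idx]) / len(idx)  # average gradient over the batch
            w = w - lr * grad                             # one weight update per (mini-)batch
        return w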


Therefore, it can be concluded that the learning task for neural networks consists of
finding the right weights. The algorithm explained here is the one most commonly
used, although many other architectures use variations of this basic algorithm.

FIGURE 3.3: Illustration of Gradient Descent Algorithm

3.1.4 Types of Artificial Neural Networks


There are two types of Artificial Neural Networks: Feedforward and Feedback.

3.1.4.1 Feedforward ANN


The information flow is unidirectional. A unit sends information to another unit from
which it does not receive any information. There are no feedback loops. They are used
in pattern generation/recognition/classification. They have fixed inputs and outputs.


Figure 3.4: A typical feedforward network

3.1.4.2 Feedback ANN


Here, feedback loops are allowed. They are used in content addressable memories.

Figure 3.5: A typical feedback network

3.2 Deep Learning

One of the key aspects in most machine learning methods is the way data is
represented, that is, which features to use. If the features used are badly chosen, the
method will fail regardless of its quality. Moreover, this selection limits the knowledge
the method can draw on: if the values it receives do not capture the relevant
information, it will not be able to make any sense of them, no matter the quality of the
method. Therefore, it is no surprise that there has been a
historical interest in finding the appropriate features. This becomes especially relevant
in the case of Computer Vision problems. The reason is that, when faced with an
image, there are usually far too many features: a simple 640x480 RGB image has
almost one million pixels, and most of them are irrelevant. Because of this, it is
important to find some way of condensing this information in a more compact way.
There are two main ways of obtaining features: manually choosing them (such as
physiological values in medical applications) or automatically generating them, an
approach known as representation learning. The latter has proven to be more effective
in problems such as computer vision, as it is very difficult for us humans to know
what makes an image distinguishable. Instead, in many cases machines have been
able to determine which features were relevant for them, resulting in some state-of-the-art
results. The most paradigmatic example of representation learning is the autoencoder.
It performs a two-step process: first it encodes the information it receives into a
compressed representation, and then it tries to decode, or reconstruct, the original
input from this reduced representation.

We are going to focus on Computer Vision problems from now on, as it will make it
easier to understand some of the next sections. Regarding the features extracted,
people may have some clear ideas about what makes an object, such as a car,
recognizable: having four wheels, doors on the sides, glass at the front, being made of
metal, etc. However, these are high-level features that are not easy for a machine to
find in an image. To make matters worse, each kind of object in the world has its own
particular features, usually with large intra-class variability. Because of this,
developing a general object recognition application with manually selected features
for each object would be impractical, and it has not been a successful line of research
recently. On the contrary, if machines are capable of determining on their own what is
representative of an object, they will have the potential of learning how to represent
any object they are trained with.

However, there is an additional difficulty for this kind of problem, that is, the
variability depending on the conditions of each picture. We do not only have to deal
with intra-class variability, but also with same-object variability. The same car can
be pictured in almost endless ways, depending on its pose, lighting conditions,
image quality, etc. Humans are capable of getting rid of this variation by extracting
what we could consider abstract features. These features can include the ones we
mentioned before, such as the number of wheels, but also others we are not aware of,
such as the fact that they are usually on a road, or that their wheels should be in
contact with the floor. In order to develop a successful representation learning
method, it should be able to extract this kind of high-level features, regardless of their
variation. The problem is that this process can be extremely difficult to develop into a
machine, which may lead to thinking that it makes no sense to make the effort of
doing so. This is, precisely, where Deep Learning has proven to be extremely useful.

The main characteristic of Deep Learning is that it is capable of making abstractions
by building complex concepts from simpler ones. Given an image, it is capable of
learning concepts such as cars, cats or humans by combining sets of basic features,
such as corners or edges. This process is done through successive “layers” that
increase the complexity of the learned concepts. The idea of depth in Deep Learning
comes precisely from these abstraction levels. Each layer gets as input the output of
the previous one and uses it to learn higher-level features, as seen in the figure below.
Interestingly, in some cases the network uses these features to produce an output
directly, and in other cases it simply generates them for other methods to use.


FIGURE 3.6: Deep Neural Network:


The higher the layer, the more abstract the concepts the features represent, until the network is
capable of recognizing complex concepts such as cars or people.

3.3 Deep Neural Networks


Even though there are various approaches to Deep Learning, such as Deep Kernel
Methods, the one that has been most used, by far, uses neural networks, and it is
known as Deep Neural Networks (DNN). They can be roughly thought of as an ANN
with many hidden layers. One of the most commonly used ANN approaches for Deep
Neural Networks is the multilayer perceptron (MLP). As already explained previously in this chapter, neural
networks are composed of layers which consist of interconnected neurons. In
principle, there is no limit regarding the number of layers and the number of neurons
per layer, but, in practice, it has been almost impossible to successfully train more
than a handful of hidden layers. As already explained, the number of weights in a
network can easily reach the thousands, or even millions in the larger ones, meaning a
large number of parameters to learn. This requires both extremely large computational
times and data to feed the training stages. There have been attempts at doing so
for decades, but it was not until the late 2000s that the means for doing so effectively
became available. There are various factors that made it possible to
train this kind of network. The first of all is the increase in computational power that
computers have experienced. Not only are today’s computers much more powerful than
those from a decade ago, but the appearance of graphics cards has also greatly
boosted the speed of these methods. Graphics Processing Units, or GPUs, were first
designed to allow computers to run demanding graphical programs, mainly video
games. In order to do so, they excelled at rapidly performing large amounts of simple
operations, as rendering methods require. Seeing this, it became apparent that they
could be used for other kinds of applications with similar needs, such as, precisely,
DNNs. Nowadays, most DNN researchers and users run their networks on GPUs, as they
can reduce the running time by orders of magnitude. This has popularized the use of
DNNs, as it is no longer necessary to use expensive supercomputers in order to train
networks in a reasonable amount of time.

The other factor that helped DNN training was the new data-oriented culture that
arose in the 2000s. As data mining and machine learning made it possible to
analyze all kinds of data in a fast and reliable way, many entities wanted to make use
of it. In order to do so, they started gathering large amounts of data and converting
them into usable datasets. These cover a great range of disciplines, such as health,
economics, social behavior, etc.

Although some of these datasets were of private use, many of them were released to
the public. It became a self-reinforcing cycle: as more data became available to
study, the better the analysis techniques became, which lured more people into using
them. This allowed the creation of large datasets, with millions of instances, that could be
used to train ANNs with large numbers of parameters without overfitting. The
final factor that allowed the popularization of DNNs was the appearance of new
methods for training them. Although the two previous factors helped, without advanced
training algorithms we could not have made use of them. It is commonly considered
that it was Hinton who established the basis for modern Deep Learning in 2006
[Hinton, Osindero, and Teh, 2006]. In that publication, he proposed a way of training
deep neural networks in a fast and successful way. This was achieved by treating each
layer as a Restricted Boltzmann Machine and training them one at a time, thus pre-
training the network weights. After that, the network was fine-tuned as a whole. This
breakthrough allowed the training of multi-layered deep networks that previously could not
have been trained, as they would have ended up overfitting. Since then, many
other methods have been developed to train deep networks, such as ReLU
layers or dropout regularization.

3.4 Convolutional Neural Networks


Convolutional neural networks (CNNs) are a kind of feedforward neural network
in which each node applies filters over overlapping regions of the input. Processing
alternates between convolution and sub-sampling layers, followed by one or more
fully connected layers, as in a standard multilayer perceptron (MLP). This architecture
has various benefits compared to standard Neural Networks. NNs have been
successfully applied to features extracted by other systems, which means that their
performance depends on how relevant the extracted features are. Another way to use
NNs is to apply them directly to the raw pixels of the image. However, if the images
have high dimensions, more parameters are needed because the hidden layers would
be fully connected. CNNs tackle this problem by sharing weights, which reduces the
number of parameters.
The convolution layers apply a local filter to the input image, which leads to better
classification, since neighboring pixels of the same image are correlated. In other
words, the pixels of the input image can have some correlation with each other. For
instance, the nose is always between the eyes and the mouth in face images. When we
apply the filter to a subset of the image, we extract some local features. By combining
them subsequently, we obtain the same format as the original image but with lower
dimensionality. This kind of structure is not exploited by fully connected layers.
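
A naive sketch of this local filtering operation is shown below (our own illustration; real CNN libraries implement it far more efficiently, over many channels and filters at once):

    import numpy as np

    def convolve2d(image, kernel):
        # Slide the filter over the image and take the weighted sum of each
        # local patch (no padding, stride 1).
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1
        ow = image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 "image"
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)      # simple vertical-edge kernel
    print(convolve2d(image, edge_filter))               # 3x3 feature map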

Figure 3.7: Illustration of convolution function on an input image

There is a formula used to determine the dimensions of the activation maps:
(N + 2P - F)/S + 1; where N = Dimension of the input image

● P = Padding

● F = Dimension of filter

● S = Stride
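
As a quick check of this formula, the small helper below (the function name and example values are ours) computes the activation-map size for a couple of typical settings:

    def conv_output_size(n, f, p=0, s=1):
        # (N + 2P - F) / S + 1, assuming the division is exact
        return (n + 2 * p - f) // s + 1

    # A 224x224 input with a 3x3 filter, padding 1 and stride 1 keeps its size:
    print(conv_output_size(224, 3, p=1, s=1))   # 224
    # The same input with stride 2 roughly halves it:
    print(conv_output_size(224, 3, p=1, s=2))   # 112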

3.4.1 Layer types

In this section, we introduce the most commonly used layer types, together with one
that is not so common, but relevant for our project.

• Convolutional Layer: The most iconic layer has already been introduced. It is
inspired by traditional MLPs, but with some major differences. The main ones are
that each layer has a single set of weights shared by all neurons (shared weights) and
that each neuron only processes a small part of the input space. It uses all the
parameters introduced in the previous section.

• Pooling Layer: These layers are useful for progressively reducing the size of the
image representation. A pooling layer works by taking each channel of its receptive
field and resizing it, keeping only the maximum of its values. It is usually used with
2x2 kernels and a stride of 2, which halves each side. This reduces the overall size by
75% by picking the largest value of each 2x2 patch. This kind of layer does not have
weights that need training, and it only uses the stride and kernel size parameters. Its
utility consists in reducing the number of weights to learn, which reduces
computational time as well as the probability of overfitting. Unlike the convolution
layer, the pooling layer does not alter the depth of the network; the depth dimension
remains unchanged.

• Fully Connected Layer: These layers are basically neural layers connected
to all neurons from the previous layer, like the ones in regular ANNs. In this case,
they do not use any of the introduced parameters, using instead the number of
neurons. The output they produce can be understood as a compact feature vector
representing the input image. They are also used as output layers, with one neuron per
output, as usual.

• Locally Connected Layer: This last layer is really similar to the Convolutional
Layer, but it does not use the shared-weight strategy. That strategy is justified in
normal Convolutional Layers because the relevant features are usually independent of
their position in the image. However, there are cases in which this may not hold true.
If you know, for example, that all your images will have a face centered at the same
position, it makes sense to look for different features at the eye zone than at the mouth
zone. This is achieved by giving each neuron its own set of weights, similarly to
regular ANNs, but each neuron still only processes its receptive field. These layers are
commonly used after some Convolutional and Pooling ones for two reasons. The first
is that, in order for the features at, say, the eyes and the mouth to be different, they
need to be relatively abstract; basic structures, such as edges or corners, are relevant
in both cases. As already explained, this abstraction level is achieved by applying
successive neuron layers, which build “complex” features out of simpler ones; hence
the utility of using some Convolutional Layers before. The other reason is that using
Locally Connected Layers introduces a large number of weights into the network,
making it more prone to overfitting. It is therefore better to use them when the image
has already been reduced by previous layers. All in all, this type of layer is not very
commonly used because it requires a fixed spatial distribution. However, if this
condition is fulfilled, and you have enough data to prevent overfitting, they are an
excellent choice.
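
As an illustration of how these layer types are typically stacked, the following sketch builds a toy network in Keras; this assumes the TensorFlow/Keras API and is not the model actually used in this project (our architecture is shown in Chapter 5):

    from tensorflow.keras import layers, models

    # A toy stack of the layer types described above (illustration only).
    model = models.Sequential([
        layers.Conv2D(16, (3, 3), activation="relu", padding="same",
                      input_shape=(64, 64, 1)),   # convolutional layer, shared weights
        layers.MaxPooling2D((2, 2)),              # pooling layer, halves each side
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),      # fully connected layer
        layers.Dense(10, activation="softmax"),   # output layer, one neuron per class
    ])
    model.summary()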

3.4.2 Activation Function


The activation function is applied at the end of, or in between, the layers of a neural
network. It helps decide whether a neuron should fire or not.

FIGURE 3.8: Few examples of activation functions

The activation function is the non-linear transformation that is performed over the
input signal. This transformed output is then sent to the next layer as input.

The ReLU function is the most widely used activation function in neural networks today.
The greatest advantage ReLU has over other activation functions is that not all the
neurons are activated at the same time. From the plot of the ReLU function above,
we can see that it converts all negative inputs to zero, so the corresponding neuron does
not get activated. This makes it very computationally efficient, as only a few neurons are
activated at a time. It does not saturate in the positive region. It has been observed that
ReLU converges about six times faster than the tanh and sigmoid activation functions.

One disadvantage of ReLU is that it saturates in the negative region, meaning that the
gradient in that region is zero. With the gradient equal to zero, the corresponding
weights are not updated during backpropagation; to fix this, Leaky ReLU is used.


Also, ReLU is not zero-centered. This means that, to reach its optimal point, the
optimization may follow a longer zig-zag path.
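
A minimal sketch of the two functions just discussed follows (our own illustration; the negative-region slope used for Leaky ReLU is a common default, not a prescribed value):

    import numpy as np

    def relu(z):
        # Negative inputs become zero, so those neurons do not fire.
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):
        # A small slope in the negative region keeps the gradient from vanishing there.
        return np.where(z > 0, z, alpha * z)

    z = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(z))          # [0.  0.  0.  1.5]
    print(leaky_relu(z))    # [-0.02  -0.005  0.  1.5]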

3.5 Back Propagation

Back Propagation is a method used in artificial neural networks to calculate a gradient
that is needed in the calculation of the weights to be used in the network. It is
commonly used to train deep neural networks, a term referring to neural networks
with more than one hidden layer.

Backpropagation is a special case of a general technique called automatic
differentiation. In the context of learning, backpropagation is commonly used by
the gradient descent optimization algorithm to adjust the weights of neurons by
calculating the gradient of the loss function. This technique is also sometimes
called backward propagation of errors, because the error is calculated at the output
and distributed back through the network layers.

The goal of any supervised learning algorithm is to find a function that best maps a set
of inputs to their correct output. An example would be a classification task, where the
input is an image of an animal, and the correct output is the name of the animal.

The motivation for backpropagation is to train a multi-layered neural network such
that it can learn the appropriate internal representations to allow it to learn any
arbitrary mapping of input to output.

Loss Function

Sometimes referred to as the cost function or error function (not to be confused with
the Gauss error function), the loss function is a function that maps values of one or
more variables onto a real number intuitively representing some "cost" associated
with those values. For backpropagation, the loss function calculates the difference
between the network output and its expected output, after a case propagates through
the network.
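
As a small sketch of such a loss function (our own illustration, using the mean squared error; the actual loss used by a given network may differ):

    import numpy as np

    def mse_loss(network_output, expected_output):
        # Mean squared error: the average squared difference between the
        # network output and the expected output for a propagated case.
        network_output = np.asarray(network_output, dtype=float)
        expected_output = np.asarray(expected_output, dtype=float)
        return np.mean((network_output - expected_output) ** 2)

    print(mse_loss([0.2, 0.9], [0.0, 1.0]))   # 0.025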
Consider the diagram below -

Figure 3.9: Flowchart representing Backpropagation

Summary of the steps followed:

● Calculate the error – How far is your model output from the actual output.

● Error minimum? – Check whether the error is minimized or not.

● Update the parameters – If the error is large, update the parameters
(weights and biases). Then check the error again. Repeat the process until the
error becomes minimum.

● Model is ready to make a prediction – Once the error becomes minimum,
you can feed some inputs to your model and it will produce the output.

The Backpropagation algorithm looks for the minimum value of the error function in
weight space using a technique called the delta rule or gradient descent. The weights
that minimize the error function are then considered to be a solution to the learning
problem.

Let’s understand how it works with an example:

You have a dataset, which has labels.

Consider the below table:


TABLE 3.1: Example Dataset


Input | Desired Output
0 | 0
1 | 2
2 | 4

Now, the output of your model when the value of ‘W’ is 3:

TABLE 3.2: Example Dataset with Model predictions

Input | Desired Output | Model output (W=3)
0 | 0 | 0
1 | 2 | 3
2 | 4 | 6

Notice the difference between the actual output and the desired output:
TABLE 3.3: Example Model predictions with absolute and square error

Input | Desired Output | Model output (W=3) | Absolute Error | Square Error
0 | 0 | 0 | 0 | 0
1 | 2 | 3 | 1 | 1
2 | 4 | 6 | 2 | 4

Let’s change the value of ‘W’. Notice the error when ‘W’ = ‘4’
TABLE 3.4: Example Model predictions and errors with two different weights

Input | Desired Output | Model output (W=3) | Absolute Error | Square Error | Model output (W=4) | Square Error
0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 2 | 3 | 1 | 1 | 4 | 4
2 | 4 | 6 | 2 | 4 | 8 | 16


Now if you notice, when we increase the value of ‘W’ the error has increased. So,
obviously, there is no point in increasing the value of ‘W’ further. But, what happens
if I decrease the value of ‘W’? Consider the table below:

TABLE 3.5: Comparison of Example Model predictions and errors with reduced weights

Input | Desired Output | Model output (W=3) | Absolute Error | Square Error | Model output (W=2) | Square Error
0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 2 | 3 | 1 | 1 | 2 | 0
2 | 4 | 6 | 2 | 4 | 4 | 0

Now, what we did here:

● We first initialized some random value to ‘W’ and propagated forward.

● Then, we noticed that there is some error. To reduce that error, we propagated
backward and increased the value of ‘W’.

● After that, we noticed that the error had increased, so we learned that we
cannot keep increasing the value of ‘W’.

● So, we again propagated backward and decreased the value of ‘W’.

● Now, we noticed that the error has reduced.

So, we are trying to find the value of the weight for which the error becomes minimal. Basically, we need to figure out whether we should increase or decrease the weight value. Once we know that, we keep updating the weight in that direction until the error reaches a minimum. You might reach a point where, if you update the weight further, the error increases again. At that point you need to stop, and that is your final weight value.

Consider the graph below:


Figure 3.10: Graph showing Square Error vs. Weight

We need to reach the ‘Global Loss Minimum’; iteratively adjusting the weights until we get there is exactly what backpropagation accomplishes.
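The weight search walked through in the tables above can also be written as a small Python sketch (a toy illustration of the delta rule on the same dataset; this is not the project's training code, and the learning rate is an assumed value):

    # Fit y = W * x to the dataset of Tables 3.1-3.5 by gradient descent on the square error.
    data = [(0, 0), (1, 2), (2, 4)]          # (input, desired output); the best weight is W = 2

    def square_error(w):
        return sum((w * x - y) ** 2 for x, y in data)

    def gradient(w):
        # d/dW of sum (W*x - y)^2  =  sum 2 * x * (W*x - y)
        return sum(2 * x * (w * x - y) for x, y in data)

    w = 3.0                                  # initial guess, as in Table 3.2
    learning_rate = 0.05                     # assumed step size
    for step in range(100):
        w -= learning_rate * gradient(w)     # move against the gradient

    print(round(w, 3), round(square_error(w), 6))   # W approaches 2 and the error approaches 0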

3.6 Gradient Descent

To explain Gradient Descent, we will use the classic mountaineering example.

Suppose you are at the top of a mountain and have to reach a lake which lies at the lowest point of the mountain. The twist is that you are blindfolded, so you cannot see where you are headed. What approach will you take to reach the lake?

Figure 3.11: Classic Mountain Example for Gradient Descent


The best way is to check the ground near you and observe where the land tends to descend. This gives an idea of the direction in which you should take your first step. If you keep following the descending path, it is very likely you will reach the lake.

To represent this graphically, notice the graph below.

Figure 3.12: Graph showing Cost Function vs. Parameters

Let us now map this scenario into mathematical terms.

Suppose we want to find the best parameters (θ1) and (θ2) for our learning algorithm. As in the analogy above, similar mountains and valleys appear when we plot our “cost space”. The cost space is simply how our algorithm performs when we choose particular values for the parameters.

So on the y-axis we have the cost J(θ), plotted against our parameters θ1 and θ2 on the x-axis and z-axis respectively. Here, hills are represented by the red region, which has a high cost, and valleys are represented by the blue region, which has a low cost.


There are many types of gradient descent algorithms. They can be classified mainly in two ways:

● On the basis of data ingestion

1. Full Batch Gradient Descent Algorithm


2. Stochastic Gradient Descent Algorithm

In full batch gradient descent algorithms, the whole dataset is used at once to compute the gradient, whereas in stochastic gradient descent a sample is drawn while computing the gradient (a short sketch contrasting the two appears after this classification).

● On the basis of differentiation techniques

1. First order Differentiation


2. Second order Differentiation

Gradient descent requires calculating the gradient by differentiating the cost function. We can use either first-order or second-order differentiation.
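To make the data-ingestion distinction concrete, the sketch below (a toy example in plain Python, reusing the linear model from the backpropagation example; the learning rate and step count are assumptions) contrasts a full-batch update with a stochastic update computed from a single random sample:

    import random

    data = [(0, 0), (1, 2), (2, 4)]                    # toy dataset, y = 2 * x

    def grad_single(w, x, y):
        return 2 * x * (w * x - y)                     # gradient of (W*x - y)^2 w.r.t. W

    def full_batch_step(w, lr=0.05):
        g = sum(grad_single(w, x, y) for x, y in data) / len(data)   # gradient over all data
        return w - lr * g

    def stochastic_step(w, lr=0.05):
        x, y = random.choice(data)                     # gradient from one random sample
        return w - lr * grad_single(w, x, y)

    w_batch = w_sgd = 5.0
    for _ in range(300):
        w_batch = full_batch_step(w_batch)
        w_sgd = stochastic_step(w_sgd)

    print(round(w_batch, 3), round(w_sgd, 3))          # both approach 2; the stochastic path is noisier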

3.6.1 Challenges in executing Gradient Descent

Gradient Descent is a sound technique which works in most cases. But there are many cases where gradient descent does not work properly or fails to work altogether. There are three main reasons why this can happen:

1. Data challenges
2. Gradient challenges
3. Implementation challenges

Data Challenges

● If the data is arranged in a way that poses a non-convex optimization problem, it is very difficult to perform the optimization using gradient descent. Gradient descent works reliably only for problems that pose a well-defined convex optimization problem.


● Even when optimizing a convex problem, there may be numerous minimal points. The lowest point is called the global minimum, whereas the rest of the points are called local minima. Our aim is to reach the global minimum while avoiding the local minima.

● There is also the saddle point problem: a point where the gradient is zero but which is not an optimum. We don’t have a specific way to avoid such points, and this is still an active area of research.

Gradient Challenges

● If the execution is not done properly while using gradient descent, it may lead to problems like vanishing or exploding gradients. These problems occur when the gradient is too small or too large, and because of them the algorithm does not converge.

Implementation Challenges

● Most neural network practitioners don’t generally pay attention to implementation, but it is very important to look at the resources a network utilizes. For example, when implementing gradient descent it is important to note how many resources you would require; if the memory is too small for your application, the network would fail.

● It is also important to keep track of things like floating-point considerations and hardware/software prerequisites.

3.6.2 Learning Rate

The size of the steps taken during gradient descent is called the learning rate. With a high learning rate we can cover more ground with each step, but we risk overshooting the lowest point, since the slope of the hill is constantly changing. With a very low learning rate we can confidently move in the direction of the negative gradient, since we are recalculating it so frequently; a low learning rate is more precise, but calculating the gradient so often is time-consuming, so it will take us a very long time to get to the bottom.
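The effect of the step size can be seen in a small sketch (illustrative only, using an assumed one-dimensional quadratic cost rather than a real network):

    def descend(lr, steps=20, start=5.0):
        """Minimize cost(w) = (w - 2)^2 by gradient descent; the gradient is 2 * (w - 2)."""
        w = start
        for _ in range(steps):
            w -= lr * 2 * (w - 2)
        return w

    print(descend(lr=0.01))   # too low: after 20 steps we are still far from the minimum at w = 2
    print(descend(lr=0.4))    # reasonable: converges quickly to w = 2
    print(descend(lr=1.1))    # too high: each step overshoots and the iterates diverge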

3.6.3 Practical tips on applying gradient descent


When applying gradient descent, the following points may help you avoid these problems:

● Error rates – check the training and testing error after specific iterations and make sure both of them decrease. If that is not the case, there might be a problem!

● Gradient flow in hidden layers – check that the network does not show a vanishing gradient or exploding gradient problem.

● Learning rate – check this when using adaptive techniques. A decent trick is to multiply your learning rate by 0.3 and adjust the step size accordingly until you reach the global minimum.


4
Methodology
4.1 Approach
The approach that we followed to accomplish the task is represented by the flowchart
given below -

Figure 4.1: Flowchart of the Approach followed


● Upload Grayscale Image

A grayscale image is uploaded to the webpage and is then processed by a Python script that resizes its dimensions for proper execution of the model.

● Pre-processing of the uploaded image

This stage involves converting the uploaded image into a form suitable for manipulation by the later stages; a minimal sketch of this step is given below.
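The sketch below assumes the Pillow and NumPy libraries; the 256×256 target size is an assumed value and not necessarily the size used by our model:

    from PIL import Image
    import numpy as np

    def preprocess(path, size=(256, 256)):
        """Load an image, force it to single-channel grayscale, resize it, and scale pixels to [0, 1]."""
        img = Image.open(path).convert("L")     # "L" mode = 8-bit grayscale
        img = img.resize(size)
        return np.asarray(img, dtype=np.float32) / 255.0

    # x = preprocess("uploaded.jpg")            # array of shape (256, 256), ready for the model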

● Feature Extraction
Natural images have the property of being “stationary”, meaning that the statistics of one part of the image are the same as those of any other part. This suggests that the features we learn in one part of the image can also be applied to other parts, and that we can use the same features at all locations.

Feature Extraction consists of preparing a model with the following layers -

● Convolution

● Activation

● Pooling (Down Sampling)

● Flattening

● Full Connection

In our approach, we constructed a convolutional neural network consisting of 8 convolutional layers, which trains itself on colored images and then predicts the “a” and “b” channels for black & white images in the Lab color format.

Convolution is a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. A convolutional layer computes its output volume by computing the dot product between each filter and every image patch.


Mathematically, convolution operation can be written as -

Figure 4.2: Illustration of Convolution Function on an image
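For intuition, the sliding-window dot product described above can be sketched in plain NumPy (an illustrative "valid" convolution with no padding or stride, not the implementation used in our model):

    import numpy as np

    def conv2d(image, kernel):
        """Naive 2-D convolution: slide the filter over the image and take dot products."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i:i + kh, j:j + kw]
                out[i, j] = np.sum(patch * kernel)      # dot product of filter and image patch
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)
    horizontal_gradient = np.array([[-1.0, 1.0]])       # simple edge-like filter
    print(conv2d(image, horizontal_gradient))           # 5 x 4 output, all values equal to 1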

Figure 4.3: Diagram depicting Layers of Model used

The above-shown model in the diagram tries to predict the “a” and “b” color
channels for the grayscale input image in “Lab” color space.


Lab Color Space

L stands for lightness, and a and b for the color spectra green–red and blue–yellow.

A Lab-encoded image has one layer for grayscale and packs the three color layers into two. This means that we can use the original grayscale image in our final prediction. Also, we only have two channels to predict.

Figure 4.4: Example Illustrating Lab Color Space
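A minimal sketch of splitting an image into its L and ab channels is given below (assuming the scikit-image library, which may differ from the exact tooling used in our implementation):

    import numpy as np
    from skimage import color, io

    rgb = io.imread("example.jpg") / 255.0    # RGB image scaled to [0, 1]
    lab = color.rgb2lab(rgb)                  # convert to the Lab color space

    L = lab[:, :, 0]      # lightness channel (roughly 0-100): the grayscale input to the model
    ab = lab[:, :, 1:]    # the two color channels the model has to predict

    # After prediction, the predicted ab channels are stacked back onto the original
    # L channel and converted to RGB for display.
    reconstructed = color.lab2rgb(np.dstack((L, ab)))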

● Training Neural Network


Training of the neural network consists of the following steps (a condensed sketch follows below):

● Processing of an instance through the model (forward pass)

● Error Calculation (Mean Squared Error)

● Backpropagation (Stochastic Gradient Descent) and weight updates

● Generate Output

After model training is finished, we have a grayscale layer for input and we expect the model to predict two color layers, the ab in Lab.
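A condensed sketch of the model construction and training loop described above is given below, using Keras. It is illustrative only: the layer sizes, optimizer settings, and input size are assumptions and do not reproduce the exact eight-layer architecture summarized in Figure 5.2.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Input: L channel of shape (256, 256, 1); target: ab channels of shape (256, 256, 2).
    model = keras.Sequential([
        layers.Input(shape=(256, 256, 1)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same", strides=2),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(2, (3, 3), activation="tanh", padding="same"),   # predicted ab channels
    ])

    # Mean squared error between predicted and true ab channels, minimized with
    # stochastic gradient descent (the weight updates are computed by backpropagation).
    model.compile(optimizer="sgd", loss="mse")

    # X_train: grayscale L channels; Y_train: the corresponding (scaled) ab channels.
    # model.fit(X_train, Y_train, batch_size=16, epochs=1000)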

Figure 4.5: Process showing colorization of a grayscale image


5
Results and Discussions
For the CNN model described below, we observed the following results:

Figure 5.1: Code of Building the CNN


Figure 5.2: Summary of the Model Architecture


Figure 5.3: Home Page of Web Application

Figure 5.4: Automatic Demo – a specified number of random samples from the test set are colorized


Figure 5.5: File input for a specific demo

Figure 5.6: Specific Demo – colorizes a specific input file provided by the user


Results

Input images compared with their colorized outputs at Epochs = 1000 and at Epochs > 1000.
From the above observations, we can infer that -

● The software produced promising results from which useful insight can be gained.

● The output images produced are vibrant and plausible, though they may not represent the ground-truth colors.

● The software performed well on different types of images, such as human subjects, infrastructure, and nature views.

● Overfitting is not present, as is clear from the results on randomly chosen inputs.


6
Conclusions

The conclusions we can draw from the above results are as follows -

● The precision of the colorized images depends mainly upon the architecture of the model and on its training; as we saw, increasing the number of epochs produced more promising results.

● Increasing the number of epochs alone is not sufficient; a large dataset is also required to avoid the problem of overfitting.

● Efficient utilization of resources reduces training time and hence allows greater accuracy. For example, a GPU can be used instead of a CPU to reduce training time.

● The addition of more convolution layers resulted in a more accurate model.

● Even after adding a significant number of convolution layers, the size of the output is still too large to be manipulated efficiently. In this case, pooling (downsampling) proves to be a good candidate for making the output of the convolution layers more compact without significant loss of information.

● Our model can be used efficiently to colorize small images, but if it were to handle large images it would take a significant amount of time under normal operating conditions, and hence would not be reliable, since the cost requirements would increase. Newer techniques such as the Deep Convolutional Generative Adversarial Network (DCGAN) can therefore be used for further optimization.


7
Future Scope

Though our work gives promising results, there are still several limitations that we plan to address in the future -

● Color precision – though the results provided by the model were satisfactory, further training of the model is needed to enhance precision.

● With the advent of better deep learning methods such as the Deep Convolutional Generative Adversarial Network (DCGAN), the model can be further improved to enhance color precision.

● The ideas proposed in our dissertation can be extended to the colorization of -

❏ Old movies, videos, and images

❏ CCTV footage

● We further aim to make our model more precise by adding more convolutional and pooling layers.

References

1. Tung Nguyen, Kazuki Mori, and Ruck Thawonmas. 2016. Image Colorization Using a Deep Convolutional Neural Network. In ASIAGRAPH 2016 Proceedings.

2. Arshiya Sayyed, Apeksha Rahangdale, Rutuja Hasurkar, and Kshitija Hande. 2017. Automatic Colorization of Gray-scale Images using Deep Learning. In International Journal of Science, Engineering, and Technology Research (IJSETR).

3. Richard Zhang, Phillip Isola, and Alexei A. Efros. 2016. Colorful Image Colorization. In European Conference on Computer Vision (ECCV).

4. Kamyar Nazeri, Eric Ng, and Mehran Ebrahimi. 2018. Image Colorization using Generative Adversarial Networks. In: Articulated Motion and Deformable Objects, pp. 85-94.

5. K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (ICLR) 2015 (arXiv:1409.1556v6).

6. Jeff Hwang and You Zhou. 2016. Image Colorization with Deep Convolutional Neural Networks.

7. Richard Zhang's interactive demo, performance comparisons, and deep dream visualization - https://richzhang.github.io/colorization/

8. A pre-trained neural network model for colorizing black & white images - https://blog.floydhub.com/colorizing-b-w-photos-with-neural-networks/

9. Deep Convolutional Generative Adversarial Network for satellite imagery - https://medium.com/the-downlinq/artificial-colorization-of-grayscale-satellite-imagery-via-gans-part-1-79c8d137e97b

10. https://towardsdatascience.com/develop-a-nlp-model-in-python-deploy-it-with-flask-step-by-step-744f3bdd7776

