Automatic Image Colorization Using Deep Learning
In this approach, both the content image and the style image are passed through a pre-trained CNN, and the content and style representations are extracted. The same is then done for a noisy picture. L-BFGS was used for optimization, and with properly tuned parameters the images produced were much better than those obtained with stochastic gradient descent [2].

In this paper, a model architecture based on neural networks is proposed. It colors black-and-white pictures without any human interference. Several network models and problem objectives are examined. The final architecture produces colored pictures that are more useful and pleasing than the previously built baseline regression models. The system uses various datasets; the MIT CVCL Urban and Natural Scene Categories dataset contains several thousand pictures divided into 8 categories. About 411 pictures were used in experiments to check the reliability of the system. A pipeline is built by making the program read pictures of constrained dimensions in the red, green, blue color space; this pipeline consists of a neural network. The model also solved issues of image inconsistency, and the system can be made to learn to produce pictures that bear comparison with real images [3].
Traditionally, picture colorization has been done using manual scribbling methods. In this paper, an automated method is proposed. Two distinct convolutional neural network architectures are compared, trained using various loss functions. The results of each variant are obtained in the form of pictures and videos and then compared. The main goal of the paper is to determine whether neural networks can be used to colorize grayscale images in an automated manner. The images considered differ from natural images in that they contain less textural material, which makes extracting information from them harder. Several variants of neural networks are therefore built and their performances compared. Two architectures are considered: one is a traditional, plain network, while the other is inspired by a residual network and has not been put to use previously [4].
The aim of this paper is to make the output image a realistic picture like the input, but not necessarily the same as the original. A neural network is explored first. The model is then combined with a classifier, Inception-ResNet-v2, trained on 1.2 million pictures, to obtain a more realistic output. A CNN has been used to color the images. This model had advantages over earlier models that used mean squared error, which led to desaturated photos. Newer models such as colorful image colorization encourage bolder pixel choices than the more conservative ones. The dataset used is from Unsplash and consists of 10,000 pictures, of which 95% forms the training set, 2.5% the development set and 2.5% the test set. Transformations such as image zooming and flipping were also performed to avoid overfitting. A very simple survey was done at the end to determine how often the colorings were accurate, but this approach proved to be too slow and blunt [5].

In this paper, the ImageNet database has been used, and the density and diversity of the datasets have been evaluated. The difference between object-centric and scene-centric networks has been shown. A linear SVM and the pre-trained ImageNet database have been used, and visualization of CNN layers has been used to contrast object-centric and scene-centric networks. The Places database is very large; it achieves the best performance when the whole set is used for training. Workers were presented with different sets of images and had to choose the set that was most similar. The deep features obtained from ImageNet were not competitive enough to perform these tasks. The Places dataset was 60 times larger than the SUN database [6].

In this paper, a generic framework has been introduced that works without any supervision. An online algorithm has been proposed for big image databases. This approach is less efficient to train than supervised methods. Training is end to end, and the project aims at learning discriminative features; very few assumptions are made, so the model is easy to train. A separate mean-square function has been used, which allows the model to be trained on millions of images. A SoftMax function has been used as the loss function. After performing several experiments, the quality of the learned features has been evaluated on the ImageNet database, where object classification and detection have been done [7].

III. PROPOSED WORK

In this approach, we build a deep convolutional neural network that takes a grayscale image as an input and produces a colorized image. Firstly, we convert our black and white image to 256 x 256 pixels. We give this as an input to our neural network. Our model is trained to produce photos with realistic colors by training on colorful images. The images produced would easily fool a viewer.

The RGB color space is a 3-channel color space. The CIE Lab color space is similar to the RGB color space, but the difference is that the color information is encoded only in the "a" and "b" channels. The L (lightness) channel encodes only the intensity, so we can use it as our grayscale input to the neural network. The trained network predicts the ab channels. We then combine the predicted ab channels with the L channel and finally convert the "Lab" image back to the RGB color space.
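This L/ab round trip can be written in a few lines. The following is an illustrative sketch, not the paper's code: skimage provides the actual rgb2lab/lab2rgb conversions, while predict_ab is a hypothetical stand-in for the trained network described below.

    import numpy as np
    from skimage import color, transform

    def colorize(rgb_image, predict_ab):
        # Resize to the fixed 256 x 256 input size used by the network.
        img = transform.resize(rgb_image, (256, 256))
        lab = color.rgb2lab(img)          # L in [0, 100], ab roughly in [-128, 127]
        L = lab[:, :, 0]                  # lightness channel = grayscale network input
        ab = predict_ab(L)                # the trained network predicts the ab channels
        lab_pred = np.dstack([L, ab])     # recombine predicted ab with the original L
        return color.lab2rgb(lab_pred)    # convert the Lab result back to RGB

    # Placeholder predictor (all-zero ab gives a neutral, gray result); the
    # trained CNN described in the next sections would be plugged in here.
    demo = colorize(np.random.rand(512, 512, 3), lambda L: np.zeros(L.shape + (2,)))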
IV. ARCHITECTURE

On giving our model the L component of an image as an input, it calculates the ab components. It then combines them with the input to form the colored image. The architecture is shown in Fig. 1.

The CNN is divided into four parts. The encoder component produces mid-level features and the feature extraction component produces high-level features. These are then merged in the fusion layer. Finally, the output is generated with the help of the decoder component.

Fig. 1. Architecture

A. Preprocessing
The pixel values of the images are scaled to the range (-1, 1) for correct learning.
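As a small sketch of this normalization, assuming Lab channel ranges of [0, 100] for L and about [-128, 128] for ab:

    import numpy as np

    def normalize_lab(lab):
        # Scale each channel into (-1, 1): L lies in [0, 100], ab in about [-128, 128].
        L = lab[:, :, 0] / 50.0 - 1.0
        ab = lab[:, :, 1:] / 128.0
        return L, ab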
B. Encoder
The input is an (H x W) black and white image, and it is processed into an (H/8 x W/8 x 256) feature volume. In this process, 8 convolutional layers with (3 x 3) kernels are used. Padding is used to preserve the layer input sizes. The 1st, 3rd and 5th layers have a stride of 2; each of these halves the output dimensions and therefore reduces the required computations.
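As a concrete sketch, the encoder can be written as a small Keras stack. The per-layer filter counts below are an assumption modeled on Deep Koalarization [1]; the kernel sizes, strides and padding follow the description above.

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_encoder():
        # 8 conv layers, 3x3 kernels, 'same' padding; stride 2 at layers 1, 3
        # and 5 turns a (224, 224, 1) input into a (28, 28, 256) volume.
        filters = [64, 128, 128, 256, 256, 512, 512, 256]  # assumed, per [1]
        strides = [2, 1, 2, 1, 2, 1, 1, 1]
        model = tf.keras.Sequential(name="encoder")
        for i, (f, s) in enumerate(zip(filters, strides)):
            kwargs = {"input_shape": (224, 224, 1)} if i == 0 else {}
            model.add(layers.Conv2D(f, (3, 3), strides=s, padding="same",
                                    activation="relu", **kwargs))
        return model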
C. Feature Extractor
For this, we use a pre-trained Inception model. Firstly, we scale the image to (299 x 299) and then stack copies of it to produce the three-channel input the network expects. We feed this into the network and extract the output just before the SoftMax function. The result is a (1001 x 1 x 1) embedding.
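A sketch of this branch using the pre-trained Keras InceptionResNetV2; note, as an assumption, that the Keras weights expose 1000 logits before the softmax, whereas the 1001-dimensional embedding above corresponds to the TF-Slim variant of the network.

    import tensorflow as tf

    # Pre-trained Inception-ResNet-v2 with the softmax removed from the top,
    # so the model outputs the pre-softmax logits.
    inception = tf.keras.applications.InceptionResNetV2(
        weights="imagenet", classifier_activation=None)

    def embed(gray):                       # gray: (H, W) array in [0, 255]
        img = tf.image.resize(gray[..., None], (299, 299))
        img = tf.tile(img, [1, 1, 3])      # stack the channel three times
        img = tf.keras.applications.inception_resnet_v2.preprocess_input(img)
        return inception(img[None, ...])   # -> (1, 1000) embedding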
D. Fusion
The feature vector is replicated HW/8² times (once per spatial location) and attached along the depth axis to the feature volume output by the encoder, giving a single volume of shape (H/8 x W/8 x 1257). Finally, a volume of shape (H/8 x W/8 x 256) is obtained by applying 256 convolutional filters of size (1 x 1).
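The fusion step itself is a tile-concatenate-project pattern; a sketch, assuming the encoder output and embedding shapes given above:

    import tensorflow as tf
    from tensorflow.keras import layers

    def fuse(encoder_out, embedding):
        # encoder_out: (batch, H/8, W/8, 256); embedding: (batch, 1001)
        h, w = encoder_out.shape[1], encoder_out.shape[2]
        emb = tf.reshape(embedding, [-1, 1, 1, embedding.shape[-1]])
        emb = tf.tile(emb, [1, h, w, 1])                 # replicate HW/8^2 times
        fused = tf.concat([encoder_out, emb], axis=-1)   # (..., 1257) volume
        # In a real model this 1x1 projection layer would be created once.
        return layers.Conv2D(256, (1, 1), activation="relu")(fused)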
E. Decoder
The input to the decoder is the (H/8 x W/8 x 256) volume. It is passed through a series of up-sampling and convolutional layers and outputs a layer of size (H x W x 2).
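A sketch of such a decoder; the filter counts are assumptions, three 2x up-sampling stages undo the three stride-2 layers of the encoder, and tanh keeps the two output channels in (-1, 1) to match the preprocessing.

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_decoder():
        # (H/8, W/8, 256) -> (H, W, 2) via alternating convs and up-sampling.
        return tf.keras.Sequential([
            layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
            layers.UpSampling2D((2, 2)),
            layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
            layers.UpSampling2D((2, 2)),
            layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
            layers.UpSampling2D((2, 2)),
            layers.Conv2D(2, (3, 3), padding="same", activation="tanh"),
        ], name="decoder")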
V. OBJECTIVE FUNCTION

Optimal values for the model are calculated by minimizing the objective function, which is defined over the target and the estimated output. For this, we calculate the mean square error between the real pixel values of the ab components and their estimated values. It is given by:

C(X, \Theta) = \frac{1}{2HW} \sum_{k \in \{a,b\}} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X_k^{(i,j)} - \tilde{X}_k^{(i,j)} \right)^2

Θ: model parameters.
X_k^{(i,j)}: (i,j)-th pixel value of the k-th component of the target.
X̃_k^{(i,j)}: (i,j)-th pixel value of the k-th component of the reconstructed image.

The Adam optimizer is used during training, so the loss gets backpropagated and the model parameters get updated.
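A minimal sketch of this training step, assuming the encoder, fusion and decoder sketches above are assembled into a single Keras model mapping an L batch to an ab batch; Keras's MeanSquaredError averages over all 2HW ab values, matching the normalization in the objective.

    import tensorflow as tf

    mse = tf.keras.losses.MeanSquaredError()   # mean over pixels and both channels
    optimizer = tf.keras.optimizers.Adam()

    @tf.function
    def train_step(model, L_batch, ab_batch):
        with tf.GradientTape() as tape:
            ab_pred = model(L_batch, training=True)
            loss = mse(ab_batch, ab_pred)      # MSE between target and predicted ab
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss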
VI. EXPERIMENTS

The ImageNet database is used for most of the training process. The database consists of millions of images, which come in different sets. We have trained our model on 18 gigabytes of these images. The shapes of the pictures in the ImageNet database are heterogeneous, so all the images are rescaled to (224 x 224) for the encoder and (299 x 299) for the Inception network. The training time was around 8 hours; an Nvidia GeForce 1050 Ti GPU was used to speed up the process.

VII. RESULT

After training our model, we tried colorizing some black and white images. Nature elements like rivers, trees and grass are colorized well, but some objects are not always; for those objects, our model has produced the next most probable colors. We have compared our results with Zhang's, who used the same training set of images; we each used a different loss function. We observed that although the results were good most of the time, some of them had low saturation because of the less diverse data set.

VIII. CONCLUSION AND FUTURE WORK

In this project, we have presented an efficient way of coloring images using a deep CNN, unlike the older manual procedure. The aim of this paper was to make the output image a realistic picture like the input, but not necessarily the same as the original. Transformations such as image zooming and flipping were also performed to avoid overfitting, and high-level features were extracted using the model.
REFERENCES
1. Federico Baldassarre, Diego González Morín and Lucas Rodés-Guirao, “Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2”. arXiv:1712.03400, 2017.
2. Tung Nguyen, Kazuki Mori and Ruck Thawonmas, “Image Colorization Using a Deep Convolutional Neural Network”. arXiv:1604.07904, April 2016.
3. Jeff Hwang and You Zhou, “Image Colorization with Deep Convolutional Neural Networks”. Stanford University.
4. David Futschik, “Colorization of black-and-white images using deep neural networks”. January 2018.
5. Alex Avery and Dhruv Amin, “Image Colorization”. CS230, Winter 2018.
6. Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba and Aude Oliva, “Learning Deep Features for Scene Recognition using Places Database”. NIPS, 2014.
7. Philipp Krähenbühl, Carl Doersch, Jeff Donahue and Trevor Darrell, “Data-dependent Initializations of Convolutional Neural Networks”. ICLR, 2016.