VCAS 2022 Paper 632
Abstract. When we look at an image, our brain can describe it effortlessly; can a machine do the same? With the growth of deep learning techniques and the availability of massive datasets, a model that generates image captions can be built automatically. In this work, a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) are combined to construct a model that automatically creates captions for images. The FLICKR 8K dataset is used as the benchmark dataset. According to the results, PCA (Principal Component Analysis) performs best among the feature selection techniques in terms of BLEU score, and the categorical cross-entropy loss function gives the best results.
1 Introduction
1.1 Motivation
It is worth understanding how important this problem is in the real world. Automatically generated captions help solve a number of practical problems:
– Medical use - Taking a snapshot of an affected area of the skin and generating a caption can help identify diseases.
– CCTV cameras - Besides recording the world, captions can be generated for the video feed, which helps reduce crime and accidents.
– Visually impaired - Captions help visually impaired persons obtain information about images.
– Petroleum exploration - Generating captions for images of reservoir rocks in the subsurface of the earth helps determine the properties of the reservoir.
2 Methodology
The FLICKR 8K dataset [7], which contains 8000 images with 5 captions each, is used for training and testing. It is divided into three parts: training, validation, and test sets.
The captions are English sentences that contain special symbols (such as full stops and question marks). During preprocessing, these symbols and single-letter words are eliminated. After cleaning, the description dataset looks as shown in Fig. 2.
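As an illustration, the cleaning step described above could look roughly like the following minimal Python sketch (not the authors' exact code; the dictionary structure and example caption are assumptions):

import string

def clean_caption(caption):
    """Lower-case a caption, strip special symbols, and drop single-letter words."""
    caption = caption.lower()
    # Remove punctuation such as full stops and question marks.
    caption = caption.translate(str.maketrans("", "", string.punctuation))
    # Keep only alphabetic tokens longer than one character.
    words = [w for w in caption.split() if len(w) > 1 and w.isalpha()]
    return " ".join(words)

# Hypothetical structure: image id -> list of its 5 raw captions.
descriptions = {"1000268201": ["A child in a pink dress is climbing up a set of stairs ."]}
cleaned = {img: [clean_caption(c) for c in caps] for img, caps in descriptions.items()}
print(cleaned["1000268201"][0])  # child in pink dress is climbing up set of stairs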
3 Feature selection
Feature selection is the process of selecting relevant features, i.e. reducing the number of input variables. Reducing the number of input variables both shortens the model's training time and can improve its performance, and it decreases redundancy among the features. Many feature selection techniques are available; among them are PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis), SVD (Singular Value Decomposition), and MDS (Multidimensional Scaling).
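For illustration, a hedged sketch of PCA-based reduction of the image features using scikit-learn is given below; the 2048-dimensional CNN features, the dummy data, and the variable names are assumptions rather than the authors' code:

import numpy as np
from sklearn.decomposition import PCA

# image_features: one row per image, one column per CNN feature (dummy data here).
image_features = np.random.rand(8000, 2048).astype("float32")

pca = PCA(n_components=256)              # keep, e.g., 256 principal components
reduced_features = pca.fit_transform(image_features)
print(reduced_features.shape)            # (8000, 256)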
Captions are what the model is going to predict; they are the target of our model. All words in the captions therefore need to be tokenized and encoded as fixed-size vectors. The model maps every token to a 200-dimensional fixed-size vector using a pre-trained GloVe [9] model.
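A minimal sketch of how such an embedding matrix could be built from the pre-trained 200-dimensional GloVe vectors follows; the file name and the tokenizer outputs word_index and vocab_size are assumptions for illustration:

import numpy as np

embedding_dim = 200
glove = {}
with open("glove.6B.200d.txt", encoding="utf-8") as f:   # pre-trained GloVe vectors
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

# word_index: token -> integer id from the tokenizer; vocab_size: number of tokens + 1.
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, idx in word_index.items():
    vector = glove.get(word)
    if vector is not None:        # words missing from GloVe remain all-zero
        embedding_matrix[idx] = vector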
5 Model Architecture
The model takes two inputs for training: the image and the partial caption. This is achieved using the functional API provided by the Keras [10] library for Python. The functional API allows creating a merge model. The model summary is shown in Fig. 4.
Fig. 4. Model summary.
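A hedged sketch of such a merge model with the Keras functional API is shown below; the layer sizes, caption length, and vocabulary size are illustrative assumptions rather than the exact configuration used in the paper:

from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

feature_dim = 256     # size of the selected image feature vector (assumed)
max_length = 34       # maximum caption length in tokens (assumed)
vocab_size = 8000     # vocabulary size after cleaning (assumed)

# Image feature branch.
image_input = Input(shape=(feature_dim,))
image_dense = Dense(256, activation="relu")(Dropout(0.5)(image_input))

# Partial-caption branch with a 200-dimensional (GloVe-initialised) embedding.
caption_input = Input(shape=(max_length,))
caption_embed = Embedding(vocab_size, 200, mask_zero=True)(caption_input)
caption_lstm = LSTM(256)(Dropout(0.5)(caption_embed))

# Merge the two branches and predict the next word of the caption.
merged = add([image_dense, caption_lstm])
decoder = Dense(256, activation="relu")(merged)
outputs = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[image_input, caption_input], outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()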
6 Evaluation
Evaluation techniques are needed to measure how good the captions predicted by the model are. The BLEU score [12] is used to evaluate our model. The BLEU score is a number between zero and one that measures the similarity between machine-generated text and a set of good-quality reference translations. It must be noted that the images used for testing must be similar to those used for training the model. No machine learning model will produce relevant captions if the test image is totally different from the training images.
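For example, the BLEU score can be computed with NLTK as in the minimal sketch below; the use of NLTK and the toy tokenized captions are assumptions, since the paper only states that the BLEU score [12] is used:

from nltk.translate.bleu_score import corpus_bleu

# references: for each test image, the list of its tokenized reference captions.
# hypotheses: the tokenized caption generated by the model for that image.
references = [[["dog", "runs", "on", "the", "grass"],
               ["a", "dog", "running", "in", "a", "field"]]]
hypotheses = [["dog", "runs", "in", "the", "grass"]]

score = corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0))  # BLEU-2, for example
print(f"Average BLEU score: {score:.4f}")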
Feature selection method   PCA      KPCA     SVD      MDS
Average BLEU score         0.3593   0.3435   0.3524   0.3321
Number of selected features   Average BLEU score
512                           0.3593
256                           0.3603
128                           0.3716
 64                           0.3745
 32                           0.3800
  2                           -
As the second table shows, reducing the number of selected features improves the performance of the model. However, reducing it to a very small size, e.g. 2 components in the case of PCA, leads to overfitting of the model. Machines still cannot produce captions as good as humans: the generated caption is irrelevant in the last case, which shows that the model needs further improvement. This becomes the future scope of this work. Some of the results produced by our model are shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9.
References
1. Srivastava, G., Srivastava, R.: A survey on automatic image captioning. In: Inter-
national Conference on Mathematics and Computing, pp. 74–83. Springer (2018)
2. Wang, J.: Analysis and design of a recurrent neural network for linear program-
ming. IEEE Transactions on Circuits and Systems I: Fundamental Theory and
Applications 40(9), 613–618 (1993)
3. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
tion architecture for computer vision. In: Proceedings of the IEEE conference on
computer vision and pattern recognition, pp. 2818–2826 (2016)
4. Rao, C.R.: The use and interpretation of principal component analysis in applied
research. Sankhyā: The Indian Journal of Statistics, Series A pp. 329–358 (1964)
5. Schölkopf, B., Smola, A., Müller, K.R.: Kernel principal component analysis. In:
International conference on artificial neural networks, pp. 583–588. Springer (1997)
6. Golub, G.H., Reinsch, C.: Singular value decomposition and least squares solutions.
In: Linear algebra, pp. 134–151. Springer (1971)
7. Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a rank-
ing task: Data, models and evaluation metrics. Journal of Artificial Intelligence
Research 47, 853–899 (2013)
8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-
scale hierarchical image database. In: 2009 IEEE conference on computer vision
and pattern recognition, pp. 248–255. IEEE (2009)
9. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word repre-
sentation. In: Proceedings of the 2014 conference on empirical methods in natural
language processing (EMNLP), pp. 1532–1543 (2014)
10. Chollet, F., et al.: Keras (2015). URL https://github.com/fchollet/keras
11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation
9(8), 1735–1780 (1997)
12. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic
evaluation of machine translation. In: Proceedings of the 40th annual meeting of
the Association for Computational Linguistics, pp. 311–318 (2002)
13. Zhang, Z., Sabuncu, M.: Generalized cross entropy loss for training deep neural
networks with noisy labels. Advances in neural information processing systems 31
(2018)
14. Liu, L., Qi, H.: Learning effective binary descriptors via cross entropy. In: 2017
IEEE winter conference on applications of computer vision (WACV), pp. 1251–
1258. IEEE (2017)
15. Brigo, D., Pallavicini, A., Torresetti, R.: Calibration of CDO tranches with the
dynamical generalized-Poisson loss model. Available at SSRN 900549 (2007)