Practical 3


Theoretical Tasks

Image captioning can be regarded as an end-to-end sequence problem, as it converts images, which can be viewed as a sequence of pixels, into a sequence of words. For this purpose, we need to process both the language (the caption text) and the images. We use recurrent neural networks for the language part and, for the image part, convolutional neural networks to obtain the feature vectors.

We are dealing with two types of information: language and images. So the question arises of how, or in what order, we should introduce this information into our model. More specifically, we need a language RNN model to generate a word sequence, so when should we feed the image feature vectors into the language model? A paper by Marc Tanti and Albert Gatt [Comparison of Architectures], Institute of Linguistics and Language Technology, University of Malta, presents a comparative study of these approaches.

You can read the post and Andrej Karpathy’s Architecture.


Then, you should be able to complete the following tasks.

Task 3.1
Answer the following questions (a short Keras sketch of the five combination operations is given after the task list for reference):

Task 3.1.1 - Explain the pros and cons of utilising Concatenation for combining embeddings

Task 3.1.2 - Explain the pros and cons of utilising Addition for combining embeddings

Task 3.1.3 - Explain the pros and cons of utilising Multiplication for combining embeddings

Task 3.1.4 - Explain the pros and cons of utilising Attention for combining embeddings

Task 3.1.5 - Explain the pros and cons of utilising Difference for combining embeddings
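To make these operations concrete, here is a minimal Keras sketch of the five combination methods. It assumes both embeddings are 256-dimensional vectors, as in the practical below; the variable names are illustrative only and not part of the assignment.

from tensorflow.keras.layers import Input, Concatenate, Add, Multiply, Subtract, Attention, Reshape

image_embedding = Input(shape=(256,))    # e.g. projected CNN features
caption_embedding = Input(shape=(256,))  # e.g. final LSTM state

# Concatenation: keeps both vectors intact but doubles the dimensionality
concatenated = Concatenate()([image_embedding, caption_embedding])   # shape (None, 512)

# Addition: element-wise sum, keeps the dimensionality but blends the signals
added = Add()([image_embedding, caption_embedding])                  # shape (None, 256)

# Multiplication: element-wise product, acts like gating one vector by the other
multiplied = Multiply()([image_embedding, caption_embedding])        # shape (None, 256)

# Difference: element-wise subtraction, highlights where the two vectors disagree
difference = Subtract()([image_embedding, caption_embedding])        # shape (None, 256)

# Attention: treat each vector as a length-1 sequence so dot-product attention
# can weight the image features against the caption state
query = Reshape((1, 256))(caption_embedding)
value = Reshape((1, 256))(image_embedding)
attended = Reshape((256,))(Attention()([query, value]))              # shape (None, 256)

Note that concatenation changes the output dimensionality, while the element-wise operations require both embeddings to have the same shape.
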
Task 3.2 Practical Assessment
Task Description

This exercise involves implementing an image-captioning network. You can use any deep learning framework of your choice. We will provide you with the general steps, and you will implement them as you see fit.

You can choose training data for the images and captions. You will also
choose how to combine the embeddings and answer the questions at the end.

As a base, you can use this example: Image captioning in PyTorch.

The basic high-level steps to follow are:

Image Feature Extraction:


The first step in an image captioning network is to extract the features from the
image. This is usually done using a Convolutional Neural Network (CNN). The
CNN takes the image as input and outputs a feature vector that represents the
image’s content. This feature vector serves as the input to the next part of the
network.
Example: Using Keras.

# Extract features from the image with a pretrained CNN (VGG16 without its classifier head)
from tensorflow.keras.applications.vgg16 import VGG16

base_model = VGG16(weights='imagenet', include_top=False)
# img_array is assumed to be a batch of preprocessed images
image_features = base_model.predict(img_array)
# Flatten the spatial feature maps into one feature vector per image
image_features = image_features.reshape(image_features.shape[0], -1)

Sequence Model for Language Processing:


The next part of the network is a sequence model, usually some variant of LSTM
(Long Short-Term Memory) or GRU (Gated Recurrent Unit). This part of the
network is responsible for generating the caption. It takes the feature vector
from the CNN as input and generates a sequence of words as output.

# Two branches, one for the image features and one for the caption tokens
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM

image_input = Input(shape=(image_features.shape[1],))
image_embedding = Dense(256, activation='relu')(image_input)

caption_input = Input(shape=(max_length,))
caption_embedding = Embedding(input_dim=vocab_size, output_dim=256, input_length=max_length)(caption_input)
caption_embedding = LSTM(256)(caption_embedding)

Combining Image and Text Data:


The feature vector from the CNN and the output from the RNN need to be
combined to generate the final caption. There are several ways to do this, such
as Concatenation, Addition, attention mechanisms, etc.

# Combine the two embeddings (you choose how: concatenation, addition, attention, ...)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

combined_embeddings = you_choose_how([image_embedding, caption_embedding])

# And you need to create and compile the model, for example:
model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

Where output is a standard fully connected (Dense) classification layer:

# For N classes (here vocab_size, one per word), we use Softmax
output = Dense(vocab_size, activation='softmax')(combined_embeddings)

Training the Network:


You will need a dataset of images and their corresponding captions. Then, do
a standard training process.
In Keras:
model.fit([image_features, X_captions], y, epochs=10, batch_size=1)
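
The model.fit call above assumes X_captions and y already exist, and the image feature row has to be repeated once per caption prefix when training with teacher forcing. A minimal sketch of how such training pairs could be built is given below; tokenizer, captions and image_features_per_image are assumed names for your own preprocessing objects, not part of the assignment.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

X_images, X_captions, y = [], [], []
for feature, caption in zip(image_features_per_image, captions):
    seq = tokenizer.texts_to_sequences([caption])[0]
    for i in range(1, len(seq)):
        # Input: the image feature plus the caption prefix; target: the next word
        X_images.append(feature)
        X_captions.append(pad_sequences([seq[:i]], maxlen=max_length)[0])
        y.append(to_categorical(seq[i], num_classes=vocab_size))

X_images, X_captions, y = np.array(X_images), np.array(X_captions), np.array(y)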

Generating Captions:
When training is complete, the model should be able to generate captions for the test set; please show them.
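
As an illustration of the inference loop, here is a minimal greedy-decoding sketch. It assumes the captions were wrapped in 'startseq'/'endseq' markers during preprocessing and that a Keras tokenizer object is available; these names are assumptions, not requirements.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, image_feature, tokenizer, max_length):
    caption = 'startseq'
    for _ in range(max_length):
        # Encode the caption generated so far and pad it to the fixed length
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        # Predict the next-word distribution and take the most likely word
        probs = model.predict([image_feature, seq], verbose=0)
        word = tokenizer.index_word.get(np.argmax(probs))
        if word is None or word == 'endseq':
            break
        caption += ' ' + word
    return caption.replace('startseq ', '')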

Evaluation:
The quality of the generated captions is typically evaluated using metrics like
BLEU, METEOR, ROUGE, or CIDEr, which compare the generated caption
to a set of reference captions.
Note: this section is optional; as long as you can see that the loss is decreasing, your model will not be penalised on this.
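
If you do choose to evaluate, a corpus-level BLEU score can be computed with NLTK, for example; references and candidates below are placeholder names for your own tokenised reference and generated captions.

from nltk.translate.bleu_score import corpus_bleu

# references: one list of reference captions (each a token list) per test image
# candidates: one generated caption (a token list) per test image
references = [[['a', 'dog', 'runs', 'on', 'the', 'grass']]]
candidates = [['a', 'dog', 'is', 'running', 'on', 'grass']]

print('BLEU-1:', corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0)))
print('BLEU-4:', corpus_bleu(references, candidates, weights=(0.25, 0.25, 0.25, 0.25)))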
