Image Captioning Using CNN and LSTM
Image captioning is a task that combines computer vision and natural language processing: the model must recognize the context of an image and describe it in a natural language such as English. In this project we build an Image Caption Generator using a CNN together with an LSTM.
The task of image captioning can logically be divided into two modules:
- An image-based model, which extracts the features of the image; for this we use a CNN.
- A language-based model, which translates the features and objects extracted by the image-based model into a natural sentence; for this we use an LSTM.
What is CNN?
A Convolutional Neural Network (CNN) is a specialized deep neural network used for the recognition and classification of images. It processes data represented as a 2D matrix, such as images, and can deal with scaled, translated, and rotated imagery. It analyzes visual imagery by scanning it from left to right and top to bottom, extracting relevant features, and finally combines those features to classify the image.
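To make this concrete, here is a minimal sketch of a small CNN image classifier in Keras (the layer sizes and the 10-class output are illustrative assumptions, not part of this project):

from tensorflow.keras import layers, models

# Minimal CNN sketch: convolution layers scan the image and extract
# local features, pooling layers downsample, and a dense head
# combines the features for classification.
cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 classes, illustrative only
])
cnn.summary()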
What is LSTM?
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN)
capable of learning order dependence in sequence prediction problems. This is most commonly
used in complex problems like Machine Translation, Speech Recognition, and many more.
LSTMs were developed because, as a neural network gets deeper, gradients can become very small or vanish entirely, so little to no training can take place, leading to poor predictive performance; this problem was commonly encountered when training traditional RNNs. LSTM networks are well suited for classifying, processing, and making predictions based on time-series data, since there can be lags of unknown duration between important events in a time series.
LSTM is considerably more effective than the traditional RNN because it overcomes the RNN's short-term memory limitations: an LSTM can carry relevant information throughout the processing of its inputs and discard non-relevant information with a forget gate.
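As a minimal sketch (the sequence length of 34 and the feature size of 256 are assumptions), an LSTM layer in Keras reads a sequence step by step, keeping relevant information in its cell state while the forget gate discards the rest:

from tensorflow.keras import layers, models

# Minimal LSTM sketch: reads a sequence of 34 steps, each a
# 256-dimensional vector, and outputs a single prediction.
seq_model = models.Sequential([
    layers.LSTM(256, input_shape=(34, 256)),
    layers.Dense(1, activation="sigmoid"),
])
seq_model.summary()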
CNN-LSTM ARCHITECTURE:
The CNN-LSTM architecture involves using CNN layers for feature extraction on input data
combined with LSTMs to support sequence prediction. This model is specifically designed for
sequence prediction problems with spatial inputs.
[Figure: CNN-LSTM architecture — input → CNN model → LSTM model → Dense → output]
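A minimal sketch of such a merge architecture in Keras (the 2048-dimensional feature size, the maximum caption length of 34, and the vocabulary size are assumptions for illustration):

from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 5000   # assumed vocabulary size
max_length = 34     # assumed maximum caption length

# Image branch: feature vector from a pre-trained CNN (e.g. ResNet50)
inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation="relu")(fe1)

# Language branch: the partial caption so far, processed by an LSTM
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

# Merge both branches and predict the next word of the caption
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation="relu")(decoder1)
outputs = Dense(vocab_size, activation="softmax")(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam")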
Prerequisites
We use Jupyter Notebook to run our caption generator and install the following libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import nltk
from nltk.corpus import stopwords
import string
import json
from time import time
import pickle

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications import ResNet50
# preprocess_input must match the backbone used for feature extraction
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
Prepare Text Data
The dataset contains multiple descriptions for each photograph, and the text of the descriptions requires some minimal cleaning. First, we load the file containing all of the descriptions; our dataset has 600 image descriptions.
Each photo has a unique identifier. This identifier is used in the photo filename and in the text file of descriptions. Next, we step through the list of photo descriptions: each photo identifier maps to a list of textual descriptions.
Next, we need to clean the description text. The descriptions are already tokenized and easy to work with. We clean the text in the following ways in order to reduce the size of the vocabulary of words we will need to work with:
- Convert all words to lowercase.
- Remove all punctuation.
- Remove all words that are one character or shorter (e.g. 'a').
- Remove all words that contain numbers.
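A minimal sketch of this cleaning step, assuming the steps listed above:

import string

def clean_description(desc):
    # lowercase, strip punctuation, drop one-character and numeric tokens
    table = str.maketrans("", "", string.punctuation)
    tokens = desc.lower().split()
    tokens = [w.translate(table) for w in tokens]
    tokens = [w for w in tokens if len(w) > 1 and w.isalpha()]
    return " ".join(tokens)

clean_description("A dog runs, across the 2 fields .")
# -> 'dog runs across the fields'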
We use ResNet50, already trained on ImageNet, to extract features. ResNet50 is a very deep model: it has 50 layers with skip connections, so it does not suffer from the vanishing-gradient problem. Because of these skip connections, ResNet50 is not a simple sequential model.
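A minimal sketch of extracting a feature vector with ResNet50 (the pooling choice and the example path are assumptions):

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np

# Load ResNet50 without its classification head; global average
# pooling yields a 2048-dimensional feature vector per image.
feature_model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):  # img_path is a hypothetical example path
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return feature_model.predict(x)  # shape (1, 2048)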
For image detection, we use a pre-trained model from the Visual Geometry Group (VGG16). VGG16 ships with the Keras library. For feature extraction, images are resized to 224x224. The features of the image are taken just before the final classification layer, since that layer is used to predict a class for a photo. We are not interested in classifying images, hence we exclude the last layer.
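A minimal sketch of dropping VGG16's final classification layer so the model outputs the features of the penultimate layer instead:

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model

vgg = VGG16(weights="imagenet")
# Re-wire the model to stop at the second-to-last layer (fc2),
# which outputs a 4096-dimensional feature vector.
vgg_features = Model(inputs=vgg.input, outputs=vgg.layers[-2].output)
print(vgg_features.output_shape)  # (None, 4096)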
TRAINING THE MODEL:
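The details of this section were not included; as a hedged sketch of the usual training setup for this kind of model, each caption is expanded into (image features, partial word sequence) -> next-word pairs (the helper name create_sequences and the epoch count are illustrative assumptions):

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
import numpy as np

# Illustrative sketch: turn one (photo features, caption) pair into
# training samples of the form [photo, partial sequence] -> next word.
def create_sequences(tokenizer, max_length, caption, photo, vocab_size):
    X1, X2, y = [], [], []
    seq = tokenizer.texts_to_sequences([caption])[0]
    for i in range(1, len(seq)):
        in_seq, out_seq = seq[:i], seq[i]
        in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
        out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
        X1.append(photo)
        X2.append(in_seq)
        y.append(out_seq)
    return np.array(X1), np.array(X2), np.array(y)

# model.fit([X1, X2], y, epochs=20)  # assumed epoch count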
TESTING THE MODEL:
Now that the model has been trained, we can test it against random images. The predictions are sequences of word-index values up to the maximum caption length, so we use the same tokenizer (saved as tokenizer.pkl) to map the index values back to words.
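A minimal sketch of greedy caption generation (the 'startseq'/'endseq' sentinel tokens and the helper name are assumptions):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo, max_length):
    # Start with the sentinel token and greedily append the most
    # probable next word until 'endseq' or max_length is reached.
    index_word = {i: w for w, i in tokenizer.word_index.items()}
    in_text = "startseq"
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([in_text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = np.argmax(model.predict([photo, seq], verbose=0))
        word = index_word.get(yhat)
        if word is None or word == "endseq":
            break
        in_text += " " + word
    return in_text.replace("startseq", "").strip()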