Image Caption Generation
Jameer Kotwal
Dr. D. Y. Patil Institute of Engineering & Technology
All content following this page was uploaded by Jameer Kotwal on 01 February 2024.
Fig. 3 and Fig. 4: sample outputs.
XIII. FUTURE SCOPE
In this paper we have described generating captions for images. Even though deep learning has advanced considerably, exact caption generation is still not possible, for several reasons: hardware constraints, and the lack of a model or programming logic that produces exact captions, since machines cannot think or make decisions as accurately as humans do. With future advances in hardware and in deep learning models, we hope to generate captions with higher accuracy. We also intend to extend this model into a complete image-to-speech system by converting the generated captions to speech, which would be very helpful for blind people.
XIV. CONCLUSION
An image captioning deep learning model is proposed in this paper. We used a ResNet-LSTM model to generate a caption for each given image. The Flickr8k dataset was used to train the model. ResNet is a convolutional architecture; it is used here to extract image features, which are fed as input to Long Short-Term Memory units, and captions are generated with the help of the vocabulary built during training. We conclude that this ResNet-LSTM model achieves higher accuracy than the CNN-RNN and VGG models. The model runs efficiently when executed on a Graphics Processing Unit. This image captioning deep learning model is very
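The pipeline described above, where a ResNet-extracted image feature conditions an LSTM that emits words from the training vocabulary, can be sketched in miniature as follows. This is an illustrative NumPy sketch only, not the paper's implementation: the vocabulary, dimensions, and all weights (which would normally come from training on Flickr8k, with the 2048-dimensional feature produced by ResNet) are random stand-ins chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; the paper builds the real one from Flickr8k during training.
vocab = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D, H = len(vocab), 8, 16  # vocab size, embedding dim, LSTM hidden dim

# Randomly initialised weights stand in for trained parameters.
E = rng.normal(0, 0.1, (V, D))          # word embedding table
W = rng.normal(0, 0.1, (D + H, 4 * H))  # LSTM input+recurrent weights
b = np.zeros(4 * H)                     # LSTM bias
W_img = rng.normal(0, 0.1, (2048, H))   # projects ResNet feature to initial state
W_out = rng.normal(0, 0.1, (H, V))      # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    # One LSTM cell update: input, forget, output gates and candidate cell.
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def greedy_caption(img_feat, max_len=10):
    # The image feature initialises the hidden state, as in the ResNet-LSTM model.
    h = np.tanh(img_feat @ W_img)
    c = np.zeros(H)
    word, caption = "<start>", []
    for _ in range(max_len):
        h, c = lstm_step(E[word_to_id[word]], h, c)
        word = vocab[int(np.argmax(h @ W_out))]  # greedy pick of the next word
        if word == "<end>":
            break
        caption.append(word)
    return caption

feat = rng.normal(size=2048)  # stand-in for a ResNet-extracted image feature
print(greedy_caption(feat))
```

With trained weights, the greedy loop would emit a fluent caption; beam search is a common replacement for the argmax step when higher-quality captions are needed.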