0% found this document useful (0 votes)

11 views6 pages

Image Captioning Techniques A Review

Uploaded by

harishkumbar.online

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views6 pages

Image Captioning Techniques A Review

Uploaded by

harishkumbar.online

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/364583341

Image Captioning Techniques: A Review

Conference Paper · July 2022

DOI: 10.1109/ICEMIS56295.2022.9914173

CITATIONS READS

8 248

3 authors, including:

Anbara Al-Jamal Mariam Jamal Bani-Amer

Jordan University of Science and Technology Jordan University of Science and Technology
1 PUBLICATION 8 CITATIONS 1 PUBLICATION 8 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Anbara Al-Jamal on 11 June 2024.

The user has requested enhancement of the downloaded file.

Image Captioning Techniques:
A Review

Anbara Z Al-Jamal Maryam J Bani-Amer Shadi Aljawarneh

Faculty of Computer and Information Faculty of Computer and Information Faculty of Computer and Information
Technology Technology Technology
Jordan University of Science and Jordan University of Science and Jordan University of Science and
2022 International Conference on Engineering & MIS (ICEMIS) | 978-1-6654-5436-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICEMIS56295.2022.9914173

Technology Technology Technology

Irbid, Jordan Irbid, Jordan Irbid, Jordan
[email protected] [email protected] [email protected]

as separated, relational, or vocabulary words, it depends

Abstract—Image captioning is the process of generating
on the algorithms used by developers or by the industrial
accurate and descriptive captions. As a recently emerged
research area, it is attracting more and more attention. To
sense with requirements. In biomedical industry captions,
achieve the goal of image captioning, semantic information of the images do not need to generate a complete sentence
images needs to be captured and expressed in natural languages. from the X-ray images, separate words are enough to
Image captions need to identify objects, actions, their detect or classify the disease in the image.
relationships, and some salient features that may be missing Image captioning has several applications in solving
from the image. After identification, the next step is to generate some real-world problems including providing aid to the
the most relevant and concise image description. This should be blind, autonomous cars, academic bots, and military
syntactically and semantically correct. Deep learning techniques purposes. To a large extent, the majority of image
can handle this process with CNNs and LSTMs. In this survey captioning success so far has been from the supervised
paper, we first talk about techniques used in early work that are domain whereby huge amounts of data consisting of
mainly retrieval and template-based. Then, we focus on neural
images and about two to five label captions describing the
network-based techniques, which offer contemporary results.
These techniques are in addition divided into subcategories
actions of the images are provided [3].
based on the specific framework. Each subcategory is discussed Handling images in artificial intelligence can be used
in detail. After that, state-of-the-art methods are compared on for object recognition [4], image classifications [5], action
benchmark datasets. Following that, discussions on future classification, attribute classification, and scene
research approaches are presented. recognition [6]. Image captioning is a collection of
processes that give a description for images starting from
reading the image to giving a textual caption for the image
Keywords—image captioning, machine learning, with various steps between these two stages. The
convolutional neural network, long short-term memory, remaining paper is outlined as follows: In Sec. 2 we give
deep learning, computer vision, natural language
an overview of deep neural network-based image
processing
captioning methods. In Sec.3 gives a literature review for
I. INTRODUCTION various works in this field. In Sec. 4 and Sec. 5 different
datasets and image caption evaluation metrics are
Automatic image captioning is the generation of a discussed. Finally, we give a comparison between
description for an input image in the natural language by encoder- decoder models used in different research [23]-
a machine and the relations between objects in the image [40].
[1]. Human beings can describe the things which they see.
However, it is not easy when it comes to machines. This
problem involves three disciplines of Artificial
Intelligence namely Computer Vision to identify the
objects, their relationships, and their attributes as a
Natural Language. Processing to monitor syntactic and
semantics, and Machine Learning to produce captions.
Images are captured using many devices such as mirrors,
cameras, telescopes, and others [2]. Views of two-
dimensional images generated can be understood and
described by the human brain, but when coming to
machines, images are recognized using the image caption
technique.
Image captioning is denoted to you as an image
entered into a computing device that can capture the view
in that image as text using machine learning, artificial Fig. 1. Image Captioning Process in Modern Techniques
intelligence, and a deep neural network. Captions may be

Authorized licensed use limited to: Jordan University of Science & Technology. Downloaded on June 11,2024 at 12:39:48 UTC from IEEE Xplore. Restrictions apply.
To understand the process of image captioning; we
III. LITERATURE REVIEW
should know that image captioning is more complex to
handle by traditional classification models such as zero Mao et al. [8] developed a multimodal Recurrent
rule, one rule, decision tree, etc. Computer Vision comes to Neural Network (m-RNN) model for generating novel
play a role in processing, analyzing, and interpreting visual image captions. It directly models the probability
data. The first step in designing image caption applications distribution of generating a word given previous words
is to collect a large number of images using different and an image. The vision part contains a deep CNN which
datasets like ImageNet, Flicker8k and other big datasets generates the image representation. The multimodal part
that provide a good number of images for this task and use connects the language model and the deep CNN together
them as a training data fed to the deep learning model such by a one-layer representation.
as Convolutional Neural Network CNN for feature For image caption models studies noted to use of
extraction from the image. As a second step using these language generated using deep learning algorithmsin ,
features to feed the model like theLong Short Term ,addition to using deep learning for extracted features
Memory LSTM model which in turn generates the image from images. Some researchers [11] used theCNN model
captions. as an image encoder and RNN as a decoder to generate
sentences. To get a good performance some researchers
use decoder models-multi encoder for image captioning.
II. DEEP LEARNING MODELS IN IMAGE CAPTIONING To get better performance of the image caption model
A. Convolutional Neural Network Cyganek et al. [4] try to increase the resolution of the
images to extract a high level of features before feeding
CNN [7] stands for Convolutional Neural Network; is an
the CNN as preprocessing data analysis step. As another
artificial deep learning neural network. It is used for image
preprocessing step; images can be resized and rechanged
classifications, computer vision, image recognition, and
their extension before the decoder process according to
object detection. CNN image classifications take an input
the selected decoder algorithm. The figure below shows
image, process it, and classify it under certain categories (E.g.,
this process, increasing the resolution of training and
Dog, Cat, etc.). Such scans images from left to right and top
testing images is an optional step in the image caption
to bottom to pull out important features from the image and
model. In [3] they care about the resolution of images in
combines the feature to classify images.
their study.
B. Recurrent Neural Network (LSTM) As part of a text generation in the model, most of the
LSTM [7] stands for Long Short-Term Memory; they text documents are available in the English language, but
are a type of RNN (recurrent neural network) that is well Pa Aung et al. [2] used text documents in the Myanmar
suited for sequence prediction problems. Based on the language as a new challenge in image caption generation
previous text, we can predict what the next word will be. in Myanmar using deep learning. BLEU scores and 10-
It has proven itself more effective than the traditional fold cross-validation were used as evaluation metrics.
RNN by overcoming the limitations of RNN which had They created the first corpus of image captioning for the
short-term memory. LSTM can carry out relevant Myanmar language, and manually checked and built the
information throughout the processing of inputs and with descriptions in detail to match captions and images. But it
a forget gate, it discards non-relevant information. was easy to use the English databases that were available
In image captioning, researchers use CNN and LSTM in public and use the image caption model to get the
as an encoder-decoder architecture in general, see Fig. 2. captions in English then convert these captions from
below: English to Myanmar language such as using APIs for
language translation provided by Google, instead of
wasting time in preparing their database.

IV. DATASETS
The development of this research area greatly depends
on the availability of large datasets that contain images
with corresponding descriptions. In addition to the size of
Fig. 2. Image Caption Model as Encoder-Decoder Architecture the dataset, an image captioning model benefits also
significantly from the quality of captions in the spirit of
natural language and their adaptation to a given task [10].
TABLE I. COMMON TRADITIONAL IMAGE CAPTIONING TECHNIQUES There are several publicly available datasets that are
useful for training image captioning tasks. The most
popular datasets include ImageNet [5], UIUC PASCAL
Method Functionality
[13], Flickr8k [14], Flickr30k [15], MSCOCO dataset
[16], PASCAL VOC [17], some datasets of images are
Encoder (CNN) Image feature extraction
available online for public to access and download to use
[2].
Decoder (LSTM) Language modeling
ImageNet is a dataset of over 15 million labeled high-
resolution images belonging to roughly 22,000
categories. The images were collected from the web and

Authorized licensed use limited to: Jordan University of Science & Technology. Downloaded on June 11,2024 at 12:39:48 UTC from IEEE Xplore. Restrictions apply.
labeled by human labelers using Amazon’s Mechanical V. EVALUATION METRICS
Turk crowd-sourcing tool. Starting in 2010, as part of the Human evaluations of machine translation are
Pascal Visual Object Challenge, an annual competition extensive but expensive. Human evaluations can take
called the ImageNet Large-Scale Visual Recognition months to finish and involve human labor that cannot be
Challenge (ILSVRC) has been held. ILSVRC uses a reused. Several metrics for automatic evaluation of
subset of ImageNet with roughly 1000 images in each of machine translation are proposed for quick, inexpensive
1000 categories. In all, there are roughly 1.2 million and language-independent, that correlates highly with
training images, 50,000 validation images, and 150,000 human evaluation. BLEU (Bilingual Evaluation
testing images. ILSVRC-2010 is the only version of Understudy) [19]: as a metric, it counts the number of
ILSVRC for which the test set labels are available, so this matching n-grams in the model’s prediction compared to
is the version on which we performed most of our the ground truth. With this, precision is calculated based
experiments. Since we also entered our model in the on the mean n-grams computed, and the recall is
ILSVRC-2012 competition, in Section 6 we report our computed via the introduction of a brevity penalty in the
results on this version of the dataset as well, for which test caption label.
set labels are unavailable. On ImageNet, it is customary
to report two error rates: top-1 and top-5, where the top-5 ROUGE (Recall-Oriented Understudy for Gisting
error, the rate is the fraction of test images for which the Evaluation) [20]: it is useful for summary evaluation and
correct label is not among the five labels considered most is calculated as the overlap of either 1-gram or bigrams
probable by the model. ImageNet consists of variable- between the referenced caption and the predicted
resolution images, while our system requires a constant sequence. Using the longest sequence available, the co-
input dimensionality. Therefore, we down-sampled the occurrence F-score mean of the predicted sequence’s
images to a fixed resolution of 256 × 256. Given a recall and prediction is obtained.
rectangular image, we first rescaled the image such that METEOR (Metric for Evaluation of Translation with
the shorter side was of length 256 and then cropped out Explicit Ordering) [21]: it addresses the drawback of
the central 256×256 patch from the resulting image. We BLEU, and it is based on a weighted F-score computation
did not pre-process the images in any other way, except as well as a penalty function meant to check the order of
for subtracting the mean activity over the training set from the candidate sequence. It adopts synonyms matching in
each pixel. So, we trained our network on the (centered) the detection of similarity between sentences.
raw RGB values of the pixels.
CIDEr (Consensus-based Image Description
UIUC PASCAL Sentences was one of the first image- Evaluation) [18]: it determines the consensus between a
caption datasets, consisting of 1,000 images and reference sequence and a predicted sequence via cosine
associated with five different descriptions collected via similarity, stemming, and TF-IDF weighting. -e predicted
crowdsourcing. It was used by early image captioning sequence is compared to the combination of all available
systems, but it is rarely used due to its limited domain, reference sequences.
small size, and relatively simple captions.
SPICE (Semantic Propositional Image Caption
Evaluation) [22]: it is a relatively new caption metric that
Flickr 30K includes and extends the previous Flickr relates to the semantic interrelationship between the
8K dataset. It consists of 31,783 images showing generated and referenced sequence. Its graph-based
everyday activities, events, and scenes described by methodology uses a scene graph of semantic
158,915 captions obtained via crowdsourcing. representations to indicate details of objects and their
interaction to describe their textual illustrations.
Microsoft COCO Captions dataset containing more
complex images of everyday objects and scenes. By
adding human-generated captions, two datasets were VI. DISCUSSION
created: c5 with five captions for each of the more than
300K images and an additional, c40 dataset with 40 Table 2 is a comparison between encoder- decoder models
different captions for the randomly chosen 5K images. used in different research. Language CNN helps to
The c40 was created because it was observed [18] that understand better word embeddings and better history word
some evaluation metrics benefit from more reference representations even with fewer data.
captions
PASCAL VOC: PASCAL Visual Object Classes TABLE II. ENCODER-DECODER COMPARISON
(VOC) is arguably the most popular semantic
segmentation dataset with 21 classes of predefined object Model Encod Deco
er der
labels, background included. -e dataset contains images model model
and annotations which could be used for detection, M. Bhalekar et al. [12] D- LST
classification, action classification, person layout, and CNN M
segmentation tasks. The dataset’s training, validation, and S.pa Aung et al. [2] VGG VGG
test set have 1464, 1449, and 1456 images, respectively. Oxford 16,
Yearly, the dataset has been used for public competitions net VGG
since 2005 CNN 19
with
Flickr 30K and MS COCO Captions are widely LST
accepted as benchmark datasets for image captioning by M
most models using deep neural networks. M. Zakir Hossain et al. [9] CNN LST
M

Authorized licensed use limited to: Jordan University of Science & Technology. Downloaded on June 11,2024 at 12:39:48 UTC from IEEE Xplore. Restrictions apply.
Y. Chu et al. [1] IJCAI Int. Jt. Conf. Artif. Intell., vol. 2015-January, pp. 4188–4192,
ResNe LST
2015.
t50 M/
Soft [15] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, “From image
Attent descriptions to visual denotations: New similarity metrics for semantic
ion inference over event descriptions,” Trans. Assoc. Comput. Linguist.,
vol. 2, pp. 67–78, 2014, doi: 10.1162/tacl_a_00166.
VII. CONCLUSION [16] X. Chen et al., “Microsoft COCO Captions: Data Collection and
Evaluation Server,” pp. 1–7, 2015, [Online]. Available:
In this paper, we have discussed various image http://arxiv.org/abs/1504.00325.
captioning models. We have also presented the limitations [17] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A.
of the discussed approaches. Different evaluation metrics Zisserman, “The pascal visual object classes (VOC) challenge,” Int. J.
are also presented and discussed. We have shown the Comput. Vis., vol. 88, no. 2, pp. 303–338, 2010, doi: 10.1007/s11263-
results of various methods performed on Flickr8k, 009-0275-4.
Flickr30k, ImageNet, and Microsoft COCO datasets. In [18] R. Vedantam, C. L. Zitnick, and D. Parikh, “CIDEr: Consensus-based
image description evaluation,” Proc. IEEE Comput. Soc. Conf.
the future, models using reinforcement learning and Comput. Vis. Pattern Recognit., vol. 07-12-June-2015, pp. 4566–4575,
unsupervised learning will be highly acceptable for the 2015, doi: 10.1109/CVPR.2015.7299087.
captioning of natural scenes. Integration of textual content [19] C. Cormier, “Bleu,” Landscapes, vol. 7, no. 1, pp. 16–17, 2005, doi:
and visual regions will definitely enhance the image 10.3917/chev.030.0107.
captioning task to great extent. [20] G. Tsuchiya, “Postmortem Angiographic Studies on the Intercoronary
Arterial Anastomoses.: Report I. Studies on Intercoronary Arterial
Anastomoses in Adult Human Hearts and the Influence on the
REFERENCES Anastomoses of Strictures of the Coronary Arteries.,” Jpn. Circ. J., vol.
[1] Y. Chu, X. Yue, L. Yu, M. Sergei, and Z. Wang, “Automatic Image 34, no. 12, pp. 1213–1220, 1971, doi: 10.1253/jcj.34.1213.
Captioning Based on ResNet50 and LSTM with Soft Attention,” Wirel.
[21] S. Banerjee and A. Lavie, “METEOR: An automatic metric for mt
Commun. Mob. Comput., vol. 2020, 2020, doi:
evaluation with improved correlation with human judgments,” Intrinsic
10.1155/2020/8909458.
Extrinsic Eval. Meas. Mach. Transl. and/or Summ. Proc. Work. ACL
[2] S. Pa Pa Aung, W. Pa Pa, and T. L. Nwe, “Automatic Myanmar Image 2005, no. June, pp. 65–72, 2005.
Captioning using CNN and LSTM-Based Language Model,” Proc. 1st
[22] P. Anderson, B. Fernando, M. Johnson, and S. Gould, “SPICE:
Jt. Work. Spok. Lang. Technol. Under-resourced Lang. Collab.
Semantic propositional image caption evaluation,” Lect. Notes
Comput. Under-Resourced Lang., no. May, pp. 139–143, 2020,
Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
[Online]. Available: https://www.aclweb.org/anthology/2020.sltu-
Bioinformatics), vol. 9909 LNCS, pp. 382–398, 2016, doi:
1.19.
10.1007/978-3-319-46454-1_24.
[3] A. Oluwasammi et al., “Features to text: A comprehensive survey of
[23] S. Aljawarneh, V. Radhakrishna, and G. R. Kumar, “An imputation
deep learning on semantic segmentation and image captioning,”
measure for data imputation and disease classification of medical
Complexity, vol. 2021, 2021, doi: 10.1155/2021/5538927.
datasets,” in AIP Conference Proceedings, 2019, vol. 2146.
[4] M. Koziarski and B. Cyganek, “Impact of low resolution on image
[24] S. Aljawarneh and J. A. Lara, “Data science for analyzing and
recognition with deep neural networks: An experimental study,” Int. J.
improving educational processes,” J. Comput. High. Educ., vol. 33, no.
Appl. Math. Comput. Sci., vol. 28, no. 4, pp. 735–744, 2018, doi:
3, pp. 545–550, 2021.
10.2478/amcs-2018-0056.
[25] J. A. Lara, A. A. De Sojo, S. Aljawarneh, R. P. Schumaker, and B. Al-
[5] T. F. Gonzalez, “Handbook of approximation algorithms and
Shargabi, “Developing big data projects in open university engineering
metaheuristics,” Handb. Approx. Algorithms Metaheuristics, pp. 1–
courses: Lessons learned,” IEEE Access, vol. 8, pp. 22988–23001,
1432, 2007, doi: 10.1201/9781420010749.
2020.
[6] S. Bai and S. An, “A survey on automatic image caption generation,”
[26] S. Aljawarneh and J. A. Lara, “Editorial: Special Issue onQuality
Neurocomputing, vol. 311, pp. 291–304, 2018, doi:
Assessment and Management in Big Data-Part i,” ACM Trans. Embed.
10.1016/j.neucom.2018.05.080.
Comput. Syst., vol. 13, no. 2, 2021.
[7] A. A. Mohamed, “Image Caption using CNN & LSTM,” no. June,
[27] S. A. Aljawarneh, “Formulating models to survive multimedia big
2020.
content from integrity violation,” J. Ambient Intell. Humaniz. Comput.,
[8] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, “Deep 2018.
captioning with multimodal recurrent neural networks (m-RNN),” 3rd
[28] S. A. Aljawarneh, R. Vangipuram, V. K. Puligadda, and J. Vinjamuri,
Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., vol. 1090,
“G-SPAMINE: An approach to discover temporal association patterns
no. 2014, pp. 1–17, 2015.
and trends in internet of things,” Futur. Gener. Comput. Syst., vol. 74,
[9] H. Sharma, M. Agrahari, S. K. Singh, M. Firoj, and R. K. Mishra, pp. 430–443, 2017.
“Image Captioning: A Comprehensive Survey,” 2020 Int. Conf. Power
[29] V. Chang, S. A. Aljawarneh, and C.-S. Li, “Special issue on ‘advances
Electron. IoT Appl. Renew. Energy its Control. PARC 2020, pp. 325–
in visual analytics and mining visual data,’” Expert Syst., vol. 37, no.
328, 2020, doi: 10.1109/PARC49193.2020.236619.
5, 2020.
[10] I. Hrga and M. Ivašic-Kos, “Deep image captioning: An overview,”
[30] J. A. Lara, S. Aljawarneh, and S. Pamplona, “Special issue on the
2019 42nd Int. Conv. Inf. Commun. Technol. Electron. Microelectron.
current trends in E-learning Assessment,” J. Comput. High. Educ., vol.
MIPRO 2019 - Proc., pp. 995–1000, 2019, doi:
32, no. 1, 2020.
10.23919/MIPRO.2019.8756821.
[31] J. A. Lara, J. Pazos, A. A. de Sojo, and S. Aljawarneh, “The Paternity
[11] Z. Shi, H. Liu, and X. Zhu, “Enhancing Descriptive Image Captioning
of the Modern Computer,” Found. Sci., 2021.
with Natural Language Inference,” ACL-IJCNLP 2021 - 59th Annu.
Meet. Assoc. Comput. Linguist. 11th Int. Jt. Conf. Nat. Lang. Process. [32] S. Aljawarneh, V. Radhakrishna, P. V. Kumar and V. Janaki, "A
Proc. Conf., vol. 2, pp. 269–277, 2021, doi: 10.18653/v1/2021.acl- similarity measure for temporal pattern discovery in time series data
short.36. generated by IoT," 2016 International Conference on Engineering &
MIS (ICEMIS), 2016, pp. 1-4, doi: 10.1109/ICEMIS.2016.7745355.
[12] M. Bhalekar, “D-CNN : A New model for Generating Image Captions
with Text Extraction Using Deep Learning for Visually Challenged [33] S. A. Aljawarneh, V. Radhakrishna and A. Cheruvu, "Extending the
Individuals,” vol. 12, no. 2, pp. 8366–8373, 2022. Gaussian membership function for finding similarity between temporal
patterns," 2017 International Conference on Engineering & MIS
[13] R. Vedantam, C. L. Zitnick, and D. Parikh, “Collecting Image
(ICEMIS), 2017, pp. 1-6, doi: 10.1109/ICEMIS.2017.8273100.
Description Datasets using Crowdsourcing,” 2014, [Online].
Available: http://arxiv.org/abs/1411.3041. [34] E. Ayoubi and S. Aljawarneh, “Challenges and opportunities of
adopting business intelligence in SMEs: Collaborative model,”
[14] M. Hodosh, P. Young, and J. Hockenmaier, “Framing image
in Proceedings of the First International Conference on Data Science,
description as a ranking task: Data, models and evaluation metrics,”
E-learning and Information Systems, 2018.

Authorized licensed use limited to: Jordan University of Science & Technology. Downloaded on June 11,2024 at 12:39:48 UTC from IEEE Xplore. Restrictions apply.
[35] M. N. Mouchili, S. Aljawarneh, and W. Tchouati, “Smart city data
analysis,” in Proceedings of the First International Conference on Data
Science, E-learning and Information Systems, 2018.
[36] A. Nagaraja, S. Aljawarneh, and P. H., “PAREEKSHA: A machine
learning approach for intrusion and anomaly detection,” in Proceedings
of the First International Conference on Data Science, E-learning and
Information Systems, 2018.
[37] B. K. Muslmani, S. Kazakzeh, E. Ayoubi, and S. Aljawarneh,
“Reducing integration complexity of cloud-based ERP systems,”
in Proceedings of the First International Conference on Data Science,
E-learning and Information Systems, 2018.
[38] M. N. Mouchili, J. W. Atwood, and S. Aljawarneh, “Call data record
based big data analytics for smart cities,” in Proceedings of the Second
International Conference on Data Science, E-Learning and Information
Systems - DATA ’19, 2019.
[39] S. Aljawarneh and M. Malhotra, Critical research on scalability and
security issues in virtual cloud environments. Hershey, PA: IGI Global,
2017.
[40] S. Aljawarneh and M. Malhotra, Impacts and Challenges of Cloud
Business Intelligence. Hershey, PA: Business Science Reference,
2020.

Authorized licensed use limited to: Jordan University of Science & Technology. Downloaded on June 11,2024 at 12:39:48 UTC from IEEE Xplore. Restrictions apply.

View publication stats

Soderstrom T., Stoica P. System Identification (PH 1989) (ISBN S
100% (6)
Soderstrom T., Stoica P. System Identification (PH 1989) (ISBN S
637 pages
Paper 17881
No ratings yet
Paper 17881
6 pages
Overview On Image Captioning Techniques
No ratings yet
Overview On Image Captioning Techniques
6 pages
Image Captioning Using R-CNN & LSTM Deep Learning Model
No ratings yet
Image Captioning Using R-CNN & LSTM Deep Learning Model
4 pages
Image Captionbot For Assistive Technology
No ratings yet
Image Captionbot For Assistive Technology
3 pages
Image Captioning Generator Using Deep Machine Learning
No ratings yet
Image Captioning Generator Using Deep Machine Learning
3 pages
Gray Scale Image Captioning Using CNN and LSTM
No ratings yet
Gray Scale Image Captioning Using CNN and LSTM
8 pages
A Comprehensive Guide To Deep Neural Network-Based Image Captions
No ratings yet
A Comprehensive Guide To Deep Neural Network-Based Image Captions
17 pages
Attention Based Image Caption Generation ABICG Using Encoder-Decoder Architecture
No ratings yet
Attention Based Image Caption Generation ABICG Using Encoder-Decoder Architecture
9 pages
Image Caption Generator Using Deep Learning
No ratings yet
Image Caption Generator Using Deep Learning
8 pages
Image Caption Generator
No ratings yet
Image Caption Generator
6 pages
Image Captioning Generator Using CNN and LSTM
No ratings yet
Image Captioning Generator Using CNN and LSTM
8 pages
Image To Caption Generator
No ratings yet
Image To Caption Generator
7 pages
Image Captioning - A Deep Learning Approach Using CNN and LSTM Network
No ratings yet
Image Captioning - A Deep Learning Approach Using CNN and LSTM Network
6 pages
He 2017
No ratings yet
He 2017
8 pages
Ref 12
No ratings yet
Ref 12
7 pages
A Survey of Evolution of Image Captioning PDF
No ratings yet
A Survey of Evolution of Image Captioning PDF
18 pages
TSP CMC 53245
No ratings yet
TSP CMC 53245
18 pages
A Comprehensive Survey of Deep Learning For Image Captioning
No ratings yet
A Comprehensive Survey of Deep Learning For Image Captioning
36 pages
10 - 22-A Thorough Review of Models, Evaluation Metrics, and Datasets On Image Captioning
No ratings yet
10 - 22-A Thorough Review of Models, Evaluation Metrics, and Datasets On Image Captioning
23 pages
Research Paper Final
No ratings yet
Research Paper Final
5 pages
Fin Irjmets1681386363
No ratings yet
Fin Irjmets1681386363
5 pages
Bai 2018
No ratings yet
Bai 2018
14 pages
PGCON Paper Final
No ratings yet
PGCON Paper Final
4 pages
A Survey On Automatic Image Caption Generation: Neurocomputing May 2018
No ratings yet
A Survey On Automatic Image Caption Generation: Neurocomputing May 2018
17 pages
6 - 23 - Deep Learning Approaches On Image Captioning A Review
No ratings yet
6 - 23 - Deep Learning Approaches On Image Captioning A Review
41 pages
IJCRT2310418
No ratings yet
IJCRT2310418
8 pages
ch13 Linear Factor Models
No ratings yet
ch13 Linear Factor Models
33 pages
Image Captioning
No ratings yet
Image Captioning
17 pages
Image Caption Generation Research Paper
No ratings yet
Image Caption Generation Research Paper
9 pages
Image Captioning Synopsis
No ratings yet
Image Captioning Synopsis
17 pages
Image Caption Generation Research Paper
No ratings yet
Image Caption Generation Research Paper
8 pages
Conference Paper A5
No ratings yet
Conference Paper A5
9 pages
IJNRD2309143
No ratings yet
IJNRD2309143
11 pages
Research Paper - Virtual Assistant
No ratings yet
Research Paper - Virtual Assistant
15 pages
DW & Caption Generator - Paper 1
No ratings yet
DW & Caption Generator - Paper 1
6 pages
(Ankitveer)
No ratings yet
(Ankitveer)
18 pages
IJIEMR March 2023 COPY RIGHT (2 Files Merged)
No ratings yet
IJIEMR March 2023 COPY RIGHT (2 Files Merged)
8 pages
Image Caption
No ratings yet
Image Caption
8 pages
DL Group 6 Rep
No ratings yet
DL Group 6 Rep
11 pages
Image Captioning Using Deep Learning Mait
No ratings yet
Image Captioning Using Deep Learning Mait
8 pages
Image Caption Bot With Keras and Speech Generation For
No ratings yet
Image Caption Bot With Keras and Speech Generation For
7 pages
Sunnit Singh Shivam Kumar Soham Chatterjee Abhishek Kumar Sujata Dawn MuHmt
No ratings yet
Sunnit Singh Shivam Kumar Soham Chatterjee Abhishek Kumar Sujata Dawn MuHmt
6 pages
Report 1
No ratings yet
Report 1
34 pages
Major Report Final
No ratings yet
Major Report Final
40 pages
Image Captioning
No ratings yet
Image Captioning
8 pages
Image Captioning - A Deep Learning Approach
No ratings yet
Image Captioning - A Deep Learning Approach
4 pages
Annu Maria-Introduction To Modelling and Simulation
0% (1)
Annu Maria-Introduction To Modelling and Simulation
7 pages
A Novel Approach of Image Caption Generator Using Deep Learning
No ratings yet
A Novel Approach of Image Caption Generator Using Deep Learning
6 pages
Image Caption Generator Using AI: Review - 1
No ratings yet
Image Caption Generator Using AI: Review - 1
9 pages
ROHAN PRASAD FinalProjectReport - Rohan Gamer
No ratings yet
ROHAN PRASAD FinalProjectReport - Rohan Gamer
39 pages
DL - Review of Research Papers - Image - Caption - Generation
No ratings yet
DL - Review of Research Papers - Image - Caption - Generation
34 pages
Applsci 13 11103 v2
No ratings yet
Applsci 13 11103 v2
38 pages
Mini Project Fln..
No ratings yet
Mini Project Fln..
51 pages
Synopsis May 2024 (Pradeep, Vikas) - 1
No ratings yet
Synopsis May 2024 (Pradeep, Vikas) - 1
14 pages
Research Paper of Generating Caption From Image
No ratings yet
Research Paper of Generating Caption From Image
5 pages
Two Tier LSTM Model
No ratings yet
Two Tier LSTM Model
13 pages
Image Caption Generation Using Deep Neural Networks
No ratings yet
Image Caption Generation Using Deep Neural Networks
3 pages
Image Caption Generator
No ratings yet
Image Caption Generator
2 pages
Mini Project Final
No ratings yet
Mini Project Final
27 pages
A Novel Approach of Image Caption Generator Using Deep Learning
No ratings yet
A Novel Approach of Image Caption Generator Using Deep Learning
6 pages
Module 3: Linear Programming: Graphical Method: Learning Outcomes
No ratings yet
Module 3: Linear Programming: Graphical Method: Learning Outcomes
9 pages
Classification Applications With Deep Learning and Machine Learning Technologies
100% (1)
Classification Applications With Deep Learning and Machine Learning Technologies
287 pages
Stodola
No ratings yet
Stodola
3 pages
Nis
100% (1)
Nis
13 pages
Digital Communication I Lecture # 3 Pulse Code Modulation (PCM) PCM, Noise in Pulse Code Modulation.
No ratings yet
Digital Communication I Lecture # 3 Pulse Code Modulation (PCM) PCM, Noise in Pulse Code Modulation.
23 pages
DAA R19 - All Units
No ratings yet
DAA R19 - All Units
219 pages
Q-State Potts Model - Theory and Simulat PDF
No ratings yet
Q-State Potts Model - Theory and Simulat PDF
13 pages
Earley Parsing PDF
No ratings yet
Earley Parsing PDF
27 pages
Daa QB
No ratings yet
Daa QB
2 pages
Generative AI Lifecycle Patterns. Part 2 - Maturing GenAI - Patterns - by Ali Arsanjani - Sep, 2023 - Medium
No ratings yet
Generative AI Lifecycle Patterns. Part 2 - Maturing GenAI - Patterns - by Ali Arsanjani - Sep, 2023 - Medium
24 pages
Module 1 QB PDF
No ratings yet
Module 1 QB PDF
2 pages
Ee 708 Report
No ratings yet
Ee 708 Report
3 pages
CH 01
No ratings yet
CH 01
11 pages
Lecture 1 Course Contents
No ratings yet
Lecture 1 Course Contents
7 pages
CFM - Programming Task
No ratings yet
CFM - Programming Task
10 pages
Anyasor C Applied Machine Learning. A Practical Guide From Novice To Pro 2024
No ratings yet
Anyasor C Applied Machine Learning. A Practical Guide From Novice To Pro 2024
322 pages
Goertzel's Algorithm
No ratings yet
Goertzel's Algorithm
4 pages
Sta150 Contoh
No ratings yet
Sta150 Contoh
10 pages
Journal of Forecasting - 2024 - Lei - Volatility Forecasting For Stock Market in
No ratings yet
Journal of Forecasting - 2024 - Lei - Volatility Forecasting For Stock Market in
25 pages
978 0 7503 3395 5.preview
No ratings yet
978 0 7503 3395 5.preview
26 pages
Karatsuba Algorithm
No ratings yet
Karatsuba Algorithm
3 pages
Power Point Presentation On-: Array Based Applications in C Language
No ratings yet
Power Point Presentation On-: Array Based Applications in C Language
20 pages
Higher Non-Calculator Mark Scheme
No ratings yet
Higher Non-Calculator Mark Scheme
4 pages
Risk Analysis
No ratings yet
Risk Analysis
4 pages
Cse330:Competitive Coding Approaches-Techniques: Course Outcomes
No ratings yet
Cse330:Competitive Coding Approaches-Techniques: Course Outcomes
2 pages
EE 203 Syllabus
No ratings yet
EE 203 Syllabus
2 pages
DAMT Formulas
No ratings yet
DAMT Formulas
1 page
Training Facility Norms and Standard Equipment Lists: Volume 1---Precision Engineering or Machining
From Everand
Training Facility Norms and Standard Equipment Lists: Volume 1---Precision Engineering or Machining
Fook Yen Chong
No ratings yet
Training Facility Norms and Standard Equipment Lists: Volume 2---Mechatronics Technology
From Everand
Training Facility Norms and Standard Equipment Lists: Volume 2---Mechatronics Technology
Fook Yen Chong
No ratings yet

Image Captioning Techniques A Review

Uploaded by

Image Captioning Techniques A Review

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Image Captioning Techniques: A Review

Conference Paper · July 2022

Anbara Al-Jamal Mariam Jamal Bani-Amer

SEE PROFILE SEE PROFILE

The user has requested enhancement of the downloaded file.

Anbara Z Al-Jamal Maryam J Bani-Amer Shadi Aljawarneh

Technology Technology Technology

as separated, relational, or vocabulary words, it depends

978-1-6654-5436-0/22/$31.00 ©2022 IEEE

View publication stats

You might also like