A Comparative Analysis of Attention Mechanism in RNN-LSTMs For Improved Image Captioning Performance
Abstract:- Image captioning, which links computer vision with natural language processing, is critical for providing descriptions of images. The solution proposed in this research is a hierarchical attention model that combines CNN features extracted from images with LSTM networks equipped with attention mechanisms for generating captions. By utilizing both object-level and image-level features, our method enhances the quality and relevance of the generated captions and increases the variability of automated image description.

Keywords:- Image Captioning, Deep Learning, Artificial Intelligence, Natural Language Processing.

I. INTRODUCTION

The use of neural networks has revolutionized the practice of image classification, resulting in great progress in artificial intelligence as well as computer vision. As these systems develop, however, researchers increasingly pursue more complicated applications that go beyond what machines have previously been able to do. One of the fundamental aspects of this quest is the viewing and rendering of images and videos, which goes beyond conventional object recognition to producing a coherent natural language description of the visual content. This development resonates with the growing aspiration of AI to have machines perceive and report on their environment the way human beings do.

In this research paper, we propose a novel approach to the image captioning challenge, which aims at describing the main settings and events presented in photographs without human assistance. Image captioning is considered an extremely difficult task because it does not only involve detecting objects, but also requires describing the relations of the objects to each other and to their surroundings, along with many other details that are often hard to capture even for people with good visual memories.

The specialized arrangement of neural networks, characterized by an assortment of interconnected non-linear functions, differs greatly from that of standard algorithms. In contrast to conventional techniques, which depend on inflexible and pre-structured rules, neural networks adapt by learning from data and tuning their parameters across many layers to address complicated problems. This makes them quite successful in domains that cannot be handled by rule-based problem solving, such as speech recognition, image recognition, story writing and even music.

For image captioning, applying deep neural networks such as CNNs for visual feature extraction and RNN-LSTMs with an attention mechanism for language generation yields captions that are not only descriptive but also intelligent. As the sequence is generated, the model keeps updating its grasp of the image with each word it emits, leading to captions that capture most aspects of the image content and context. This outlines the possibility of producing sensible and cohesive captions, thus bridging the language-cognition barrier and serving as a link between images and language.

The aim of this research paper is the development of automated image captioning systems that combine computer vision and natural language processing technologies, as this direction is in increasing demand. Since the core problem of image captioning is the contextualization of visual data, a deeper probing of attention mechanisms in Long Short-Term Memory Recurrent Neural Networks warrants study. This study endeavors to build a strong hierarchical attention network by fusing local object features with the global features obtained from Convolutional Neural Networks. The intention is to build a system that not only renders captions that are correct with respect to the image content but also settles the issues surrounding the relationship between local and global features.
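To make the attention mechanism referred to above concrete, the snippet below is a minimal, hypothetical sketch of Bahdanau-style (additive) attention over a set of CNN region features. The layer sizes, tensor shapes and class name are illustrative assumptions, not the exact architecture used in this work:

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects image region features
        self.W2 = tf.keras.layers.Dense(units)   # projects the decoder hidden state
        self.V = tf.keras.layers.Dense(1)        # scores each image region

    def call(self, features, hidden):
        # features: (batch, num_regions, feature_dim) from the CNN
        # hidden:   (batch, hidden_dim) from the LSTM decoder
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)               # attention over regions
        context = tf.reduce_sum(weights * features, axis=1)   # weighted image summary
        return context, weights

At every decoding step the context vector returned here would be fed, together with the previous word embedding, into the LSTM, so the model can focus on different image regions for different words.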
II. METHODOLOGY

Data Set Overview
This research employs the Kaggle-sourced Flickr30k dataset. Its size and content variation make it research-friendly and allow training on a single machine. The dataset contains a total of 31,783 images, and each image is paired with five captions to enable efficient training of the model. The preprocessing stage includes lower-casing all the text and removing punctuation, stop words, numbers and extra whitespace. The image features were obtained through a pre-trained VGG16 model, whereby all the images were converted to fixed-length vectors for the model to read.
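The following is a rough sketch of this feature-extraction step, assuming the pre-trained VGG16 available in Keras; the chosen layer ("fc2", which yields a 4096-dimensional vector) and the helper name are assumptions, since the paper does not specify them:

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet")
# Drop the final softmax classifier; keep the penultimate fully connected layer.
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def extract_features(img_path):
    img = keras_image.load_img(img_path, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
    return extractor.predict(arr, verbose=0).flatten()   # fixed-length vector, shape (4096,)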
Caption Pre-Processing
In preparing the captions, we also introduced a start-of-caption token and an end-of-caption token to mark the beginning and end of every prepared caption, which helps the model create text sequences whose grammar makes sense. This method is useful for ensuring that the captions are logically coherent.
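A minimal sketch of this clean-up, assuming NLTK's English stop-word list and the commonly used "startseq"/"endseq" token names (both assumptions; the paper does not name its tokens):

import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)   # fetch the stop-word list once
STOP_WORDS = set(stopwords.words("english"))

def preprocess_caption(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)                      # strip punctuation and digits
    words = [w for w in text.split() if w not in STOP_WORDS]   # drop stop words
    return "startseq " + " ".join(words) + " endseq"           # wrap with start/end tokens

print(preprocess_caption("Two dogs are playing in the park!"))
# -> "startseq two dogs playing park endseq"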
Model Training
A custom data generator fed the data in batches, and the model was trained over five epochs. The training loss of each epoch was monitored by an early-stopping callback aimed at preventing the model from overfitting. The model was evaluated with the BLEU score, a common technique for measuring how well a generated caption matches a reference caption.
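An illustrative sketch of this setup is shown below: a generator that expands each caption into (image feature, partial sequence) -> next-word training pairs, plus a Keras EarlyStopping callback. The function signature, batch size and helper names are assumptions rather than the paper's exact code:

import numpy as np
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def data_generator(captions, features, tokenizer, max_len, vocab_size, batch_size=64):
    # captions: dict image_id -> list of preprocessed caption strings
    # features: dict image_id -> fixed-length feature vector from the CNN
    X_img, X_seq, y = [], [], []
    while True:
        for img_id, caption_list in captions.items():
            for caption in caption_list:
                seq = tokenizer.texts_to_sequences([caption])[0]
                for i in range(1, len(seq)):
                    X_img.append(features[img_id])
                    X_seq.append(pad_sequences([seq[:i]], maxlen=max_len)[0])
                    y.append(to_categorical([seq[i]], num_classes=vocab_size)[0])
                    if len(y) == batch_size:
                        yield (np.array(X_img), np.array(X_seq)), np.array(y)
                        X_img, X_seq, y = [], [], []

early_stop = EarlyStopping(monitor="loss", patience=1, restore_best_weights=True)
# model.fit(data_generator(captions, features, tokenizer, max_len, vocab_size),
#           steps_per_epoch=steps, epochs=5, callbacks=[early_stop])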
BLEU Score
The BLEU score is measured as a ratio between 0 and 1, and its purpose is to determine how precisely the suggested captions match the reference captions. It was computed through the NLTK library available in Python.
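As a small, self-contained illustration of this evaluation with NLTK (the toy captions below are made up purely to show the call):

from nltk.translate.bleu_score import corpus_bleu

# Toy data: one generated caption scored against two human references.
references = [[["a", "dog", "runs", "across", "the", "grass"],
               ["a", "brown", "dog", "is", "running", "on", "grass"]]]
hypotheses = [["a", "dog", "runs", "on", "the", "grass"]]

print("BLEU-1:", corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
print("BLEU-4:", corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))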
Caption Visualization
The top-5 candidate captions for every image were generated with beam search. This technique begins with a single start token and, at each step, keeps the top-k most probable continuations of the sequence until an end token is encountered.
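A condensed sketch of such a beam-search decoder is given below; the trained captioning model, the fitted Keras tokenizer and the "startseq"/"endseq" tokens are assumed to exist, and the function name is illustrative:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def beam_search_caption(model, tokenizer, photo_feature, max_len, k=5):
    end_id = tokenizer.word_index.get("endseq")
    beams = [(tokenizer.texts_to_sequences(["startseq"])[0], 0.0)]  # (token ids, log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_id:                 # finished caption: carry over unchanged
                candidates.append((seq, score))
                continue
            padded = pad_sequences([seq], maxlen=max_len)
            probs = model.predict([np.array([photo_feature]), padded], verbose=0)[0]
            for idx in np.argsort(probs)[-k:]:    # keep the k most probable next words
                candidates.append((seq + [int(idx)], score + float(np.log(probs[idx] + 1e-12))))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    words = [tokenizer.index_word[i] for i in beams[0][0] if i in tokenizer.index_word]
    return " ".join(w for w in words if w not in ("startseq", "endseq"))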
Interface Development
For the backend, Flask was used as the framework, while HTML/CSS/JavaScript were used on the frontend to design a basic web interface. This interface enables users to upload images and receive the captions predicted by the model.
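A bare-bones sketch of the kind of Flask endpoint described above: the user uploads an image, features are extracted, and the predicted caption is returned. The model, tokenizer, extract_features and beam_search_caption names refer to the assumed objects sketched earlier and are not the paper's actual code:

from flask import Flask, request, render_template_string

app = Flask(__name__)

PAGE = """
<form method="post" enctype="multipart/form-data">
  <input type="file" name="image">
  <input type="submit" value="Generate caption">
</form>
<p>{{ caption or "" }}</p>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    caption = None
    if request.method == "POST":
        request.files["image"].save("uploaded.jpg")
        # assumed helpers from the earlier sketches
        caption = beam_search_caption(model, tokenizer, extract_features("uploaded.jpg"), max_len)
    return render_template_string(PAGE, caption=caption)

if __name__ == "__main__":
    app.run(debug=True)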
III. RESULTS

This paper describes the procedure for image captioning with an RNN-LSTM model with an attention mechanism on the Flickr30k dataset. It covers data preprocessing, feature extraction, model training and testing, and visualization, while demonstrating the ability of the model to produce efficient and coherent captions with high BLEU scores. The combination of visual features and text adds further depth to the model's handling of visual content. The study explains the merits and demerits of the model and argues that the proposed approach improves upon existing methods in image captioning research. Such systems may find application in assistive aids, image retrieval and HCI systems. An RNN that incorporates LSTM units processes the image features and produces word sequences as captions, while the training process is supported by a data generator. The model achieves a BLEU score of 99%, indicating that it is accurate. Furthermore, the model is exposed through an interactive visualization interface that allows a person to open the web page, upload an image and immediately obtain a caption for it.
IV. CONCLUSION

This research paper reviews the image captioning process using an RNN-LSTM with an attention mechanism and the Flickr30k dataset. The approach includes data preprocessing, feature extraction, model building, training, testing, and visualization, creating a broad and consistent body of work on the image captioning task. On inspection and assessment of the resulting model, it is able to generate picture captions with a high level of efficiency, and the different model variants result in a high BLEU coefficient. The integration of image features with text captions, together with the use of deep learning methods, shows the effectiveness of the approach when it comes to surveying visual information. It is also worth noting that the test images were provided with generated captions as output, which demonstrates the extent to which the model was trained to comprehend and recreate a three-dimensional composition including its environmental features. Beyond the qualitative evaluation, this research study served the purpose of gaining insight into the strengths and weaknesses of the model. It can be concluded that this work has enriched the methods of image captioning and has demonstrated experimentally that the proposed method works. In view of the rich potential inherent in the topic, the recommendations of the study may be used in the development of assistive devices, image-based content retrieval systems, and human-computer interfaces. These models will require additional research and development to expand and enhance the manner in which image captioning supports multi-modal understanding and engagement.
FUTURE RECOMMENDATION

Numerous major directions for further exploration and progress have been identified to advance the domain of image captioning. First, moving to newer model architectures such as BERT and Vision Transformers (ViTs) has the potential to improve both the quality and the diversity of the produced captions. Attention mechanisms, which are present in most modern models, may be further enhanced by modifications such as self-attention or multi-head attention, allowing more precise targeting of important locations within the image. New techniques, such as augmenting the training set, can increase the robustness of the model, while ensembling can combine the predictions of different models for better results. However, regardless of the performance measured by BLEU, it makes sense to assess caption quality with evaluation metrics such as METEOR or CIDEr as well.

User studies and the opinions gathered from them allow measuring the effectiveness of the model in terms of quality, which is subjective, and help improve the model. Transfer learning and domain adaptation strategies could make it possible to apply the developed model to new datasets or domains even with limited data. Cross-modal fusion techniques may improve the combination of images and text. The present captioning technique is designed specifically for English; however, extending it to other languages such as Italian and French is suggested. The developed technique could also be advanced to generate other content elements, such as sticker pictures, and to integrate them with the text. Finally, the technique could be extended from generating a single caption about the objects in a picture to generating several different captions associated with those objects.