Implementing Complexity in Automatic Image Caption Generator Using Recurrent Neural Network Over Long Short-Term Memory
Abstract
Aim: To grasp the context of a picture and describe it in a natural language, such as English, using an image caption generator and natural language processing ideas. Materials and Methods: Performance was analysed for the highest accuracy in image caption generation using beam search (N=10) and long short-term memory (N=10), with 70% and 30% split sizes for the training and test datasets and G-power setting parameters α=0.05 and power=0.86. Results: RNN attained better accuracy (91%) than long short-term memory (76%), with a significance value of 0.670 (two-tailed, p>0.05). Conclusion: The recurrent neural network achieved better classification than long short-term memory for generating a description of the image.
Keywords: Deep Learning, Recurrent Neural Network, Long Short-Term Memory, Accuracy, Novel Image Caption, Encoder-Decoder.
DOI: 10.47750/pnr.2022.13.S03.014
INTRODUCTION
Automatic caption generation is a tough undertaking that can aid visually challenged persons in understanding the content of web images (Bai and An 2018). It may also have a significant impact on search engines and robots. This problem is substantially more difficult than image categorization or object recognition, both of which have been extensively researched (Mishra and Banerjee 2020). Since researchers have long been engaged in finding effective strategies to generate better predictions, we explored a few techniques to produce good results (Kameswari 2021). To create a good model, we used deep neural networks and machine learning techniques. We used the Flickr8k dataset, which contains approximately 8,000 example photographs with five captions each (Wang et al. 2016). Applications include editing apps, novel caption generation in virtual assistants, encoder-decoder systems, picture indexing, assistance for visually impaired people, social media, and a variety of other natural language processing tasks; all of these aid in the creation of an image caption (Dehaqi, Seydi, and Madadi 2021).
The LSTM and simple RNN have been used in different ways, and recent articles sparked our interest. Approximately 175 papers were located in IEEE Xplore, while 213 papers were identified in the ScienceDirect database (Han and Choi 2020; Agrawal et al. 2021). The Python libraries utilized throughout the development included Keras, which features a VGG net for image recognition, and TensorFlow (Brownlee 2018). We tested numerous encoder-decoder models on our system to determine how they affect caption development and to demonstrate various application cases (Vo, n.d.). A unique parallel-fusion RNN and LSTM architecture has been developed for the image caption generator (Verma et al. 2021); the proposed technique improves performance and efficiency. A survey of caption generation that splits photo-captioning approaches into groups based on the strategy used in each method was quite beneficial in learning how to implement novel image captions with the Flickr8k dataset of images (Tan and Chan 2019). Our team has extensive knowledge and research experience that has translated into high-quality publications (Bhansali et al. 2021; Jayanth et al. 2021; Sudhakar, Ravel, and Perumal 2021; Sathiyamoorthi et al. 2021; Deepanraj et al. 2021; Raju et al. 2021; Arun Prakash et al. 2020; Kamath et al. 2020; Shanmugam et al. 2021; Rajasekaran et al. 2020; Adhinarayanan et al. 2020; Rajesh et al. 2020; Aurtherson et al. 2021).
The topic of improving feature extraction and RNN classifier efficiency has been thoroughly covered. In novel image caption generation, the long short-term memory classifier used to train Flickr8k data has produced better results. The research gap in the existing system is its lower degree of accuracy. The aim of this research is to increase classification accuracy by applying an RNN and comparing its performance to that of an LSTM using encoder-decoder models (Aghav 2020). With the use of novel image captions and deep learning techniques, the proposed model improves the classifiers to better discriminate objects (Kinghorn, Zhang, and Shao 2018).
MATERIALS AND METHODS
The Flickr8k dataset, which contains approximately 8,000 example photographs with five captions each, was used as the dataset. The encoder-decoder model used a collection of roughly 680 images with descriptions for the novel captions generated. Recurrent neural networks were used to extract the captions, which were then preprocessed. The RNN algorithm, which accomplishes classification by forming groups for every single class in the data, is the first group in this study. The RNN classifier uses k groups as its input size and attempts to classify them at the value of significance. The proposed work was designed and implemented with the help of Google Colab software. The platform used to assess deep learning was the Windows 10 OS. The hardware configuration was an Intel Core i7 processor with 8 GB of RAM on a 64-bit system. The code was implemented in the Python programming language, and during execution the Flickr8k dataset was processed to produce accuracy as the output.
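As a concrete illustration of this pipeline, a merge-style encoder-decoder captioner of the kind described above can be sketched in Keras. This is a minimal sketch under stated assumptions, not the exact implementation used in this study: the 4096-dimensional VGG feature size, vocab_size, and max_len are placeholder assumptions.

# A minimal sketch (not the authors' exact code) of a merge-style
# encoder-decoder captioner, assuming image features have already been
# extracted with a pretrained VGG16 (4096-d fc2 vectors) and the
# captions tokenized; vocab_size and max_len are placeholder values.
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     SimpleRNN, add)
from tensorflow.keras.models import Model

vocab_size = 8000   # assumed vocabulary size after preprocessing
max_len = 34        # assumed maximum caption length in tokens

# Encoder branch: project the CNN feature vector down to 256 units.
img_in = Input(shape=(4096,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Decoder branch: partial caption -> embedding -> recurrent layer.
# SimpleRNN(256) gives the RNN model; replacing it with LSTM(256)
# yields the compared LSTM model.
txt_in = Input(shape=(max_len,))
txt_emb = Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt_vec = SimpleRNN(256)(Dropout(0.5)(txt_emb))

# Merge both branches and predict the next word of the caption.
merged = add([img_vec, txt_vec])
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")

At inference time, captions are decoded word by word, feeding each predicted token back into the text branch; beam search, as mentioned in the abstract, keeps the N best partial captions at each step instead of only the single most probable one.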
Statistical Analysis
SPSS software was used for the statistical analysis of the recurrent neural network and long short-term memory. The independent variables are images, the caption generator, vocabulary, preprocessed words, and description length. The dependent variables are accuracy and precision. An independent-samples t-test was carried out to compare the accuracy of both methods.
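For readers without SPSS, the same analysis can be reproduced in Python; the sketch below is an assumed equivalent, not the workflow actually used, applied to the per-run loss values listed in Table 2.

# A minimal Python sketch (statistics/SciPy, not the SPSS workflow
# used in this study) reproducing the group statistics of Table 3 and
# the independent-samples t-test of Table 4 on the Table 2 loss values.
import math
import statistics
from scipy import stats

rnn_loss = [9.00, 18.32, 25.44, 13.75, 21.36,
            14.22, 31.06, 9.44, 15.64, 23.75]
lstm_loss = [22.00, 32.79, 38.22, 26.44, 36.25,
             40.86, 42.44, 24.88, 39.47, 43.15]

for name, loss in [("RNN", rnn_loss), ("LSTM", lstm_loss)]:
    sd = statistics.stdev(loss)        # sample standard deviation
    sem = sd / math.sqrt(len(loss))    # standard error of the mean
    print(f"{name}: N={len(loss)}, mean={statistics.mean(loss):.3f}, "
          f"SD={sd:.5f}, SEM={sem:.5f}")

# equal_var=False mirrors the "equal variances not assumed" row of
# Table 4 (t = -4.939, df = 17.901, mean difference = -16.452).
t, p = stats.ttest_ind(rnn_loss, lstm_loss, equal_var=False)
print(f"t = {t:.3f}, two-tailed p = {p:.3f}")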
Results
With a sample size of 10, the proposed RNN algorithm and LSTM were run in Google Colab at different times. Table 1 shows the variable definitions used for the encoder-decoder models' anticipated novel image caption accuracy and recognition of novel image caption production. These ten data samples, along with their loss values, were utilized to compute statistical values that can be compared between the algorithms. According to the data, the mean accuracy of the RNN algorithm was 91%, while that of the LSTM method was 76%. The RNN and LSTM mean accuracy values are shown in Table 3. The RNN's mean value is higher than the LSTM's, with standard deviations of 7.16608 and 7.71992, respectively. Table 4 presents the RNN and LSTM independent-samples t-test data, with a significance value of 0.670 (two-tailed, p>0.05). Fig. 1 compares RNN and LSTM in terms of mean accuracy and loss.
The group statistics, namely the mean, standard deviation, and standard error of the mean, are also reported for the two techniques. The loss of the two algorithms, RNN and LSTM, is presented graphically for comparative analysis. This shows that the recurrent neural network is substantially better, with 91% accuracy, compared to the 76% accuracy of long short-term memory.
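As an illustration only (Fig. 1 itself was produced in SPSS), the same comparison can be drawn with matplotlib using the reported means and standard deviations:

# A small matplotlib sketch of the Fig. 1 comparison, using the mean
# accuracies reported above; the error bars (+/- 1 SD) use the
# standard deviations quoted in the Results section.
import matplotlib.pyplot as plt

algorithms = ["RNN", "LSTM"]
mean_acc = [91.0, 76.0]           # reported mean accuracy (%)
std_dev = [7.16608, 7.71992]      # reported standard deviations

plt.bar(algorithms, mean_acc, yerr=std_dev, capsize=8)
plt.xlabel("RNN vs LSTM Machine Algorithm")
plt.ylabel("Mean accuracy (%)")
plt.title("Mean accuracy of RNN and LSTM")
plt.show()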
Discussion
The significance value achieved in the present study is 0.670 because of the large number of datasets with fewer parameters (two-tailed, p>0.05), implying that RNN appears to be superior to LSTM. The RNN classifier has a 91% accuracy rate, while the LSTM classifier has a 76% accuracy rate. In this work, a previous comparison of RNN versus LSTM is shown (Alahmadi, Park, and Hahn 2019). When compared to the LSTM classifier, this clearly shows that RNN appears to be the stronger classifier. This research compares the accuracy of RNN and LSTM, shown in Table 2, finding that RNN has 91% accuracy and LSTM has 76% accuracy (Poghosyan and Sarukhanyan 2017). An RNN is a type of artificial neural network used in deep learning to create captions for new images using previously saved datasets.
The RNN forms the relationship between these two hidden layers (Ly, Traore, and Dia 2021). The output layer can receive data from both the past and the future at the same time (Huang 2020). Similarly, an LSTM can carry relevant data through the interpretation of inputs, and it can discard unrelated information using a forget gate (K. 2020). Recommendations in editing apps, novel caption generation in automated systems, encoder-decoder systems, picture indexing, assistance for visually impaired people, social media, and various other natural language processing applications were among these uses; all of these aid in the creation of an image caption (Tomar et al. 2022).
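As a toy illustration of the forget-gate behaviour just described, the gate can be written out in a few lines of NumPy; the weights below are random stand-ins rather than trained parameters, and the hidden size is an arbitrary choice.

# A toy NumPy sketch of the LSTM forget gate described above;
# W_f, U_f, and b_f are random stand-ins, not trained weights.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden = 4
rng = np.random.default_rng(0)
W_f = rng.normal(size=(hidden, hidden))   # input-to-gate weights
U_f = rng.normal(size=(hidden, hidden))   # hidden-to-gate weights
b_f = np.zeros(hidden)

def forget_step(x_t, h_prev, c_prev):
    """f_t in (0, 1) scales the old cell state: values near 0
    discard unrelated information, values near 1 retain it."""
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)
    return f_t * c_prev

# Example: one step with random input and previous states.
x = rng.normal(size=hidden)
h = rng.normal(size=hidden)
c = rng.normal(size=hidden)
print(forget_step(x, h, c))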
The study's drawbacks include the fact that training a convolutional neural network takes a long time, especially with Flickr8k datasets in deep learning (Yang et al. 2020). The dataset has several attributes that the classifier can utilize to improve prediction accuracy and work more effectively toward achieving the vision. In the future scope of image caption generators, accuracy and precision figures can be raised as a result of features like these. The system should also be enhanced to accommodate a larger number of photos while spending less time training the dataset.
Conclusion
This proposed work used both the RNN and LSTM algorithms to predict accuracy. The RNN-LSTM model was created with the goal of automatically generating captions for the input images, and it can be applied to a wide range of situations. We studied the RNN and LSTM models and verified that the model is capable of creating captions for the input images. It is observed that the RNN gives the best accuracy, at 91%, compared to the LSTM at 76%.
DECLARATIONS
Conflicts of Interests
The authors declare no conflict of interest in this manuscript.
Authors Contribution
Author ST was involved in data collection, data analysis, and manuscript writing. Author RG
was involved in conceptualization, data validation, and critical reviews of manuscripts.
Acknowledgment
The authors would like to express their gratitude towards Saveetha School of Engineering, Saveetha Institute of
Medical and Technical Sciences (formerly known as Saveetha University)
for providing the necessary infrastructure to carry out this work successfully.
Funding: We thank the following organizations for providing financial support that enabled us
to complete the study.
References
1. Adhinarayanan, Rajesh, Aravindh Ramakrishnan, Gopal Kaliyaperumal, Melvin Victor De Poures, Rajesh Kumar Babu, and
Damodharan Dillikannan. 2020. “Comparative Analysis on the Effect of 1-Decanol and Di-N-Butyl Ether as Additive with
diesel/LDPE Blends in Compression Ignition Engine.” Energy Sources, Part A: Recovery, Utilization, and Environmental Effects,
June, 1–18.
2. Aghav, Jagannath. 2020. “Image Captioning Using Deep Learning.” International Journal for Research in Applied Science and
Engineering Technology. https://doi.org/10.22214/ijraset.2020.6232.
3. Agrawal, Vaishnavi, Shariva Dhekane, Neha Tuniya, and Vibha Vyas. 2021. “Image Caption Generator Using Attention
Mechanism.” 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT).
https://doi.org/10.1109/icccnt51525.2021.9579967.
4. Alahmadi, Rehab, Chung Hyuk Park, and James Hahn. 2019. “Sequence-to-Sequence Image Caption Generator.” Eleventh
International Conference on Machine Vision (ICMV 2018). https://doi.org/10.1117/12.2523174.
5. Arun Prakash, V. R., J. Francis Xavier, G. Ramesh, T. Maridurai, K. Siva Kumar, and R. Blessing Sam Raj. 2020. “Mechanical,
Thermal and Fatigue Behaviour of Surface-Treated Novel Caryota Urens Fibre–reinforced Epoxy Composite.” Biomass Conversion
and Biorefinery, August. https://doi.org/10.1007/s13399-020-00938-0.
6. Aurtherson, P. Babu, Bhanu Teja Nalla, Karthikeyan Srinivasan, Kulmani Mehar, and Yuvarajan Devarajan. 2021. “Biofuel
Production from Novel Prunus Domestica Kernel Oil: Process Optimization Technique.” Biomass Conversion and Biorefinery,
May. https://doi.org/10.1007/s13399-021-01551-5.
7. Bai, Shuang, and Shan An. 2018. “A Survey on Automatic Image Caption Generation.” Neurocomputing.
https://doi.org/10.1016/j.neucom.2018.05.080.
8. Bhansali, Karan J., Kamlesh R. Balinge, Subodh U. Raut, Shubham A. Deshmukh, M. Senthil Kumar, C. Ramesh Kumar, and
Pundlik R. Bhagat. 2021. “Visible Light Assisted Sulfonic Acid-Functionalized Porphyrin Comprising Benzimidazolium Moiety for
Photocatalytic Transesterification of Castor Oil.” Fuel 304 (November): 121490.
9. Brownlee, Jason. 2018. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python.
Machine Learning Mastery.
10. Deepanraj, B., N. Senthilkumar, D. Mala, and A. Sathiamourthy. 2021. “Cashew Nut Shell Liquid as Alternate Fuel for CI
Engine—optimization Approach for Performance Improvement.” Biomass Conversion and Biorefinery, February.
https://doi.org/10.1007/s13399-021-01312-4.
11. Dehaqi, Ali Mollaahmadi, Vahid Seydi, and Yeganeh Madadi. 2021. “Adversarial Image Caption Generator Network.” SN
Computer Science. https://doi.org/10.1007/s42979-021-00486-y.
12. Han, Seung-Ho, and Ho-Jin Choi. 2020. “Domain-Specific Image Caption Generator with Semantic Ontology.” 2020 IEEE
International Conference on Big Data and Smart Computing (BigComp). https://doi.org/10.1109/bigcomp48618.2020.00-12.
13. Huang, Chien-Lin. 2020. “Speaker Characterization Using TDNN, TDNN-LSTM, TDNN-LSTM-Attention Based Speaker
Embeddings for NIST SRE 2019.” The Speaker and Language Recognition Workshop (Odyssey 2020).
https://doi.org/10.21437/odyssey.2020-60.
14. Jayanth, Bellappu Venkat, Melvin Victor De Poures, Gopal Kaliyaperumal, Damodharan Dillikannan, Dilipsingh Jawahar,
Kumaran Palani, and Ganesha Prasad Meravanigee Shivappa. 2021. “A Comprehensive Study on the Effects of Multiple Injection
Strategies and Exhaust Gas Recirculation on Diesel Engine Characteristics That Utilize Waste High Density Polyethylene Oil.”
Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, June, 1–18.
15. Kamath, Manjunath, Subha Krishna Rao, Jaison, Sridhar, Kasthuri, Gopinath, Sivaperumal, and Shantanu Patil. 2020. “Melatonin
Delivery from PCL Scaffold Enhances Glycosaminoglycans Deposition in Human Chondrocytes – Bioactive Scaffold Model for
Cartilage Regeneration.” Process Biochemistry 99 (December): 36–47.
16. Kameswari, A. V. N. 2021. “Image Caption Generator Using Deep Learning.” International Journal for Research in Applied
Science and Engineering Technology. https://doi.org/10.22214/ijraset.2021.38652.
17. Kinghorn, Philip, Li Zhang, and Ling Shao. 2018. “A Region-Based Image Caption Generator with Refined Descriptions.”
Neurocomputing. https://doi.org/10.1016/j.neucom.2017.07.014.
18. K., Sahityabhilash. 2020. “Impact of Loss Function Using M-LSTM Classifier for Sequence Data.” International Journal of
Psychosocial Rehabilitation. https://doi.org/10.37200/ijpr/v24i5/pr202059.
19. Ly, Racine, Fousseini Traore, and Khadim Dia. 2021. Forecasting Commodity Prices Using Long-Short-Term Memory Neural
Networks. Intl Food Policy Res Inst.
20. Mishra, Sanjukta, and Minakshi Banerjee. 2020. “Automatic Caption Generation of Retinal Diseases with Self-Trained RNN Merge
Model.” Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-981-15-2930-6_1.
21. Poghosyan, Aghasi, and Hakob Sarukhanyan. 2017. “Short-Term Memory with Read-Only Unit in Neural Image Caption
Generator.” 2017 Computer Science and Information Technologies (CSIT). https://doi.org/10.1109/csitechnol.2017.8312163.
22. Rajasekaran, S., D. Damodharan, K. Gopal, B. Rajesh Kumar, and Melvin Victor De Poures. 2020. “Collective Influence of 1-
Decanol Addition, Injection Pressure and EGR on Diesel Engine Characteristics Fueled with diesel/LDPE Oil Blends.” Fuel 277
(October): 118166.
23. Rajesh, A., K. Gopal, De Poures Melvin Victor, B. Rajesh Kumar, A. P. Sathiyagnanam, and D. Damodharan. 2020. “Effect of
Anisole Addition to Waste Cooking Oil Methyl Ester on Combustion, Emission and Performance Characteristics of a DI Diesel
Engine without Any Modifications.” Fuel 278 (October): 118315.
24. Raju, P., K. Raja, K. Lingadurai, T. Maridurai, and S. C. Prasanna. 2021. “Glass/Caryota Urens Hybridized Fibre-Reinforced
nanoclay/SiC Toughened Epoxy Hybrid Composite: Mechanical, Drop Load Impact, Hydrophobicity and Fatigue Behaviour.”
Biomass Conversion and Biorefinery, March. https://doi.org/10.1007/s13399-021-01427-8.
25. Sathiyamoorthi, Ramalingam, Gomathinayakam Sankaranarayanan, Dinesh Babu Munuswamy, and Yuvarajan Devarajan. 2021.
“Experimental Study of Spray Analysis for Palmarosa Biodiesel‐diesel Blends in a Constant Volume Chamber.” Environmental
Progress & Sustainable Energy 40 (6). https://doi.org/10.1002/ep.13696.
26. Shanmugam, Rajasekaran, Damodharan Dillikannan, Gopal Kaliyaperumal, Melvin Victor De Poures, and Rajesh Kumar Babu.
2021. “A Comprehensive Study on the Effects of 1-Decanol, Compression Ratio and Exhaust Gas Recirculation on Diesel Engine
Characteristics Powered with Low Density Polyethylene Oil.” Energy Sources, Part A: Recovery, Utilization, and Environmental
Effects 43 (23): 3064–81.
27. Sudhakar, M. P., Merlyn Ravel, and K. Perumal. 2021. “Pretreatment and Process Optimization of Bioethanol Production from
Spent Biomass of Ganoderma Lucidum Using Saccharomyces Cerevisiae.” Fuel 306 (December): 121680.
28. Tan, Ying Hua, and Chee Seng Chan. 2019. “Phrase-Based Image Caption Generator with Hierarchical LSTM Network.”
Neurocomputing. https://doi.org/10.1016/j.neucom.2018.12.026.
29. Tomar, Dimpal, Pradeep Tomar, Arpit Bhardwaj, and G. R. Sinha. 2022. “Deep Learning Neural Network Prediction System
Enhanced with Best Window Size in Sliding Window Algorithm for Predicting Domestic Power Consumption in a Residential
Building.” Computational Intelligence and Neuroscience 2022 (March): 7216959.
30. Verma, Akash, Harshit Saxena, Mugdha Jaiswal, and Poonam Tanwar. 2021. “Intelligence Embedded Image Caption Generator
Using LSTM Based RNN Model.” 2021 6th International Conference on Communication and Electronics Systems (ICCES).
https://doi.org/10.1109/icces51350.2021.9489253.
31. Vo, Tham. n.d. “FuzzSemNIC: A Deep Fuzzy Neural Network Semantic-Enhanced Approach of Neural Image Captioning.”
https://doi.org/10.21203/rs.3.rs-610265/v1.
32. Wang, Minsi, Li Song, Xiaokang Yang, and Chuanfei Luo. 2016. “A Parallel-Fusion RNN-LSTM Architecture for Image Caption
Generation.” 2016 IEEE International Conference on Image Processing (ICIP). https://doi.org/10.1109/icip.2016.7533201.
33. Yang, Min, Junhao Liu, Ying Shen, Zhou Zhao, Xiaojun Chen, Qingyao Wu, and Chengming Li. 2020. “An Ensemble of
Generation- and Retrieval-Based Image Captioning with Dual Generator Generative Adversarial Network.” IEEE Transactions on
Image Processing: A Publication of the IEEE Signal Processing Society PP (October). https://doi.org/10.1109/TIP.2020.3028651.
Table 1. Variable definitions (Group, Accuracy, and Loss) for the novel image caption generator, using 8 columns with width-8 data.
S.No  Name  Type  Width  Decimal  Columns  Measure  Role
Table 2. Accuracy and Loss Analysis of the recurrent neural network and long short-term memory.
GROUP  ACCURACY  LOSS
RNN    91.00      9.00
RNN    81.68     18.32
RNN    74.56     25.44
RNN    86.25     13.75
RNN    78.64     21.36
RNN    85.78     14.22
RNN    68.94     31.06
RNN    90.56      9.44
RNN    84.36     15.64
RNN    76.25     23.75
LSTM   78.00     22.00
LSTM   67.21     32.79
LSTM   61.78     38.22
LSTM   73.56     26.44
LSTM   63.75     36.25
LSTM   59.14     40.86
LSTM   57.56     42.44
LSTM   75.12     24.88
LSTM   60.53     39.47
LSTM   56.85     43.15
Table 3. Group Statistical Analysis of RNN and LSTM. Mean, Standard Deviation, and Standard Error Mean
are obtained for 10 samples. RNN has higher mean accuracy and lower mean loss when compared to LSTM.
Name  GROUP  N  Mean  Std. Deviation  Std. Error Mean
Table 4. Independent samples t-test: RNN is insignificantly better than LSTM with a p-value of 0.670 (two-tailed, p>0.05).
Name  Variances                    F  Sig.  t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower      Upper
LOSS  Equal variances not assumed  -  -     -4.939  17.901  .000             -16.45200        3.33091                -23.45276  -9.45124
Fig. 1. Simple bar chart of mean accuracy for the RNN and LSTM machine algorithms, comparing the mean accuracy of RNN (91%) and LSTM (76%). X-axis: RNN vs LSTM machine algorithm. Y-axis: mean accuracy. The error bars are shown at the 95% level for both algorithms, with standard deviation error bars of +/- 1 SD.