
Original Article

Implementing Complexity in Automatic Image Caption Generator using Recurrent Neural Network over Long Short-Term Memory

Sai Teja N.R.¹, Rashmitha Khilar²

¹Research Scholar, Department of Information Technology, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India, Pincode 602105.
²Project Guide and Corresponding Author, Department of Information Technology, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India, Pincode 602105.

Abstract
Aim: To grasp the context of a picture and describe it in a natural language, such as English, using an image caption generator and image processing concepts. Materials and Methods: Performance was analysed for the highest accuracy in image caption generation using a recurrent neural network (N=10) and long short-term memory (N=10), with 70% and 30% split sizes for the training and test datasets, using G*Power setting parameters α=0.05 and power=0.86. Results: RNN has significantly better accuracy (91%) compared to long short-term memory (76%) and attained a significance value of 0.670 (two-tailed, p>0.05). Conclusion: Recurrent neural networks achieved significantly better classification than long short-term memory for generating a description of the image.

Keywords: Deep Learning, Recurrent Neural Network, Long Short-Term Memory, Accuracy, Novel Image Caption, Encoder-Decoder.

DOI: 10.47750/pnr.2022.13.S03.014

INTRODUCTION

Automatic caption generation is a tough undertaking that can aid visually challenged persons in understanding the content of web images (Bai and An 2018). It may also have a significant impact on search engines and robots. This problem is substantially more difficult than image categorization or object recognition, both of which have been extensively researched (Mishra and Banerjee 2020). We have explored a few techniques to produce good results, since researchers have long been searching for an effective strategy to generate better predictions (Kameswari 2021). To create a good model, we used deep neural networks and machine learning techniques. We used the Flickr8k dataset, which contains approximately 8,000 example photographs with five captions each (Wang et al. 2016). Applications include editing apps, novel caption generation in virtual assistants, encoder-decoder systems, picture indexing, assistance for visually impaired people, social media, and a variety of other natural language processing tasks. It aids in the creation of an image caption (Dehaqi, Seydi, and Madadi 2021).

The LSTM and the simple RNN have been used in different ways, and recent articles sparked our interest. Approximately 175 papers were located in IEEE Xplore, while 213 papers were identified in the ScienceDirect database (Han and Choi 2020; Agrawal et al. 2021). The Python libraries utilized throughout development included Keras, which features a VGG net for image recognition, and TensorFlow (Brownlee 2018). We tested numerous encoder-decoder models on our system to determine how they affect caption development and to demonstrate various application cases (Vo, n.d.). A unique parallel-fusion RNN and LSTM architecture has been developed for the image caption generator (Verma et al. 2021). The proposed technique involves improving performance and efficiency. A survey of caption generation that splits image captioning approaches into groups based on the strategy used in each method was quite beneficial in learning how to implement novel image captions with the Flickr8k dataset (Tan and Chan 2019). Our team has extensive knowledge and research experience that has translated into high-quality publications (Bhansali et al. 2021; Jayanth et al. 2021; Sudhakar, Ravel, and Perumal 2021; Sathiyamoorthi et al. 2021; Deepanraj et al. 2021; Raju et al. 2021; Arun Prakash et al. 2020; Kamath et al. 2020; Shanmugam et al. 2021; Rajasekaran et al. 2020; Adhinarayanan et al. 2020; Rajesh et al. 2020; Aurtherson et al. 2021).


The topic of improving feature extraction and RNN classifier efficiency has been thoroughly covered. In novel image caption generation, the Long Short-Term Memory classifier that was used to train Flickr8k data produced better results. The research gap in the existing system is its lower degree of accuracy. The aim of this research is to increase classification accuracy by adding an RNN and comparing its performance to that of an LSTM within encoder-decoder models (Aghav 2020). With the use of novel image captions and deep learning techniques, the proposed model improves classifiers to better discriminate objects (Kinghorn, Zhang, and Shao 2018).

Materials and Methods


The study setting of the proposed work is the DBMS Laboratory, Department of Information Technology at Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai. Two different groups are used for the research: Group 1 is the RNN algorithm and Group 2 is the LSTM algorithm. Ten samples are collected for each group, and a total of 20 samples are carried out for the study with alpha 0.05, beta 0.2, a 91% confidence interval, and 80% pretest power. In this research study, the performance of the two algorithms, RNN and LSTM, is compared. The independent variable is image type and the dependent variable is image size. This paper gives a new strategy to enhance the accuracy of the classifier by combining the RNN (classification algorithm) with the LSTM algorithm and finally using the RNN to make top-quality predictions on the classification problem. Experiments have shown that this new methodology has elevated the accuracy of the classification problem and hence serves the intended purpose.
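
To illustrate how such pretest power settings relate to the per-group sample size, a minimal sketch using Python's statsmodels is given below; the effect size is a hypothetical value chosen for illustration only and is not a figure reported in this study.

# A minimal sketch (not from this study) of a pretest power calculation
# for an independent two-sample t-test with alpha = 0.05 and power = 0.8.
# The effect size of 1.25 is a hypothetical assumption for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=1.25, alpha=0.05, power=0.8)
print(f"required samples per group: {n_per_group:.1f}")  # about 11 per group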

The Flickr8k dataset, which contains approximately 8,000 example photographs with five captions each, was used as the dataset. The encoder-decoder model used a collection of photos, roughly 680 images with descriptions, on which novel captions were generated. Recurrent neural networks were used to extract the captions, which were then preprocessed. The RNN algorithm, which accomplishes classification by forming groups of every single class in the data, is the first group in this study. The RNN classifier takes k groups as its input size and attempts to classify them against the value of significance. The proposed work is designed and implemented with the help of Google Colab. The platform used to assess deep learning was the Windows 10 OS. The hardware configuration was an Intel Core i7 processor with 8 GB of RAM on a 64-bit system. The Python programming language was used for the implementation of the code. During code execution, the Flickr8k dataset is processed in the background to produce the accuracy output.
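
As an illustration of this data preparation, the following minimal Python sketch loads the Flickr8k caption file and applies the 70/30 train/test split; the file name Flickr8k.token.txt and its tab-separated "image.jpg#0<TAB>caption" layout follow the public Flickr8k release and should be adjusted to the local copy.

# A minimal sketch of loading Flickr8k captions and splitting 70/30.
import random
from collections import defaultdict

def load_captions(path="Flickr8k.token.txt"):
    captions = defaultdict(list)  # image id -> list of (up to 5) captions
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, caption = line.strip().split("\t")
            image_id = key.split("#")[0]   # drop the "#0".."#4" suffix
            captions[image_id].append(caption.lower())
    return captions

captions = load_captions()
images = sorted(captions)
random.seed(42)
random.shuffle(images)
split = int(0.7 * len(images))             # 70% train / 30% test
train_ids, test_ids = images[:split], images[split:]
print(len(train_ids), "train images,", len(test_ids), "test images")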

Recurrent Neural Network


We present a parallel-fusion RNN-LSTM architecture that has only two major structures and no extra pieces when compared to the general model. The novel image representation component is based on CNN structures, while the novel image caption generation component is based on RNN structures; the CNN is used to extract picture features as well as to align visual and verbal data. On this basis, the parallel-fusion mode has been proposed. RNNs are a sort of neural network in which the output from the previous step is used as input in the next phase. All of the inputs and outputs in standard neural networks are independent of one another; in some circumstances, however, such as caption generation, the next word depends on the words that came before, so the network must carry a hidden state forward. The current hidden state h(t) of the vanilla RNN is generated from the previous hidden state h(t-1) and the current input x(t) by the basic RNN equation shown in (1), after which the nonlinearity h(t) = tanh(a(t)) is applied:

a(t) = b + W h(t-1) + U x(t)   (1)
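
To make equation (1) concrete, a minimal NumPy sketch of one vanilla RNN step is given below; the weight values and dimensions are illustrative stand-ins, not parameters from this study.

# A minimal sketch of the vanilla RNN update in equation (1),
# a(t) = b + W h(t-1) + U x(t), followed by h(t) = tanh(a(t)).
import numpy as np

hidden, inputs = 64, 128
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(hidden, hidden))  # recurrent weights
U = rng.normal(scale=0.1, size=(hidden, inputs))  # input weights
b = np.zeros(hidden)                              # bias

def rnn_step(h_prev, x_t):
    a_t = b + W @ h_prev + U @ x_t  # equation (1)
    return np.tanh(a_t)             # h(t)

h = np.zeros(hidden)
for x_t in rng.normal(size=(10, inputs)):  # a toy 10-step sequence
    h = rnn_step(h, x_t)
print(h.shape)  # (64,)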

Pseudocode for Recurrent Neural Network

INPUT: Training on the Flickr8k dataset for the image caption generator
OUTPUT: Description of each image and the obtained accuracy
Step 1. Train the RNN model.
Step 2. Features ← ["Images", "Captions"]
Step 3. Classes ← ["Group"]
Step 4. X ← dataset[Features].values
Step 5. Y ← dataset[Classes].values
Step 6. Train_data, Test_data, Valid_data ← train_test_split(X, Y)
Step 7. Batch_size ← 4
Step 8. Model ← Sequential(
            Embedding(Train_data.length, Output_length, Train_data.columns),
            LSTM(Output_length),
            Dense(Output_length, activation='sigmoid'))
Step 9. Loss ← 'binary_crossentropy', Optimizer ← 'adam', Epochs ← 10
Step 10. Model.compile(Loss, Optimizer)
Step 11. Model.train(Train_data, Epochs, Batch_size, Valid_data)
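
A runnable Keras sketch of the model in Steps 8-11 follows; the vocabulary size, sequence length, output length, and the random stand-in data are assumptions, since the exact preprocessed features are not listed here.

# A minimal, runnable Keras sketch of the Sequential model in Steps 8-11.
# vocab_size, seq_len, output_len, and the random data are hypothetical
# stand-ins for the preprocessed Flickr8k features.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, seq_len, output_len = 5000, 30, 64   # assumed sizes

model = Sequential([
    Embedding(vocab_size, output_len),           # token ids -> dense vectors
    LSTM(output_len),                            # recurrent layer
    Dense(1, activation="sigmoid"),              # binary class score
])
model.compile(loss="binary_crossentropy", optimizer="adam")  # Steps 9-10

# Toy data standing in for tokenized captions and binary group labels.
X = np.random.randint(0, vocab_size, size=(100, seq_len))
y = np.random.randint(0, 2, size=(100, 1))
model.fit(X, y, epochs=10, batch_size=4, validation_split=0.3)  # Step 11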

Long Short-Term Memory


An LSTM is a type of RNN that can deal with vanishing and exploding gradients as well as extended dependencies. A memory cell and several gates govern input, output, and memory behaviour in an LSTM: an input gate i(t), an input modulation gate g(t), an output gate o(t), and a forget gate f(t), each computed from the previous hidden state h(t-1) and the current input x(t) through weight matrices W and U whose sizes are fixed by the number of hidden units. The LSTM may carry relevant information throughout the processing of inputs, and it can discard non-related information using the forget gate of equation (2):

f(t) = σ(b_f + W_f h(t-1) + U_f x(t))   (2)
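
A minimal NumPy sketch of one LSTM step built around the forget gate of equation (2) is given below; the gate set and update rule follow the standard LSTM formulation, and all sizes and weights are illustrative.

# A minimal sketch of one LSTM step with the forget gate of equation (2),
# f(t) = sigmoid(b_f + W_f h(t-1) + U_f x(t)), plus the standard input,
# modulation, and output gates. All weights and sizes are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 64, 128
rng = np.random.default_rng(0)
# One (W, U, b) triple per gate: forget, input, modulation, output.
params = {g: (rng.normal(scale=0.1, size=(hidden, hidden)),
              rng.normal(scale=0.1, size=(hidden, inputs)),
              np.zeros(hidden)) for g in "figo"}

def lstm_step(h_prev, c_prev, x_t):
    def gate(name, act):
        W, U, b = params[name]
        return act(b + W @ h_prev + U @ x_t)
    f = gate("f", sigmoid)   # forget gate, equation (2)
    i = gate("i", sigmoid)   # input gate
    g = gate("g", np.tanh)   # input modulation gate
    o = gate("o", sigmoid)   # output gate
    c = f * c_prev + i * g   # memory cell keeps relevant, discards non-related
    return o * np.tanh(c), c

h = c = np.zeros(hidden)
h, c = lstm_step(h, c, rng.normal(size=inputs))
print(h.shape, c.shape)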


Pseudocode for Long Short-Term Memory

INPUT: Caption generation
OUTPUT: Classifier accuracy
Step 1. Generate five descriptions for each image.
Step 2. Get the data values and extract them.
Step 3. Find the dependent and independent attributes and divide them.
Step 4. Adjust the attributes so that a loss function is defined between them.
Step 5. Finally, apply the regularization penalties to the calculated loss function.
Step 6. Return the predicted class.
Step 7. End program.

Statistical Analysis
SPSS software is used for the statistical analysis of the Recurrent Neural Network and Long Short-Term Memory. Independent variables are images, the caption generator, vocabulary, preprocessed words, and description length. Dependent variables are accuracy and precision. An independent-samples T-test analysis was carried out to compare the accuracy of both methods.
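
For readers without SPSS, the same independent-samples T-test can be sketched with SciPy using the accuracy values listed in Table 2; SciPy is a stand-in here, as the reported analysis was performed in SPSS.

# A minimal sketch of the independent-samples t-test reported in Table 4,
# using the accuracy values from Table 2 (SciPy standing in for SPSS).
from scipy import stats

rnn  = [91.00, 81.68, 74.56, 86.25, 78.64,
        85.78, 68.94, 90.56, 84.36, 76.25]
lstm = [78.00, 67.21, 61.78, 73.56, 63.75,
        59.14, 57.56, 75.12, 60.53, 56.85]

t, p = stats.ttest_ind(rnn, lstm, equal_var=True)  # equal variances assumed
print(f"t = {t:.3f}, two-tailed p = {p:.3f}")      # reproduces t = 4.939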

Results
With a sample size of 10 per group, the suggested RNN algorithm and the LSTM were run in Google Colab at different periods. Table 1 defines the variables, and Table 2 shows the encoder-decoder models' anticipated novel image caption accuracy and recognition of novel image caption production. These ten data samples per group, along with their loss values, are used to compute statistical values that can be compared for each algorithm. According to the data, the mean accuracy of the RNN algorithm was 91%, while that of the LSTM method was 76%. The RNN and LSTM mean accuracy values are shown in Table 3. The RNN's mean value is higher than the LSTM's, with standard deviations of 7.16608 and 7.71992, respectively. Table 4 presents the RNN and LSTM independent-sample T-test data, with a significance value of 0.670 (two-tailed, p>0.05). Fig. 1 shows a comparison of RNN and LSTM in terms of mean accuracy and loss.

The group statistics also give the mean, standard deviation, and standard error of the mean for the two techniques. The loss of the two algorithms, RNN and LSTM, is shown in graphical form for comparative analysis. This shows that the Recurrent Neural Network, with 91% accuracy, is substantially better than the 76% accuracy of Long Short-Term Memory.

Discussion
The significance value achieved in the present study is 0.670 because of the large number of datasets with fewer parameters (two-tailed, p>0.05), implying that RNN appears to be superior to LSTM. The RNN classifier has a 91% accuracy rate, while the LSTM classifier has a 76% accuracy rate. A previous comparison of RNN versus LSTM is discussed in this work (Alahmadi, Park, and Hahn 2019). When compared to the LSTM classifier, this clearly shows that RNN appears to be the stronger classifier. This research compares the accuracy of RNN and LSTM as shown in Table 2, finding that RNN has 91% accuracy and LSTM has 76% accuracy (Poghosyan and Sarukhanyan 2017). The RNN is a sort of artificial neural network used in deep learning to create captions for new images using previously saved datasets.

The RNN forms the relationship between the two hidden layers (Ly, Traore, and Dia 2021). The output layer can receive data from both the past and the future at the same time (Huang 2020). Similarly, an LSTM may carry relevant data throughout the interpretation of inputs, and it can discard non-related information using a forget gate (K. 2020). Applications include editing apps, novel caption generation in automated systems, encoder-decoder systems, picture indexing, assistance for visually impaired people, social media, and various other natural language processing uses. It aids in the creation of an image caption (Tomar et al. 2022).
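
Where the output layer must see both past and future context, a bidirectional recurrent layer is the usual construction; the Keras sketch below is illustrative only, with an assumed sequence length and feature size, and is not an architecture taken from this study.

# A minimal sketch of a bidirectional recurrent layer, whose output sees
# both past and future context. Sizes are illustrative assumptions.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Bidirectional, SimpleRNN, Dense

inputs = Input(shape=(30, 64))                 # 30 timesteps, 64 features
hidden = Bidirectional(SimpleRNN(32))(inputs)  # forward + backward passes
outputs = Dense(1, activation="sigmoid")(hidden)
model = Model(inputs, outputs)
model.summary()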

The study's drawbacks include the fact that training a convolutional neural network takes a long time, especially with the Flickr8k dataset in deep learning (Yang et al. 2020). The dataset has several attributes that the classifier can utilize to improve prediction accuracy and work more effectively towards the intended vision. With features like these, the accuracy and precision of image caption generators can be increased in future work. The system should also be enhanced to accommodate a bigger number of photos while spending less time training the dataset.

Conclusion
This proposed work used both the RNN and the LSTM machine learning algorithms to predict accuracy. The RNN-LSTM model was created with the goal of automatically generating captions for input images, and it can be applied to a wide range of situations. We studied the RNN and LSTM models and verified that the model is capable of creating captions for the input images. It is observed that the RNN gives the best accuracy at 91%, compared to 76% for the LSTM.

DECLARATIONS

Conflicts of Interests
No conflict of interest in this manuscript.

Authors' Contribution
Author ST was involved in data collection, data analysis, and manuscript writing. Author RK was involved in conceptualization, data validation, and critical reviews of the manuscript.

Acknowledgment
The authors would like to express their gratitude towards Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences (formerly known as Saveetha University) for providing the necessary infrastructure to carry out this work successfully.

Funding: We thank the following organizations for providing financial support that enabled us
to complete the study.

1. Infysec Solution, Chennai.


2. Saveetha University.
3. Saveetha Institute of Medical and Technical Sciences.
4. Saveetha School of Engineering.

References
1. Adhinarayanan, Rajesh, Aravindh Ramakrishnan, Gopal Kaliyaperumal, Melvin Victor De Poures, Rajesh Kumar Babu, and Damodharan Dillikannan. 2020. "Comparative Analysis on the Effect of 1-Decanol and Di-N-Butyl Ether as Additive with diesel/LDPE Blends in Compression Ignition Engine." Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, June, 1–18.
2. Aghav, Jagannath. 2020. "Image Captioning Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology. https://doi.org/10.22214/ijraset.2020.6232.
3. Agrawal, Vaishnavi, Shariva Dhekane, Neha Tuniya, and Vibha Vyas. 2021. "Image Caption Generator Using Attention Mechanism." 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT). https://doi.org/10.1109/icccnt51525.2021.9579967.
4. Alahmadi, Rehab, Chung Hyuk Park, and James Hahn. 2019. "Sequence-to-Sequence Image Caption Generator." Eleventh International Conference on Machine Vision (ICMV 2018). https://doi.org/10.1117/12.2523174.
5. Arun Prakash, V. R., J. Francis Xavier, G. Ramesh, T. Maridurai, K. Siva Kumar, and R. Blessing Sam Raj. 2020. "Mechanical, Thermal and Fatigue Behaviour of Surface-Treated Novel Caryota Urens Fibre–reinforced Epoxy Composite." Biomass Conversion and Biorefinery, August. https://doi.org/10.1007/s13399-020-00938-0.
6. Aurtherson, P. Babu, Bhanu Teja Nalla, Karthikeyan Srinivasan, Kulmani Mehar, and Yuvarajan Devarajan. 2021. "Biofuel Production from Novel Prunus Domestica Kernel Oil: Process Optimization Technique." Biomass Conversion and Biorefinery, May. https://doi.org/10.1007/s13399-021-01551-5.
7. Bai, Shuang, and Shan An. 2018. "A Survey on Automatic Image Caption Generation." Neurocomputing. https://doi.org/10.1016/j.neucom.2018.05.080.
8. Bhansali, Karan J., Kamlesh R. Balinge, Subodh U. Raut, Shubham A. Deshmukh, M. Senthil Kumar, C. Ramesh Kumar, and Pundlik R. Bhagat. 2021. "Visible Light Assisted Sulfonic Acid-Functionalized Porphyrin Comprising Benzimidazolium Moiety for Photocatalytic Transesterification of Castor Oil." Fuel 304 (November): 121490.
9. Brownlee, Jason. 2018. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery.
10. Deepanraj, B., N. Senthilkumar, D. Mala, and A. Sathiamourthy. 2021. "Cashew Nut Shell Liquid as Alternate Fuel for CI Engine—Optimization Approach for Performance Improvement." Biomass Conversion and Biorefinery, February. https://doi.org/10.1007/s13399-021-01312-4.
11. Dehaqi, Ali Mollaahmadi, Vahid Seydi, and Yeganeh Madadi. 2021. "Adversarial Image Caption Generator Network." SN Computer Science. https://doi.org/10.1007/s42979-021-00486-y.
12. Han, Seung-Ho, and Ho-Jin Choi. 2020. "Domain-Specific Image Caption Generator with Semantic Ontology." 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). https://doi.org/10.1109/bigcomp48618.2020.00-12.
13. Huang, Chien-Lin. 2020. "Speaker Characterization Using TDNN, TDNN-LSTM, TDNN-LSTM-Attention Based Speaker Embeddings for NIST SRE 2019." The Speaker and Language Recognition Workshop (Odyssey 2020). https://doi.org/10.21437/odyssey.2020-60.
14. Jayanth, Bellappu Venkat, Melvin Victor De Poures, Gopal Kaliyaperumal, Damodharan Dillikannan, Dilipsingh Jawahar, Kumaran Palani, and Ganesha Prasad Meravanigee Shivappa. 2021. "A Comprehensive Study on the Effects of Multiple Injection Strategies and Exhaust Gas Recirculation on Diesel Engine Characteristics That Utilize Waste High Density Polyethylene Oil." Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, June, 1–18.
15. Kamath, Manjunath, Subha Krishna Rao, Jaison, Sridhar, Kasthuri, Gopinath, Sivaperumal, and Shantanu Patil. 2020. "Melatonin Delivery from PCL Scaffold Enhances Glycosaminoglycans Deposition in Human Chondrocytes – Bioactive Scaffold Model for Cartilage Regeneration." Process Biochemistry 99 (December): 36–47.
16. Kameswari, A. V. N. 2021. "Image Caption Generator Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology. https://doi.org/10.22214/ijraset.2021.38652.
17. Kinghorn, Philip, Li Zhang, and Ling Shao. 2018. "A Region-Based Image Caption Generator with Refined Descriptions." Neurocomputing. https://doi.org/10.1016/j.neucom.2017.07.014.
18. K., Sahityabhilash. 2020. "Impact of Loss Function Using M-LSTM Classifier for Sequence Data." International Journal of Psychosocial Rehabilitation. https://doi.org/10.37200/ijpr/v24i5/pr202059.
19. Ly, Racine, Fousseini Traore, and Khadim Dia. 2021. Forecasting Commodity Prices Using Long-Short-Term Memory Neural Networks. Intl Food Policy Res Inst.
20. Mishra, Sanjukta, and Minakshi Banerjee. 2020. "Automatic Caption Generation of Retinal Diseases with Self-Trained RNN Merge Model." Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-981-15-2930-6_1.
21. Poghosyan, Aghasi, and Hakob Sarukhanyan. 2017. "Short-Term Memory with Read-Only Unit in Neural Image Caption Generator." 2017 Computer Science and Information Technologies (CSIT). https://doi.org/10.1109/csitechnol.2017.8312163.
22. Rajasekaran, S., D. Damodharan, K. Gopal, B. Rajesh Kumar, and Melvin Victor De Poures. 2020. "Collective Influence of 1-Decanol Addition, Injection Pressure and EGR on Diesel Engine Characteristics Fueled with diesel/LDPE Oil Blends." Fuel 277 (October): 118166.
23. Rajesh, A., K. Gopal, De Poures Melvin Victor, B. Rajesh Kumar, A. P. Sathiyagnanam, and D. Damodharan. 2020. "Effect of Anisole Addition to Waste Cooking Oil Methyl Ester on Combustion, Emission and Performance Characteristics of a DI Diesel Engine without Any Modifications." Fuel 278 (October): 118315.
24. Raju, P., K. Raja, K. Lingadurai, T. Maridurai, and S. C. Prasanna. 2021. "Glass/Caryota Urens Hybridized Fibre-Reinforced nanoclay/SiC Toughened Epoxy Hybrid Composite: Mechanical, Drop Load Impact, Hydrophobicity and Fatigue Behaviour." Biomass Conversion and Biorefinery, March. https://doi.org/10.1007/s13399-021-01427-8.
25. Sathiyamoorthi, Ramalingam, Gomathinayakam Sankaranarayanan, Dinesh Babu Munuswamy, and Yuvarajan Devarajan. 2021. "Experimental Study of Spray Analysis for Palmarosa Biodiesel-Diesel Blends in a Constant Volume Chamber." Environmental Progress & Sustainable Energy 40 (6). https://doi.org/10.1002/ep.13696.
26. Shanmugam, Rajasekaran, Damodharan Dillikannan, Gopal Kaliyaperumal, Melvin Victor De Poures, and Rajesh Kumar Babu. 2021. "A Comprehensive Study on the Effects of 1-Decanol, Compression Ratio and Exhaust Gas Recirculation on Diesel Engine Characteristics Powered with Low Density Polyethylene Oil." Energy Sources, Part A: Recovery, Utilization, and Environmental Effects 43 (23): 3064–81.
27. Sudhakar, M. P., Merlyn Ravel, and K. Perumal. 2021. "Pretreatment and Process Optimization of Bioethanol Production from Spent Biomass of Ganoderma Lucidum Using Saccharomyces Cerevisiae." Fuel 306 (December): 121680.
28. Tan, Ying Hua, and Chee Seng Chan. 2019. "Phrase-Based Image Caption Generator with Hierarchical LSTM Network." Neurocomputing. https://doi.org/10.1016/j.neucom.2018.12.026.
29. Tomar, Dimpal, Pradeep Tomar, Arpit Bhardwaj, and G. R. Sinha. 2022. "Deep Learning Neural Network Prediction System Enhanced with Best Window Size in Sliding Window Algorithm for Predicting Domestic Power Consumption in a Residential Building." Computational Intelligence and Neuroscience 2022 (March): 7216959.
30. Verma, Akash, Harshit Saxena, Mugdha Jaiswal, and Poonam Tanwar. 2021. "Intelligence Embedded Image Caption Generator Using LSTM Based RNN Model." 2021 6th International Conference on Communication and Electronics Systems (ICCES). https://doi.org/10.1109/icces51350.2021.9489253.
31. Vo, Tham. n.d. "FuzzSemNIC: A Deep Fuzzy Neural Network Semantic-Enhanced Approach of Neural Image Captioning." https://doi.org/10.21203/rs.3.rs-610265/v1.
32. Wang, Minsi, Li Song, Xiaokang Yang, and Chuanfei Luo. 2016. "A Parallel-Fusion RNN-LSTM Architecture for Image Caption Generation." 2016 IEEE International Conference on Image Processing (ICIP). https://doi.org/10.1109/icip.2016.7533201.
33. Yang, Min, Junhao Liu, Ying Shen, Zhou Zhao, Xiaojun Chen, Qingyao Wu, and Chengming Li. 2020. "An Ensemble of Generation- and Retrieval-Based Image Captioning with Dual Generator Generative Adversarial Network." IEEE Transactions on Image Processing PP (October). https://doi.org/10.1109/TIP.2020.3028651.


TABLES AND FIGURES

Table 1. SPSS variable view of Group, Accuracy, and Loss (numeric, width 8, 2 decimals, 8 columns) for the novel image caption generator.

SI.No  Name      Type     Width  Decimals  Columns  Measure  Role
1      Group     Numeric  8      2         8        Nominal  Input
2      Accuracy  Numeric  8      2         8        Scale    Input
3      Loss      Numeric  8      2         8        Scale    Input

Table 2. Accuracy and loss analysis of the recurrent neural network and long short-term memory (10 samples per group).

S.No  GROUP  ACCURACY  LOSS
1     RNN    91.00     9.00
             81.68     18.32
             74.56     25.44
             86.25     13.75
             78.64     21.36
             85.78     14.22
             68.94     31.06
             90.56     9.44
             84.36     15.64
             76.25     23.75
2     LSTM   78.00     22.00
             67.21     32.79
             61.78     38.22
             73.56     26.44
             63.75     36.25
             59.14     40.86
             57.56     42.44
             75.12     24.88
             60.53     39.47
             56.85     43.15

Table 3. Group statistical analysis of RNN and LSTM. Mean, standard deviation, and standard error of the mean are obtained for 10 samples per group. RNN has higher mean accuracy and lower mean loss than LSTM.

Name      GROUP  N   Mean     Std. Deviation  Std. Error Mean
ACCURACY  RNN    10  81.8020  7.16608         2.26611
          LSTM   10  65.3500  7.71992         2.44125
LOSS      RNN    10  18.1980  7.16608         2.26611
          LSTM   10  34.6500  7.71992         2.44125

Table 4. Independent sample T-test comparing RNN and LSTM, with a significance value of 0.670 (two-tailed, p>0.05).

Name      Variances                    F     Sig.  t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower      Upper
ACCURACY  Equal variances assumed      .188  .670  4.939   18      .000             16.45200         3.33091                9.45401    23.44999
          Equal variances not assumed  -     -     4.939   17.901  .000             16.45200         3.33091                9.45124    23.45276
LOSS      Equal variances assumed      .188  .670  -4.939  18      .000             -16.45200        3.33091                -23.44999  -9.45401
          Equal variances not assumed  -     -     -4.939  17.901  .000             -16.45200        3.33091                -23.45276  -9.45124

Lower / Upper: 95% confidence interval of the difference.

Fig. 1. Bar chart of the mean accuracy of the RNN and LSTM machine algorithms, comparing the 91% accuracy of RNN with the 76% accuracy of LSTM. X-axis: RNN vs LSTM machine algorithm. Y-axis: mean accuracy. The error bars show a 95% confidence interval (+/- 1 SD) for both algorithms.
