Smartphone-Based Image Captioning For Visually and Hearing Impaired
Electrical and Electronics Engineering Graduate Program, Izmir Katip Celebi University, Izmir, Turkey
[email protected], [email protected]
Abstract

Visually and hearing impaired people face difficulties due to inaccessible infrastructure and social challenges in daily life. To improve the quality of life of these people, we report a portable and user-friendly smartphone-based platform capable of generating captions and text descriptions, including a narrator option, from images obtained with a smartphone camera. Image captioning is the task of generating a sentence that describes the visual content of an image in natural language, and it has attracted an increasing amount of attention in the fields of computer vision and natural language processing due to its potential applications. Generating image captions with proper linguistic properties is a challenging task, as it requires combining advanced image understanding algorithms with natural language processing methods. In this study, we propose to use a Long Short-Term Memory (LSTM) model to generate a caption after images are processed with the VGG16 deep learning architecture. The visual attributes of images, which convey rich content, are extracted with the VGG16 and then fed into the LSTM model for caption generation. This system is integrated with our custom-designed Android application, named "Eye of Horus", which transfers images from the smartphone to a remote server via a cloud system and displays the captions after the images are processed with the proposed captioning approach. The results show that the integrated platform has great potential to be used for image captioning by visually and hearing impaired people, with advantages such as portability, simple operation and rapid response.

1. Introduction

…attributes extracted from images affect the overall captioning performance, which leads researchers to run sophisticated and complex methodologies on images. Deep learning is currently a popular method used in state-of-the-art studies, and various deep learning architectures have been reported in the literature, such as ZFNet [7], AlexNet [8], GoogLeNet [9] and VGGNet [10]. In this study, VGG16, a member of the VGGNet family, is employed due to the success of VGGNet over other architectures. On the other hand, researchers on the NLP side focus on better description of visual attributes in natural language and have proposed models such as "Nearest Neighbor" (NN) [11], "Recurrent neural network (RNN)" [12], "Random" [13], "1NNfc7" [14], "Human" [15], "Stanford" [16] and Long Short-Term Memory (LSTM) [17].

In this study, we propose to use the VGG16 deep learning architecture followed by the LSTM model to generate a caption. We show in our experiments that combining the VGG16 architecture and the LSTM model in this way improves the captioning performance significantly. Moreover, we develop a custom-designed Android application, named "Eye of Horus", capable of generating a caption for an image taken with the smartphone camera. Eye of Horus transmits images via a cloud system to a remote server, which runs our proposed image captioning approach. After a caption is generated on the remote server, it is sent back to the cloud, and Eye of Horus receives the caption and displays it on the screen. With the narrator option, the user can hear the caption.

The rest of this paper is organized as follows: the next section introduces the proposed approach for caption generation. Section 3 presents the dataset, Eye of Horus and a discussion of the results. Closing remarks are given in Section 4.
2. Proposed Approach

Figure 1: The architecture of VGG16 [10]

…tecture. The VGGNet is simply a convolutional neural network model, and VGG16 was found adequate for image classification in experimental studies. VGG16 comprises 16 weight layers: thirteen convolutional layers, two fully connected layers and one output layer with softmax activation. The convolutional layers are organized into five groups, each ending with a max-pooling layer. An illustration of the VGG16 architecture is given in Figure 1.

The outstanding performance of VGG16 over the previous generation of models such as AlexNet and GoogLeNet leads us to employ it in our captioning approach to extract the visual attributes.
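The attribute-extraction step can be sketched as follows. This is a minimal illustration rather than the authors' exact pipeline: it assumes the Keras implementation of VGG16 and takes the output of the second fully connected layer ("fc2") as a 4096-dimensional attribute vector; the paper does not state which layer or framework was used.

```python
# Minimal sketch: extracting visual attributes with a pre-trained VGG16 (Keras).
# Assumptions: TensorFlow/Keras and the 'fc2' layer as the feature output;
# the paper does not specify the exact layer or framework.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet")                         # 16 weight layers, softmax output
extractor = Model(inputs=base.input,
                  outputs=base.get_layer("fc2").output)  # drop the classifier head

def extract_attributes(img_path):
    """Return a 4096-dimensional visual attribute vector for one image."""
    img = image.load_img(img_path, target_size=(224, 224))  # VGG16 input size
    x = image.img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))
    return extractor.predict(x)[0]

features = extract_attributes("sample.jpg")              # hypothetical image path
print(features.shape)                                     # (4096,)
```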
LSTM networks are recurrent neural networks with a specific gating mechanism that controls access to memory cells [17]. Since the gates can prevent the rest of the network from changing the contents of the memory cells over many time steps, the LSTM preserves signals and propagates errors for a much longer period than ordinary recurrent neural networks. The gates can also learn to attend to certain sections of the input signals and to ignore other sections by reading, writing and deleting content from the memory cells. These features allow LSTM networks to process data with complex, long-range dependencies, enabling, for example, speech recognition [18], offline handwriting recognition [19], machine translation [20] and image captioning [13, 21]. Thus, we follow the LSTM architecture to generate the captions from the visual attributes with rich semantic content.
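To make the coupling of visual attributes and the LSTM concrete, the sketch below shows one common "merge" arrangement in Keras, in which the VGG16 attribute vector and the partial caption generated so far are combined to predict the next word. The vocabulary size, maximum caption length and layer widths are placeholder assumptions; the paper does not report its exact decoder configuration.

```python
# Minimal sketch of a CNN-feature + LSTM caption decoder (Keras functional API).
# VOCAB_SIZE, MAX_LEN and the layer widths are placeholders, not the paper's values.
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000      # assumed vocabulary size
MAX_LEN = 34           # assumed maximum caption length (in tokens)

# Branch 1: the 4096-d VGG16 visual attributes, projected to the LSTM width.
img_input = Input(shape=(4096,))
img_embed = Dense(256, activation="relu")(Dropout(0.5)(img_input))

# Branch 2: the partial caption generated so far, embedded and encoded by an LSTM.
txt_input = Input(shape=(MAX_LEN,))
txt_embed = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_input)
txt_state = LSTM(256)(Dropout(0.5)(txt_embed))

# Merge both branches and predict the next word of the caption.
merged = add([img_embed, txt_state])
output = Dense(VOCAB_SIZE, activation="softmax")(Dense(256, activation="relu")(merged))

caption_model = Model(inputs=[img_input, txt_input], outputs=output)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
caption_model.summary()
```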
3. Experimental Results

In the previous section, the VGG16 and LSTM models were introduced. In this section, the dataset, the Android application and the results are discussed.

3.1. Dataset

Flickr [22] and MSCOCO [23] are the datasets most commonly used in image captioning. In this study, the MSCOCO dataset is chosen, as it contains approximately one hundred sixty thousand images, compared with thirty thousand in Flickr. Additionally, MSCOCO contains five reference captions per image. The number of images in the dataset and the number of reference captions per image are important, as they are used to train the overall system.
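As an illustration of how the five reference captions per image are organized, the following sketch pairs each MSCOCO image with its captions using the standard annotation file. The file path is a placeholder for a local copy of the dataset, not a path used in this work.

```python
# Minimal sketch: pairing each MSCOCO image with its five reference captions.
# The annotation file name follows the standard MSCOCO release; adjust the path
# to your local copy (placeholder only).
import json
from collections import defaultdict

with open("annotations/captions_train2014.json") as f:
    coco = json.load(f)

# Map image_id -> file name, then collect the reference captions per image.
id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
captions = defaultdict(list)
for ann in coco["annotations"]:
    captions[id_to_file[ann["image_id"]]].append(ann["caption"])

# Each training example is an (image file, caption) pair; five per image.
pairs = [(name, cap) for name, caps in captions.items() for cap in caps]
print(len(id_to_file), "images,", len(pairs), "image-caption pairs")
```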
A sample image from the MSCOCO dataset is given in Figure 2. The reference captions entered for this image are as follows:

Figure 2: Sample image from MSCOCO

• A lady sitting at an enormous dining table with lots of food.
• A woman with eye glasses sitting at a table covered with food.
• Several plates of food on a dining table.
• A guest looks over the plates of fruit on the table.
• A woman standing near a table with plates covered in food.

3.2. Eye of Horus

Here, we demonstrate a portable smartphone-based platform for image captioning controlled by software, named Eye of Horus, developed in Android Studio. A simple and user-friendly interface is designed to provide simple operation for the visually and hearing impaired. Screenshots of the Eye of Horus app, given in Figure 3, present the flow of the running procedure.

Figure 3: Steps of image captioning with Eye of Horus
When the user runs Eye of Horus, a "tap me to select" page is displayed after the opening page. On this page, the user can choose an image from the gallery or capture a new image using the camera. After the image is selected, the user taps the "upload" button to send the image to the remote server via the Firebase cloud system. On the remote server, a script coded in Python downloads the image from Firebase and generates a caption. The generated caption is sent back to Eye of Horus via Firebase. Eye of Horus then displays the caption under the image together with a narrator button; if the user taps this button, the caption is read out loud.
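The server-side part of this flow might be implemented along the following lines. This is a hedged sketch only: it assumes the firebase_admin Python SDK, a Cloud Storage bucket for the uploaded images and a Realtime Database node for caption requests. The bucket name, database URL, node names and the generate_caption helper are hypothetical, as the paper does not document the exact Firebase layout or script.

```python
# Sketch of a server-side captioning loop, assuming the firebase_admin SDK.
# Bucket name, database URL and node names are placeholders.
import time
import firebase_admin
from firebase_admin import credentials, db, storage

cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {
    "storageBucket": "eye-of-horus.appspot.com",            # placeholder bucket
    "databaseURL": "https://eye-of-horus.firebaseio.com",   # placeholder URL
})

def caption_pending_images():
    """Download newly uploaded images, caption them, and write the result back."""
    requests = db.reference("requests").get() or {}
    bucket = storage.bucket()
    for key, req in requests.items():
        if req.get("caption"):
            continue                                     # already processed
        local_path = f"/tmp/{key}.jpg"
        bucket.blob(req["image"]).download_to_filename(local_path)
        features = extract_attributes(local_path)        # VGG16 sketch above
        caption = generate_caption(features)             # hypothetical LSTM decoding helper
        db.reference(f"requests/{key}/caption").set(caption)

while True:                                              # simple polling loop
    caption_pending_images()
    time.sleep(2)
```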
3.3. Results

In this study, the proposed approach was tested on the MSCOCO dataset. First, the pre-trained VGG16 model is used to extract the visual attributes of an image; these are then fed to the LSTM model to generate a caption. The flowchart of the overall system is illustrated in Figure 4.

Figure 4: Flowchart of overall system

The model is trained with two configurable parameters, the epoch and the batch size. An epoch represents one iteration over the entire training set, processing each image and caption pair exactly once. The batch size is the number of training examples processed together in one parameter update within an epoch. The obtained results may vary according to these parameters. The parameter values used for model training are set to 55 epochs and a batch size of 1024.
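Continuing the hypothetical decoder sketch given earlier, training with these reported parameter values might look like the following; X_img, X_seq and y_word stand for prepared attribute vectors, padded partial captions and one-hot next-word targets, which are assumed to exist.

```python
# Minimal sketch: training the (hypothetical) caption_model from the earlier
# sketch with the parameter values reported in the paper.
# X_img: visual attribute vectors, X_seq: padded partial captions,
# y_word: one-hot next-word targets -- all assumed to be prepared beforehand.
history = caption_model.fit(
    [X_img, X_seq], y_word,
    epochs=55,        # one epoch = one pass over every image-caption pair
    batch_size=1024,  # number of training examples per gradient update
)
```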
After our proposed approach is trained, it is integrated with Eye of Horus. In Figure 5, sample captions generated with the proposed approach are given. The caption in Figure 5a is "a zebra standing in a field with tall grass", which is very close to the visual content. The generated caption in Figure 5b is "a tennis player is swinging her racket during a serve", which reads like natural text. These results show that the proposed system has the potential to be used for image captioning by visually and hearing impaired people.

Figure 5: Captioning results of the proposed approach

4. Conclusion

In this paper, we presented a smartphone-based image captioning platform for the visually and hearing impaired using the VGG16 deep learning architecture and an LSTM model. Our proposed approach was tested on the MSCOCO dataset and then integrated with our custom-designed Android application "Eye of Horus" to provide a user-friendly interface that visually and hearing impaired people can use with simple operation. The user either selects an image from the gallery or captures a new image using the smartphone camera. The selected image is uploaded to Firebase in order to be transferred to the remote server, which runs our proposed image captioning approach. The generated caption is sent back to the app via Firebase and displayed; the user also has the option to listen to the caption. The app will be further improved to include capabilities such as translating the English captions into Turkish or other languages and running on the iOS platform.
5. References

[1] V. Ramanishka, A. Das, D. H. Park, S. Venugopalan, L. A. Hendricks, M. Rohrbach, and K. Saenko, "Multimodal video description," in Proceedings of the 24th ACM International Conference on Multimedia. ACM, 2016, pp. 1092–1096.

[2] Y. Li, T. Yao, Y. Pan, H. Chao, and T. Mei, "Pointing novel objects in image captioning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12497–12506.

[3] L. S. Batt, M. S. Batt, J. A. Baguley, and P. D. McGreevy, "Factors associated with success in guide dog training," Journal of Veterinary Behavior, vol. 3, no. 4, pp. 143–151, 2008.

[4] J. Bai, S. Lian, Z. Liu, K. Wang, and D. Liu, "Smart guiding glasses for visually impaired people in indoor environment," IEEE Transactions on Consumer Electronics, vol. 63, no. 3, pp. 258–266, 2017.

[5] J. Lu, C. Xiong, D. Parikh, and R. Socher, "Knowing when to look: Adaptive attention via a visual sentinel for image captioning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 375–383.

[6] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo, "Image captioning with semantic attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4651–4659.

[7] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818–833.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[11] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3128–3137.

[12] X. Chen and C. Lawrence Zitnick, "Mind's eye: A recurrent visual representation for image caption generation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2422–2431.

[13] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156–3164.

[14] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625–2634.

[15] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg, "BabyTalk: Understanding and generating simple image descriptions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2891–2903, 2013.

[16] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, pp. 55–60.

[17] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[18] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 6645–6649.

[19] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in Advances in Neural Information Processing Systems, 2009, pp. 545–552.

[20] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.

[21] R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models," arXiv preprint arXiv:1411.2539, 2014.

[22] M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," Journal of Artificial Intelligence Research, vol. 47, pp. 853–899, 2013.

[23] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick, "Microsoft COCO captions: Data collection and evaluation server," arXiv preprint arXiv:1504.00325, 2015.