AI Optics: Object Recognition and Caption Generation For Blinds Using Deep Learning Methodologies
Abstract—With the exponential development in the field of artificial intelligence in recent years, many researchers have focused their attention on the topic of image caption generation. Because the topic is both arduous and interesting, people take it up as a challenge in order to excel in the field of AI. Automatic generation of natural language descriptions, or 'captions', according to the composition detected in an image, i.e., scene understanding, is the main part of image caption generation, and it can be achieved by combining natural language processing with computer vision. In this paper, we tackle the task of generating captions by using the concepts of Deep Learning.

Keywords—Artificial Intelligence, Deep Learning, RNN, CNN, LSTM.

I. INTRODUCTION

Millions of people around the world face the major disability of visual impairment. Vision provides the information needed for reading, body movement and mobility, and its loss can severely affect an individual's professional and social advancement. The World Health Organization (WHO) reported that out of 1.3 billion people who suffer from one or another form of visual impairment, 36 million suffer from complete blindness [1].

Problems are often faced by people with impaired vision or complete blindness once they are out of their familiarized environments. Corporeal movement is one of the major issues for people suffering from impaired vision [2]. They are also unable to recognize an object without physically feeling it and cannot savor the beauty of nature. Many assistive devices have been made commercially available for the visually impaired community to help them read and recognize objects, enhancing their experience [3].

Various research works are still being carried out for the visually impaired community. A thorough analysis of a few papers has been done to understand the ongoing work and technology. A system worn in a shoe was proposed by K. Patil et al. that contains ultrasonic sensors on all sides along with a vibration sensor, a liquid detector and a down-step sensor [4]. Other works include an Android application for navigation, weather forecasting and news reading using speech recognition and artificial intelligence. Electronic intelligent eye, a device developed to implement a range finder and camera for obstacle detection and navigation, also used a solar panel for charging.

Computer Vision can be used to implement purposeful navigation and object detection for developing a technology for visual aid. Purposeful navigation refers to guided movement through free space to reach the desired location while preventing collisions with obstacles [5]. The major challenge is to fast-forward the results from the sensors to the processing algorithm and further to the accessible device.

In this paper we propose an end-to-end accessible solution that provides purposeful acknowledgement and guidance using object recognition and caption generation, enabling video-to-audio aid for the visually impaired community. The aim of purposeful acknowledgement and guidance is to extract the range and direction of the obstacles within a finite and defined free space captured by the camera of the device [6]. The results of the object detection and recognition algorithm are fed to the caption generation algorithm, which explains the surrounding scene to the visually impaired in audio format. The same algorithm is also responsible for providing guidance support to the blind. The paper specifically contributes the following:

1. A real-time algorithm for mapping motion in free space using object detection.

2. A modified and explored version of a general caption generator for feedback about the surroundings.

3. A capable system for providing guidance support through free space along with prevention from harmful and specific objects like fire, heavy-traffic roads, pointed ends, etc.

Automatic generation of a caption for an image is itself a big hurdle in artificial intelligence that involves connecting computer vision with natural language processing [7]. But a solution to this problem could provide a better understanding of the outside world for visually impaired people. The task involves object detection and classification with high accuracy and accessibility, with a large degree of flexibility in inputs, along with priorities for various situations [8].
Previous studies have majorly focused on stitching together solutions of the sub-problems to form a larger solution and have, therefore, failed to provide an appropriate description of the image [9]. In this paper, we propose a neural-network-based probabilistic model for the generation of captions for images, using a combination of a recurrent neural network (RNN) and a deep convolutional neural network (DCNN) along with advanced statistical machine translation to obtain higher accuracy [10].

The results from the caption generator are fed as input to the guidance support system to map the purposeful motion of the user in free space, using a human body motion and mapping algorithm developed through modifications of a CNN and statistical distance mapping done in Python. The collective results of the proposed models are rendered to a text-to-speech conversion algorithm in Python to provide the final output accessibility to the visually impaired.
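As a concrete illustration of this last step, the sketch below renders a generated caption to speech in Python. The paper does not name a specific text-to-speech library, so pyttsx3 is used here purely as an assumed example of the conversion stage.

```python
# Hypothetical sketch of the final text-to-speech step; pyttsx3 is an assumed
# choice of library, not one specified in the paper.
import pyttsx3

def speak_caption(caption: str) -> None:
    engine = pyttsx3.init()   # initialise an offline TTS engine
    engine.say(caption)       # queue the caption text for speech
    engine.runAndWait()       # block until the audio has finished playing

if __name__ == "__main__":
    speak_caption("black dog is running in the water")
```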
II. RELATED WORK

Computer vision technology has been used for a long time to produce descriptions of visual data in natural language [11,12]. Various types of systems have been developed for this purpose; in one such type of system, structured formal language is joined with a compound system. Systems of this type are not very reliable, as they are largely hand-crafted, cover very few domains and are useful only in specific realms [13].

Recently, object detection (or image detection) has gained a lot of popularity and the interest of a large number of people. There are various advances in the domain which aid natural language generation from detections, but they are restricted in their outcome. Li et al. [15] start from detections and combine them together to form a final output consisting of the detected objects and their relationships. Similarly, Farhadi et al. [14] used detections to produce a triplet for the image and converted it into text phrases with the help of templates. Larger models based on language parsing have also been utilized. The above methods have proved useful in various conditions, but one issue that remains is that they are highly hand-designed and rigid when used for text generation.

Various approaches have also been framed around the problem of ranking descriptions [16,17,18]. In these kinds of methods, the approach is to embed text and images in the same vector space; hence, when an image query is passed, those descriptions are fetched which lie nearest to the image in the vector space. This approach cannot be used to describe new compositions of objects, even if the individual objects have been observed in the training data [19].

The latest image description methods can also be viewed as caption generation based on language modelling with recurrent neural networks (RNNs) [25,26,27]. The basic RNN model is a language model whose basic function is to capture the probability of producing a string from the words generated so far [20]. Here the RNN is used to generate the next term of the string conditioned not only on the previous words but also on a set of image features; so the RNN is a hybrid model that relies on both linguistic and visual features.

Kiros et al. [21] proposed a multimodal log-bilinear model that is biased by the features of the image. The method was later improved to allow a natural process of both generation and ranking. Donahue et al. applied LSTMs (Long Short-Term Memory networks) to video to generate video captions. An LSTM is an artificial recurrent neural network used in deep learning; it has feedback connections and is able to process single data points as well as entire sequences of data [22,23].

Fig. 1. Hybrid model overview design.

Fang et al. described a three-step pipeline for generation that incorporates object detection. The first step of their model is to learn detectors for various visual concepts; a trained language model is then applied to the detector results, along with an image-text embedding space.

In our work we combine image classification with deep convolutional networks and recurrent networks to develop a single model that produces a description (caption) of the image. The model is motivated by sequence-to-sequence generation, where instead of a source sentence an image, encoded by the CNN, is provided. A recent work by Mao et al. [24] used a neural network for the same idea and outcome. Our method is somewhat similar to Mao's approach, with significant differences: we use a more impactful RNN model and the image is provided directly to the model. As a result, our system obtains a better result. We then provide a multimodal embedding space with an RNN and LSTM that is used to remember the text; hence there are two separate pathways, one for the image and the other for the text, which construe a joint embedding to produce the speech outcome.

III. BACKGROUND: TYPES OF ARCHITECTURE

We first draw a prominent distinction between architectures that integrate linguistic and image attributes in a multimodal layer and those which inject the attributes of the image straight into the caption-prefix encoding process.

We can also differentiate four possibilities emerging from these architectures, as depicted in the figures and briefly described as follows:

A. Init-Inject Technique

The initial state vector of the RNN is set to be the image vector (or a vector extracted from the image vector). This requires the image vector to have the same size as the RNN's hidden state vector. This is a static binding framework, which also enables the image representation to be altered by the RNN.
Fig. 2. Multiple techniques of constraining a neural language model alongside an image.

B. Pre-Inject Technique

The RNN takes two inputs: the first is the image vector (or a vector extracted from it), and the second input, the word vector, comes into play later. Therefore, in the prefix the image vector is treated as the primary (first) word. The magnitude of both inputs, the image vector and the word vectors, must be equal. This is also a static binding framework and enables the image representation to be altered by the RNN.

C. Par-Inject Technique

The image vector (or a vector extracted from it) and the word vector of the caption prefix both simultaneously serve as inputs to the RNN, in one of two ways:

a) Both inputs are combined to form a single input (the image vector is combined with the word vector that is forwarded to the RNN).

b) The RNN handles two discrete inputs.

As in the previous possibilities, the image vector and the word vectors of the caption prefix need to be the same size, but in this case it is not required that every word vector has a corresponding image vector, nor that every image vector be identical. Therefore this is not a static binding architecture but rather a mixed one, unlike the previous cases. A little variation in the representation of the image is also allowed, as it would be quite a task for the RNN if the image fed to it were exactly the same at every time step while its hidden state vector is refreshed with the same image every single time.

D. Merge Technique

The image vector (or a vector derived from the image) is not exposed to the RNN at any instance. Instead, the image is introduced into the language model after the prefix has been encoded by the RNN. This is an example of a late binding architecture [28].

We also do not need to alter the image representation at every time step. Given these variations or possibilities, we have to consider a selection process among the above options.
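To make the contrast between early and late binding concrete, the sketch below builds two toy Keras models: one that injects the image vector as the RNN's initial state (init-inject) and one that keeps the image out of the RNN and merges it afterwards (merge). The layer sizes, vocabulary size and sequence length are illustrative assumptions, not values prescribed by this section.

```python
# Illustrative sketch only: init-inject vs. merge conditioning of a caption prefix
# on an image vector. Sizes (4096-d image features, 256 units, vocab/length) are assumed.
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size, max_len = 5000, 34

image_in = Input(shape=(4096,), name="image_features")
image_vec = Dense(256, activation="relu")(image_in)        # project image to the RNN size

prefix_in = Input(shape=(max_len,), name="caption_prefix")
prefix_emb = Embedding(vocab_size, 256, mask_zero=True)(prefix_in)

# Init-inject (early binding): the image vector becomes the LSTM's initial state,
# so the RNN can alter the image representation while encoding the prefix.
rnn_init = LSTM(256)(prefix_emb, initial_state=[image_vec, image_vec])
init_inject = Model([image_in, prefix_in],
                    Dense(vocab_size, activation="softmax")(rnn_init))

# Merge (late binding): the RNN never sees the image; image and prefix encodings
# are only combined in the layers that follow the RNN.
rnn_out = LSTM(256)(prefix_emb)
merged = Dense(256, activation="relu")(add([image_vec, rnn_out]))
merge_model = Model([image_in, prefix_in],
                    Dense(vocab_size, activation="softmax")(merged))
```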
IV. PROCEDURE

A. Prepare Photo And Text Data

For the following experiment we have used the Flickr8k dataset, which consists of two parts: 'Flickr8k_Dataset', which contains 8,092 photographs in JPEG ('.jpg') format, and 'Flickr8k_text', which contains a number of text (.txt) files containing various sources of raw descriptions for the given photographs.

The Flickr8k dataset is separated into three sections:

a) For training purposes, we are provided with 6,000 images,

b) For testing purposes, we are provided with 1,000 images,

c) And for validation purposes, we are provided with 1,000 images.

There are five different captions for each image.
Using the VGG (Visual Geometry Group) class, we load the VGG model in Keras. We are interested in the photo's internal representation produced prior to classification, not in the classification of the images themselves. From the pre-trained VGG CNN we extract the 4,096-element image feature vectors, which are also available in the distributed datasets. During pre-processing these image vectors are normalized to unit length.
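A minimal sketch of this feature-extraction step is shown below, assuming the Keras VGG16 application with ImageNet weights and a local 'Flickr8k_Dataset' directory (the directory name and the exact VGG variant are assumptions).

```python
# Sketch: load VGG16 in Keras, drop the classification layer, and keep the
# 4096-element fc2 activations as the photo representation, normalized to unit length.
import os
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

base = VGG16()                                            # pre-trained on ImageNet
extractor = Model(base.inputs, base.layers[-2].output)    # output of the 4096-d fc2 layer

def extract_features(image_path):
    img = load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    feat = extractor.predict(x, verbose=0)[0]
    return feat / np.linalg.norm(feat)                    # normalize to unit length

features = {}
for name in os.listdir("Flickr8k_Dataset"):               # directory name is an assumption
    features[os.path.splitext(name)[0]] = extract_features(os.path.join("Flickr8k_Dataset", name))
```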
There is a unique identifier for each photograph which maps to a list of one or more textual descriptions. These description texts need to be cleaned. The descriptions are easy to work with and already tokenized. Finally, we can summarize the size of the vocabulary once we have cleaned the texts.
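The cleaning step could look roughly as follows; the dictionary of raw descriptions (photo identifier mapped to caption strings) is assumed to have been parsed from the Flickr8k_text files, and the single entry shown is illustrative.

```python
# Sketch of description cleaning: lowercase, strip punctuation and non-alphabetic
# or single-character tokens, then summarize the vocabulary of the cleaned texts.
import string

def clean_descriptions(descriptions):
    table = str.maketrans("", "", string.punctuation)
    for caption_list in descriptions.values():
        for i, caption in enumerate(caption_list):
            tokens = caption.lower().translate(table).split()
            caption_list[i] = " ".join(w for w in tokens if len(w) > 1 and w.isalpha())

def vocabulary(descriptions):
    vocab = set()
    for caption_list in descriptions.values():
        for caption in caption_list:
            vocab.update(caption.split())
    return vocab

descriptions = {"example_photo": ["A child in a pink dress is climbing up a set of stairs."]}
clean_descriptions(descriptions)
print(descriptions, "vocabulary size:", len(vocabulary(descriptions)))
```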
B. Develop Deep Learning Model

This section is divided into the following parts: a) loading the datasets, and b) defining the caption generation model.

1) Loading the Datasets: All of the photographs of the training dataset, along with their captions, will be used to train the model. We can extract the photo identifiers from the file names; these identifiers are used to filter out the descriptions and photos for each set. The model generates a caption for a photograph passed as input, given the sequence of previously generated words as the other input, i.e., the caption is generated one word at a time. In the final step we remove the 'startseq' and 'endseq' tokens, and we have the base of our automatic caption generation model. For example, for "black dog is running in the water" as the input sequence we would have 8 input-output pairs for training the model:
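A short sketch of this expansion is given below; the caption is wrapped in the 'startseq' and 'endseq' tokens, so the seven-word example yields exactly eight prefix-to-next-word pairs (integer encoding, padding and the photo feature vector are omitted here).

```python
# Sketch: expanding one caption into (prefix -> next word) training pairs.
caption = "startseq black dog is running in the water endseq"
tokens = caption.split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for prefix, target in pairs:
    print(" ".join(prefix), "->", target)
print(len(pairs), "input-output pairs")   # prints 8
```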
2) Defining the Caption Generation Model: The model is defined in three parts. a) Photo Feature Extractor: the 4,096-element feature vector extracted by the pre-trained VGG model. b) Sequence Processor: a word embedding layer that handles the text input, which is followed by a special kind of RNN layer called the Long Short-Term Memory (LSTM) layer. c) Decoding Layer: we receive a fixed-length vector as output from both the sequence processor and the feature extractor; finally, both are integrated together and processed by a Dense layer to make the concluding prediction.

Fig. 4. Basic description of the model.

The model follows the merge architecture, in which the image vector is connected sequentially with the final LSTM state. The Photo Feature Extractor model expects an input of photographic features as a vector of 4,096 elements, which is further processed by a dense layer. This layer compresses the 4,096 elements used to represent the photograph down to 256 elements.

The Sequence Processor model expects an input sequence with a predefined length of 34 words. This sequence is fed into the Embedding layer, in which the padded values are masked, after which a Long Short-Term Memory layer with 256 memory units is attached.

A 256-element vector is produced by both of the input models. To reduce overfitting on the training dataset, a 50% dropout regularization is applied to both input models, which results in faster model configuration and learning.

The output vectors of both input models are merged in the Decoder model using an addition operation. The product of the addition operation is fed to a 256-neuron dense layer, followed by another dense layer for the final output. This final layer makes a recursive prediction of the next word in the sequence over the entire output vocabulary using a 'softmax' activation.
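A sketch of this merge model in Keras is shown below; the sizes follow the description above (4,096-element photo features, 34-word sequences, 256-unit layers, 50% dropout, addition merge, softmax output), while the vocabulary size is left as a parameter derived from the cleaned captions.

```python
# Sketch of the described merge model: photo-feature branch + sequence branch,
# merged by addition and decoded through dense layers to a softmax over the vocabulary.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def define_model(vocab_size, max_length=34):
    # photo feature extractor branch: 4096 -> 256
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation="relu")(fe1)
    # sequence processor branch: embedding (padded values masked) -> LSTM(256)
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)
    # decoder: merge by addition, dense layer, softmax over the vocabulary
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation="relu")(decoder1)
    outputs = Dense(vocab_size, activation="softmax")(decoder2)
    return Model(inputs=[inputs1, inputs2], outputs=outputs)
```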
The Adam optimization algorithm was used to train this model with default parameters and 50 captions as the mini-batch size, while summed cross-entropy was used as the cost function. An early stopping criterion was applied during training: after each training epoch the program measures the validation performance, and once the performance on the validation data begins to deteriorate, training is terminated.
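Continuing the model sketch above, the training setup could be expressed as follows. The random arrays are dummy stand-ins for the real prepared arrays of photo features, padded caption prefixes and one-hot next words; the vocabulary size and number of epochs are illustrative.

```python
# Sketch of training: Adam with default parameters, categorical cross-entropy,
# mini-batches of 50 captions, and early stopping on the validation loss.
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical

vocab_size, max_length = 5000, 34
model = define_model(vocab_size, max_length)              # from the previous sketch
model.compile(loss="categorical_crossentropy", optimizer="adam")

def dummy_split(n):                                       # stand-in for the real data arrays
    X1 = np.random.rand(n, 4096)
    X2 = np.random.randint(1, vocab_size, (n, max_length))
    y = to_categorical(np.random.randint(1, vocab_size, n), num_classes=vocab_size)
    return X1, X2, y

X1t, X2t, yt = dummy_split(200)
X1v, X2v, yv = dummy_split(50)

callbacks = [EarlyStopping(monitor="val_loss", patience=1),       # stop once val loss worsens
             ModelCheckpoint("model.h5", monitor="val_loss", save_best_only=True)]
model.fit([X1t, X2t], yt, validation_data=([X1v, X2v], yv),
          epochs=20, batch_size=50, callbacks=callbacks)
```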
V. RESULT

A. Evaluate the Model

We generated descriptions for all of the photographs present in the Flickr8k testing dataset and then evaluated our model by comparing these predictions with the expected descriptions using a standard metric. We evaluate the actual and the generated descriptions by summarizing how closely the predicted text resembles the expected text. We accomplish this task using the corpus BLEU score, which is used in text translation, i.e., for the evaluation of generated translated text against a set of reference translations. To evaluate the skill of a model, we calculate the BLEU scores for the cumulative 1-, 2-, 3- and 4-grams. We have some ball-park BLEU scores as a reference for skillful models when evaluated on the test dataset used in our experiment:

Fig. 6. BLEU score range of a good model.

The following are the BLEU scores obtained by our model:

Fig. 7. BLEU score values of the model created.

We can see that the scores fit within the expected range of an appropriate model on the specified query, and close to the top of that range.
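A sketch of this corpus-level evaluation with NLTK is given below; `actual` holds the tokenized reference captions and `predicted` the generated ones for the test photographs, with the two entries shown being illustrative only.

```python
# Sketch: cumulative 1- to 4-gram corpus BLEU, as used to evaluate the generated captions.
from nltk.translate.bleu_score import corpus_bleu

actual = [[["black", "dog", "is", "running", "in", "the", "water"]]]   # references per photo
predicted = [["dog", "is", "running", "in", "the", "water"]]           # generated caption

print("BLEU-1: %f" % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
print("BLEU-2: %f" % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
print("BLEU-3: %f" % corpus_bleu(actual, predicted, weights=(0.33, 0.33, 0.33, 0)))
print("BLEU-4: %f" % corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))
```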
B. Generate Captions

We load the photograph for which we want to generate the caption and extract its features. We achieved this by implementing the VGG-16 model after redefining our existing model; alternatively, we can predict the features using the VGG model and provide them to the existing model as input.

Fig. 8. Input images provided to the model.

But how do we generate a caption using our trained model? Initially we pass 'startseq', i.e., the starting description token, generate one word, and then recursively call the model again and again, passing the generated words as input, until the maximum description length is reached or 'endseq', i.e., the end-of-sequence token, is generated. In the final step we remove the 'startseq' and 'endseq' tokens, and we have our caption for the photograph passed for caption generation.
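The generation loop described above can be sketched as follows; `model`, a fitted Keras `Tokenizer`, the 4,096-element photo feature vector and the maximum caption length are assumed to come from the earlier steps.

```python
# Sketch of greedy caption generation: start from 'startseq', repeatedly predict the
# next word and append it until 'endseq' or the maximum length, then strip the tokens.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_features, max_length=34):
    index_word = {i: w for w, i in tokenizer.word_index.items()}
    text = "startseq"
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = int(np.argmax(model.predict([photo_features.reshape(1, -1), seq], verbose=0)))
        word = index_word.get(yhat)
        if word is None or word == "endseq":
            break
        text += " " + word
    return text.replace("startseq", "").strip()   # 'endseq' is never appended
```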
Fig. 9. Output captions generated by the model.

VI. CONCLUSION

Various kinds of models are available today for video captioning, image retrieval and image captioning, each with its own performance capability, and the test results show that this system achieves strong performance. The model focuses on three important criteria: first, the generation of complete natural language sentences; second, making the generated sentences semantically and grammatically correct; and third, making the caption consistent with the image.

REFERENCES

[1] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In Proc. ICCV'15, pages 2425–2433, Santiago, Chile. IEEE.
[2] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proc. Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, volume 29, pages 65–72.
[3] Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, and Barbara Plank. 2016. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. Journal of Artificial Intelligence Research, 55:409–442.
[4] Xinlei Chen and C. Lawrence Zitnick. 2015. Mind's eye: A recurrent visual representation for image caption generation. In Proc. CVPR'15. Institute of Electrical and Electronics Engineers (IEEE), June.
[5] Manchanda, C., Rathi, R., & Sharma, N. (2019). Traffic Density Investigation & Road Accident Analysis in India using Deep Learning. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi:10.1109/icccis48478.2019.8974528
[6] Grover, M., Verma, B., Sharma, N., & Kaushik, I. (2019). Traffic Control using V-2-V Based Method using Reinforcement Learning. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi:10.1109/icccis48478.2019.8974540
[7] Harjani, M., Grover, M., Sharma, N., & Kaushik, I. (2019). Analysis of Various Machine Learning Algorithm for Cardiac Pulse Prediction. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi:10.1109/icccis48478.2019.8974519
[8] Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Institute of Electrical and Electronics Engineers (IEEE), June.
[9] Desmond Elliott and Frank Keller. 2013. Image Description using Visual Dependency Representations. In Proc. EMNLP'13, pages 1292–1302, Seattle, WA. Association for Computational Linguistics.
[10] Manchanda, C., Sharma, N., Rathi, R., Bhushan, B., & Grover, M. (2020). Neoteric Security and Privacy Sanctuary Technologies in Smart Cities. 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT). doi:10.1109/csnt48778.2020.9115780
[11] R. Gerber and H.-H. Nagel. Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences. In ICIP. IEEE, 1996.
[12] B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu. I2T: Image parsing to text description. Proceedings of the IEEE, 98(8), 2010.
[13] Rustagi, A., Manchanda, C., & Sharma, N. (2020). IoE: A Boon & Threat to the Mankind. 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT). doi:10.1109/csnt48778.2020.9115748
[14] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[15] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In Conference on Computational Natural Language Learning, 2011.
[16] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47, 2013.
[17] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, 2014.
[18] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, 2011.
[19] Sharma, N., Kaushik, I., Rathi, R., & Kumar, S. (2020). Evaluation of Accidental Death Records Using Hybrid Genetic Algorithm. SSRN Electronic Journal. doi:10.2139/ssrn.3563084
[20] Rathi, R., Sharma, N., Manchanda, C., Bhushan, B., & Grover, M. (2020). Security Challenges & Controls in Cyber Physical System. 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT). doi:10.1109/csnt48778.2020.9115778
[21] R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. In NIPS Deep Learning Workshop, 2013.
[22] Rustagi, A., Manchanda, C., Sharma, N., & Kaushik, I. (2020). Depression Anatomy Using Combinational Deep Neural Network. Advances in Intelligent Systems and Computing, International Conference on Innovative Computing and Communications, 19–33. doi:10.1007/978-981-15-5148-2_3
[23] Grover, M., Sharma, N., Bhushan, B., Kaushik, I., & Khamparia, A. (2020). Malware Threat Analysis of IoT Devices Using Deep Learning Neural Network Methodologies. Security and Trust Issues in Internet of Things: Blockchain to the Rescue, 123.
[24] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Explain images with multimodal recurrent neural networks. arXiv:1410.1090, 2014.
[25] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
[26] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
[27] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[28] Denkowski, Michael and Lavie, Alon. Meteor Universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation, 2014.