
ITM Web of Conferences 37, 01019 (2021) https://doi.org/10.1051/itmconf/20213701019
ICITSD-2021

Virtual AI Assistant for Person with Partial Vision Impairment

Rohith Raghavan1*, Vishodhan Krishnan1, Hitesh Nishad1, and Bushra Shaikh1


1 Department of Information Technology, SIES Graduate School of Technology, Navi Mumbai, India

Abstract: Smartphones help us with almost every activity and task nowadays. The features and hardware of the phone can be leveraged to make apps for online payment, content consumption and creation, accessibility, etc. These devices can also be used to help and assist the visually challenged and guide them in their daily activities. As the visually challenged sometimes face difficulty in sensing the objects or humans in their surroundings, they require guidance or help in recognizing objects, human faces, reading text, and other activities. Hence, this Android application has been proposed to help and assist people with partial vision impairment. The application makes use of technologies like face detection, object and text recognition, a barcode scanner, and a basic voice-based chatbot which can be used to execute basic commands, implemented through Deep Learning, Artificial Intelligence, and Machine Learning. The application will be able to detect the number of faces, recognize the object in the camera frame of the application, read out the text from newspapers, documents, etc., and open the link detected from the barcode, all given as output to the user in the form of voice.

1 Introduction

A normal person without any disabilities has no issues with daily work in their life. But, on the other hand, it is difficult for a partially blind person to carry out daily tasks. Actions like reading texts and identifying objects cannot be performed by them due to their disability. Making Braille versions of every text is an expensive and tedious task. Also, recognizing objects from a distance is not possible for a visually challenged person. Although there are several applications to help and assist the visually challenged, each offers only some features, making the person install a handful of applications. So, to overcome the current issues faced by a visually challenged person, we have developed this application that offers convenience and assistance to the visually challenged person. The application offers text recognition, object recognition, and face detection to identify text, objects, and humans. It also offers a chatbot so that the visually challenged person can interact with the bot for basic information and activities.

2 Literature Review

We studied and went through the following research papers to get more knowledge and ideas about the implementation of our project.

Tosun et al. [1] discussed the process and the algorithms involved in real-time object detection. They also compared various algorithms like YOLOv2, SSD, and Faster R-CNN in terms of accuracy. The paper explained the ML algorithms in brief. YOLOv2 provided better accuracy and ran at even low fps with a GPU processor.

Tembhurne et al. [2] studied the implementation of a voice assistant for the visually challenged. The paper discussed the various modules which can be implemented in the voice assistant, like calls, messages, TTS, OCR, etc. The paper also talks about using the Maps API for navigation.

*Corresponding author: [email protected]

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

Dahiya et al. [3] elaborate on the R-CNN algorithm in detail and also compare the accuracy and computational time of R-CNN and Faster R-CNN combined with ResNet-50. The paper also discusses the data preprocessing steps required for feeding the data into the machine learning model. The framework proposed in the paper claims an accuracy of 92%.

Ahmed et al. [4] discussed using RNN (recurrent neural network) and CNN (convolutional neural network) for obstacle avoidance and way-finding. Their work using CNN proved helpful to implement object detection using CNN-based algorithms.

Gianani et al. [5] described real-time object detection implemented using OpenCV, along with determining the position of the object using Euclidean distance. The system also guides the user to the objects through voice output. The paper explains object detection using the SSD framework and MobileNet architecture, which has an accuracy of 99.61%. This system is designed to work in an indoor environment.

Kukade et al. [6] focused on Speech-to-Text, Text-to-Speech, Optical Character Recognition, and voice assistance, and the proposed system to implement the same. The paper also discussed the ways of implementing it.

Shishir et al. [7] explained object recognition using the Tensorflow ML API along with its implementation. They included informative flowcharts for understanding the process behind it. They also explained the working of OCR and object recognition. This implementation provided an accuracy of over 80%.

Karthik et al. [8] provided an overview of the OCR algorithm and the hindrances faced while the text is being extracted. They also share the idea of using a Raspberry Pi instead of a mobile phone to capture images. The paper also talks about the future scope of using a GPS location tracker for guidance.

Singh et al. [9] proposed an Android application which offers text recognition, speech recognition, image recognition, and a chatbot for the user to interact with the application. The paper proposed using Google Cloud APIs (various APIs which can be used to automate tasks) and Google Dialogflow (a natural language understanding platform on which a chatbot can be implemented) to implement the various modules, instead of training deep learning models to perform the various activities.

Sharma et al. [10] focus on implementing a system offering face recognition, text-to-speech, and object recognition on a web browser which can be opened on a mobile device. The paper also talks about adding a feature to add unknown faces to the database at the tap of a button for future reference. The proposed system also has a fairly simple and user-friendly UI designed specifically for the visually impaired.

Jakhete et al. [11] discussed using the Single Shot Detector (SSD) algorithm to implement object detection in an Android application. The paper lists other object recognition algorithms and mentions the steps to implement the SSD algorithm on an Android application.

3 Existing System

In this section, we discuss the features of certain applications available on the Play Store.

Supersense [12] – An application that assists the visually challenged; the features provided by it are object recognition, face recognition, and text recognition.

Sullivan+ [13] – This application also serves the same purpose; it provides object recognition to describe images, face recognition, and text recognition.

Envision AI [14] – This application also serves the same purpose and provides face recognition and object recognition.

LetSeeApp [15] – This application is also for the same purpose and provides text recognition to read visiting cards as well as credit and debit cards.

The above-mentioned applications provide more or less similar features (the links to these applications are provided in the references section).

4 Proposed System

An Android-based application based on technology and innovation promises to academically empower the visually challenged by freeing them of their dependence on visuals by providing the information through an app.

This application aims to provide better functionality in an app that a partially blind user can use for navigation, identification, recognition, and also gaining information about the outer world. Some of its features are listed below:

• The app will contain a chatbot which the user can ask questions about the time, weather, or other topics to obtain information, or ask to perform certain actions the user desires.
• It will detect objects in real time and provide the necessary information to the user.
• The app will also contain a barcode scanner which will help the user get information about certain products.
• The app can also help the user detect human faces so that the user can sense human presence in the surroundings and also the number of people in the room.


• This application will have a text reader which will be used to read text out loud to the user.

Using the app, the person can get help and guidance in day-to-day tasks and activities.

APP NAME          FR    OR    TR    CHATBOT
SUPERSENSE        YES   YES   NO    NO
SULLIVAN+         NO    YES   YES   NO
LETSEEAPP         NO    YES   YES   NO
ENVISION AI       YES   YES   NO    NO
OUR APPLICATION   YES   YES   YES   YES

Table 1 Comparison of features provided by each application
NOTE – FR: face recognition; OR: object recognition; TR: text recognition
NOTE – The algorithms used in the other applications are unknown to us; we have contacted their developers but have not received any responses yet. So, we have done the comparison based on the features provided.

5 Methods

Face Detection:

Face detection is a computer technology that is used to detect human faces in images, videos, or real-time video. Face detection is a broad technology that just marks or labels the human faces identified by the application. The key difference between face detection and face recognition is that face detection just identifies the face, whereas face recognition will also label the person's name, gender, age, or other attributes. Face detection can be applied in various fields: security, biometrics, entertainment, law enforcement, etc.

Basic face detection can be achieved through OpenCV, whereas real-time face detection or face detection in varying conditions can be achieved using machine learning or deep learning. Face detection algorithms start by searching for human eyes in the frame, as they are the easiest features to detect. The algorithm then searches for other features like eyebrows, nose, ears, and iris. When the algorithm finds these features in the frame, it applies additional tests and confirms the detection of the face by labelling it with a rectangular box.

Real-time face detection involves motion; hence traditional algorithms cannot be applied directly. So, advanced machine learning and deep learning algorithms are used to create models which can detect faces in real time in various scenarios.
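To make this classical pipeline concrete, the following is a minimal Python sketch of basic face detection with an OpenCV Haar cascade, as mentioned above; the image file name is a placeholder.

```python
# Minimal sketch: classical face detection with an OpenCV Haar cascade.
# The image file name is a placeholder; any test photo will do.
import cv2

# Load the frontal-face cascade that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("test_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale

# scaleFactor and minNeighbors trade off speed against false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Confirm each detection by labelling the face with a rectangular box.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

print(f"Detected {len(faces)} face(s)")
cv2.imwrite("labelled.jpg", image)
```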
Object Recognition:

Object recognition is the technique of recognizing and labelling an object detected in an image, a video, or in real time. Object recognition is achieved using machine learning and deep learning. Object recognition algorithms take the frame from the camera as the input, apply a bounding box of a specific size to the image, and check for an object inside it. If an object is found in the image, the algorithm will recognize it. There are two steps to object recognition: image classification and object localization. Image classification predicts the class of the object in an image, whereas object localization identifies one or more objects in the image and draws bounding boxes around them. An object detection algorithm combines both tasks, localizing and classifying the objects in the image.

Text Recognition:

Text recognition is the technique of detecting and identifying text in printed, handwritten, or digital format. Text recognition technology converts text in different forms to digital form. It is also called OCR (Optical Character Recognition). Several APIs exist for various platforms which can be used to implement OCR.

For recognizing typed or printed text on objects or books, the user has to open the application on his smartphone and then select the required option. The application will identify the text and convert it to digital form. The text will then be read out to the user.

Chatbot:

Chatbots are AI-based computer programs that can simulate a human conversation. They are also called digital assistants, as chatbots can be used to carry out actions and commands given by the user. A chatbot can process human conversation, reply to commands and queries, or solve user FAQs as well.

The key modules behind a chatbot are artificial intelligence, natural language processing, user-defined rules, and machine learning, which are required to process the commands or messages sent by the user and deliver the required feedback.

Chatbots are of two types: task-oriented and data-driven. Task-oriented chatbots are designed for a single purpose and only generate automated responses. Their interaction is specific and restricted to FAQs or basic questions. The answers to the queries are already defined in task-oriented chatbots. Hence, they can only handle and process basic queries, and they are the most commonly used in websites and apps for user queries.


Data-driven chatbots or virtual assistants are more interactive, sophisticated, and advanced than task-oriented ones. These chatbots use NLP, NLU, and ML to learn from the user's queries and responses. They analyze and use past user interaction data and behavior to provide responses or feedback to the user's queries. Hence, data-driven chatbots become better, more efficient, and more precise over time. Amazon Alexa, Google Assistant, and Apple's Siri are examples of data-driven chatbots.

Implementation

For implementing object recognition and face detection, we have chosen to use the TensorFlow Lite framework (Google's open-source deep learning framework designed for on-device processing) in our proposed system.

TensorFlow Lite was chosen because other frameworks like Keras (an open-source library that provides a Python interface for artificial neural networks) and PyTorch (an open-source ML library designed for NLP and computer vision) do not offer Lite versions for low-end devices like smartphones.

TensorFlow Lite also offers various pre-trained models with commonly used algorithms and datasets for out-of-the-box usage in projects and applications. Several algorithms, like the You Only Look Once (YOLO) algorithm [16], the Single Shot Detector (SSD) [17], and the Region-based Convolutional Neural Network (R-CNN) [18], among others, can be used to implement real-time object recognition.

We chose the SSD algorithm for our project as it offers a fair trade-off between speed and accuracy, compared to other algorithms which favor one of these parameters over the other. The following table shows the speed and accuracy comparisons.

Table 2. Speed and accuracy comparison among object detection algorithms.

Method          mAP    FPS   Batch size   Boxes   Input resolution
Faster R-CNN    73.2   7     1            6000    1000×600
Fast YOLO       52.7   155   1            98      448×448
YOLO (VGG-16)   66.4   21    1            98      448×448
SSD300          74.3   46    1            8732    300×300
SSD512          76.8   19    1            24564   512×512
SSD300          74.3   59    8            8732    300×300
SSD512          76.8   22    8            24564   512×512

The SSD algorithm also performs better when it comes to detecting objects of different shapes and sizes. This is evident from the comparison graph, which shows the difference.

Fig 1 Algorithm's performance over objects of different sizes

We have used the TensorFlow Object Detection API model which uses SSD MobileNet v1. This model is trained on the MS-COCO [19] dataset. The COCO dataset is a massive object detection dataset which has 330,000 images, with over 200,000 of them labelled, spanning 80 object categories.
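As an illustration of how such a model is driven on-device, here is a minimal Python sketch of running an SSD MobileNet v1 COCO model through the TensorFlow Lite interpreter. The model file name is a placeholder, and the 300×300 uint8 input and the boxes/classes/scores output layout are assumptions based on the commonly distributed version of this model.

```python
# Sketch: object detection with a TFLite SSD MobileNet v1 (COCO) model.
# Model file name, input size, and output layout are assumptions based on
# the commonly distributed version of this model.
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_coco.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the camera frame to the model's expected 300x300 input.
frame = Image.open("frame.jpg").resize((300, 300))
interpreter.set_tensor(input_details[0]["index"],
                       np.expand_dims(np.array(frame, dtype=np.uint8), 0))
interpreter.invoke()

boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # normalized [N, 4]
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # COCO label ids
scores = interpreter.get_tensor(output_details[2]["index"])[0]   # confidences

# Keep detections above a confidence threshold before announcing them via TTS.
for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:
        print(f"class {int(cls)} at {box.tolist()} (confidence {score:.2f})")
```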
Real-time face detection can be implemented using algorithms like the Multi-Task Cascaded Convolutional Neural Network (MTCNN) [20] and Google's FaceNet [21] algorithm, and using the OpenCV Haar Cascade [22] and Dlib [23] toolkits.

The FaceNet algorithm performed better than the others, as it had a maximum accuracy of 99.63%. So, we chose to implement the variant of the FaceNet algorithm designed for low-power devices, the MobileFaceNet [24] model, for face detection. The MobileFaceNet model offered better speed than the others, as is evident from the graph below.

Fig 2 Comparison between face detection algorithms w.r.t. time


The reason behind the fast performance is that the global average pooling layer has been replaced by a depth-wise convolutional layer, which improves performance on face detection and recognition.

For the proposed system, we have used the MobileFaceNet model trained on the Labelled Faces in the Wild (LFW) [25] dataset. The LFW dataset contains over 13,000 human faces captured at various angles and orientations.
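As a sketch of how the model can be exercised, the following Python snippet computes face embeddings with a MobileFaceNet TFLite model and compares two faces by cosine similarity. The model file name, the 112×112 float input, the [-1, 1] normalization, and the embedding output are assumptions based on commonly available conversions, not our exact app code.

```python
# Sketch: comparing two cropped faces with a MobileFaceNet TFLite model.
# File name, input normalization, and output layout are assumptions.
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="mobilefacenet.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def embed(path):
    # Resize the cropped face to 112x112 and scale pixels to [-1, 1].
    face = np.array(Image.open(path).resize((112, 112)), dtype=np.float32)
    face = (face - 127.5) / 127.5
    interpreter.set_tensor(inp["index"], face[np.newaxis, ...])
    interpreter.invoke()
    vec = interpreter.get_tensor(out["index"])[0]
    return vec / np.linalg.norm(vec)  # unit-normalize the embedding

# Cosine similarity near 1.0 suggests the two images show the same person.
similarity = float(np.dot(embed("face_a.jpg"), embed("face_b.jpg")))
print(f"cosine similarity: {similarity:.3f}")
```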

Real-time text recognition has been implemented using the Text Recognition API from Google's ML Kit (Google's machine learning SDK for Android devices), which provides various libraries for implementing computer vision-related recognition. Google's ML Kit website offers prebuilt APIs and packages which can be imported into our application to implement text recognition.

For implementing the chatbot, we have used an AIML chatbot which uses Python packages like Pyttsx3 (an offline Python text-to-speech (TTS) conversion library), nltk (Natural Language Toolkit, a package of libraries and programs written in Python for processing natural language), and ChatterBot to provide feedback to the user as per the queries asked. Implementing the chatbot requires natural language processing and artificial intelligence for it to give replies and perform actions. The chatbot will read the command from the user, detect the keywords in the command, and then perform the action as programmed by the developer.
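A minimal sketch of this flow, combining the ChatterBot and Pyttsx3 packages named above; the training phrases are illustrative placeholders.

```python
# Sketch: a tiny voice-output chatbot using the packages named above.
# The training phrases are illustrative placeholders.
import pyttsx3
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

bot = ChatBot("AssistantBot")
ListTrainer(bot).train([
    "hello", "Hello! How can I help you?",
    "what can you do", "I can tell the time, read text, and describe objects.",
])

engine = pyttsx3.init()  # offline TTS engine

reply = str(bot.get_response("hello"))  # pick the closest trained reply
engine.say(reply)                       # speak the reply aloud
engine.runAndWait()
```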

The barcode scanner has been implemented using Google ML Kit's Barcode API. The API can directly be used in the application by importing the app dependencies and package.

6 Results

Object Recognition
The object recognition module has been implemented successfully:
• Accuracy of 90%
• Average run time of 1.3 seconds

Fig 3 Detecting objects

Text Recognition
The text recognition module has been implemented successfully:
• Accuracy of 90%
• Average run time of 1.4 seconds

Fig 4 Text Detection


Face Recognition
The face recognition module has been implemented successfully:
• Accuracy of 85%
• Average run time of 1.2 seconds

Fig 5 Detecting human face

Chatbot
The voice-based chatbot has been implemented successfully:
• Offline features like calling and asking for the date and time are working properly
• Online features like asking for the weather, temperature, or information on certain products are working properly, provided suitable internet connectivity

Fig 6 User command to chatbot via speech
Fig 7 Chatbot performs the required action

Barcode Scanner
The barcode scanner has been integrated successfully:
• The information on the product is provided correctly with sufficient internet connectivity


Fig 8 Barcode detection

Performance of our application compared to others

A. OBJECT RECOGNITION

1. Average time taken to perform one task (tested 20 times)

Fig 9 – Time taken by each application for object recognition (bar chart; y-axis: time taken in seconds; apps compared: Supersense, Sullivan+, LetSeeApp, Envision AI, our app)

2. Accuracy – providing correct output (tested 20 times)

Fig 10 – Accuracy of each application for object recognition (bar chart; y-axis: accuracy in percentage)

B. FACE RECOGNITION

1. Average time taken to perform one task (tested 20 times)

Fig 11 – Time taken by each application for face recognition (bar chart; y-axis: time taken in seconds)

2. Accuracy – providing correct output (tested 20 times)

Fig 12 – Accuracy of each application for face recognition (bar chart; y-axis: accuracy in percentage)

C. TEXT RECOGNITION

1. Average time taken to perform one task (tested 20 times)

Fig 13 – Time taken by each application for text recognition (bar chart; y-axis: time taken in seconds)


2. Accuracy – providing correct output (tested 20 times)

Fig 14 – Accuracy of each application for text recognition (bar chart; y-axis: accuracy in percentage)

Note – The testing was done indoors under tube light (unnatural but ample brightness); results may vary according to the surrounding environmental conditions.

7 Discussion

Fig 15 Layout of the application

The system (referred to as the android app hereafter) consists of 5 modules: real-time face detection, real-time object recognition, text recognition, barcode scanner, and chatbot. Each of these modules can be easily accessed from the android app with the click of a button. The UI of the application has been designed to be user-friendly for the partially blind.

The working of the android app and its modules is explained below.

Fig 16 Application working flowchart

Modules used in the ChatBot component of the application:

The chatbot only requires the smartphone's microphone and Internet access. It offers some useful functionalities achieved through the techniques and libraries mentioned below.

Pyttsx3 – A Python text-to-speech converter that even works offline. We have implemented this module in our project to provide offline text-to-speech conversion. This module provides many features, such as:
• TTS conversion without Internet
• Option to choose different voices
• Changing the speed or pitch of speech
• An easy-to-use and feature-rich API
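A short usage sketch of the options listed above:

```python
# Sketch: offline text-to-speech with Pyttsx3, exercising the listed features.
import pyttsx3

engine = pyttsx3.init()             # works without an Internet connection

engine.setProperty("rate", 150)     # change the speed of speech
voices = engine.getProperty("voices")
engine.setProperty("voice", voices[0].id)  # option to choose a different voice

engine.say("Welcome to the assistant application.")
engine.runAndWait()                 # block until the phrase has been spoken
```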

Speech Recognition – A technique that is used to identify the queries of the user and convey them to the application, which in turn starts the process it was requested to perform. It works in such a way that a keyword is associated with a particular action, and when the keyword is spoken by the user, the action takes place. Google Speech-to-Text has been used for speech recognition. We have implemented this for natural language processing in our project.
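A hedged sketch of this keyword-to-action mapping, assuming the Python SpeechRecognition package as the front end to Google Speech-to-Text; the keyword table is illustrative.

```python
# Sketch: mapping a spoken keyword to an action, assuming the
# SpeechRecognition package as the front end to Google Speech-to-Text.
import datetime
import speech_recognition as sr

def tell_time():
    print(datetime.datetime.now().strftime("The time is %H:%M"))

actions = {"time": tell_time}  # illustrative keyword-to-action table

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)  # record one utterance

command = recognizer.recognize_google(audio).lower()  # online STT
for keyword, action in actions.items():
    if keyword in command:
        action()  # run the action associated with the spoken keyword
```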
Natural Language Processing (NLP) – Broadly defined as the automatic manipulation of natural language, like speech and text, by software. Natural language refers to the way humans normally communicate with each other. This module is used in our project so that the user can communicate with their device as they communicate with fellow human beings.


Datetime – This module is used to provide the date and time to the chatbot. It works offline, as it operates on the data received from the device on which it runs. We have implemented this in our project so that the user can ask the device for the time and date whenever needed; the data from this module is passed to another module (Pyttsx3) to be spoken aloud.
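A tiny sketch of this offline flow, handing the result from the datetime module to Pyttsx3 for speech output:

```python
# Sketch: answering "what is the date and time" fully offline, passing the
# datetime module's output to Pyttsx3 for speech.
import datetime
import pyttsx3

now = datetime.datetime.now()
spoken = now.strftime("Today is %A, %d %B %Y, and the time is %I:%M %p")

engine = pyttsx3.init()
engine.say(spoken)       # the chatbot speaks the date and time
engine.runAndWait()
```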
Web Browser – The user can browse the web using only voice commands given to the chatbot. We have configured this module in such a way that it can be used to gain information, play music (via an API to access YouTube), provide a weather report (via an API to access The Weather Channel), and get news updates (via an API to access Times of India). To get information on various topics, we have also linked it with the Wikipedia module.
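A sketch of the information-lookup path, assuming Python's built-in webbrowser module and the third-party wikipedia package; the YouTube, weather, and news API integrations mentioned above are omitted here.

```python
# Sketch: voice-driven browsing and topic lookup, using the standard
# webbrowser module and the third-party "wikipedia" package. The YouTube,
# weather, and news API integrations are omitted from this sketch.
import webbrowser
import wikipedia

query = "Navi Mumbai"  # placeholder for a recognized spoken query

# Read a short summary of the topic, to be spoken back via TTS.
print(wikipedia.summary(query, sentences=2))

# Or open the topic in the device browser for further reading.
webbrowser.open(f"https://en.wikipedia.org/wiki/Special:Search?search={query}")
```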

Working of Text Recognition:

Text Recognition API: Google ML Kit is a set of APIs and tools which can be used to deploy and automate certain applications like text recognition, barcode scanning, pose detection, etc. We have used the text recognition API in our project. The API will first use OCR to detect the text shown in the camera frame. It will then split the text into lines, and the lines will be split into words. These words will then be sent to the API for recognition, and the recognized words will be spoken to the user using Google Text-to-Speech (TTS). Google TTS is available by default on all Android devices. Text recognition only requires the smartphone's camera access.
Working of Barcode Scanner:

Barcode API: Google ML Kit also offers a barcode API which can be used to scan barcodes and QR codes. The API will detect any QR code or barcode displayed on the camera preview frame. After detection, the QR code/barcode will be read by the API to extract the embedded information or URL. The app will automatically open the URL or read out the information from the barcode using Google TTS. The barcode API only requires the smartphone's camera access.
Working of Face Recognition:

For implementing face detection, we have used the MobileFaceNet model, which is an extremely efficient CNN model. The model is just 4.0 MB in size and is designed for smartphones and embedded systems. The face detection process starts with detecting the human faces in the real-time camera preview frame. The image is then warped using the detected landmarks like eyes, nose, jaw, and eyebrows, and the face is captured. This image of the face is then processed and resized to be fed as input to the deep learning model.

The application is designed to capture preview frames at a resolution of 800×600 px. A preview frame that is horizontal in orientation is rotated to vertical and cropped to 400×300 px, which removes the background and retains only the human body. This image is then rescaled to 112×112 px to be used as input for the MobileFaceNet model. On feeding the image, the model looks for the face by matching facial features; it then creates a bounding box when it detects the face, and the face is highlighted. The number of faces detected in the frame is then read out to the user using the TTS functionality. This can be useful for the user to know the number of people in a room or a certain place. Face detection requires only the smartphone's camera access.
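The frame pipeline described above can be sketched with Pillow as follows; the rotation and the center placement of the crop are assumptions, since only the target sizes are specified.

```python
# Sketch of the preview-frame pipeline: rotate a horizontal 800x600 frame to
# vertical, center-crop to 400x300, then rescale to MobileFaceNet's 112x112
# input. The center placement of the crop is an assumption.
from PIL import Image

frame = Image.open("preview_frame.jpg")    # 800x600 camera preview

if frame.width > frame.height:             # horizontal orientation
    frame = frame.rotate(90, expand=True)  # rotate to vertical (600x800)

# Center-crop to drop the background and keep the subject.
left = (frame.width - 300) // 2
top = (frame.height - 400) // 2
face_region = frame.crop((left, top, left + 300, top + 400))

model_input = face_region.resize((112, 112))  # MobileFaceNet input size
model_input.save("face_input.jpg")
```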
Working of Object Recognition

The android app has the object recognition feature, where the user can point at an object and the app will recognize the object in the frame and output the object's name to the user using Text-to-Speech. Object recognition has been implemented using the SSD neural network. When the user points at an object, the frame is cropped to 600×800 and fed into the model until the whole frame is covered. Based on the confidence level set by the user, the model creates multiple boxes with different aspect ratios throughout the image and tries to detect the object.

The accuracy of detection depends on the confidence level. Once the object is detected, the model creates a box over the detected object with a label. The name of the object is then read to the user using TTS. Object recognition requires only the smartphone's camera access.

8 Conclusion

The proposed android application is designed to help and guide the partially blind in their daily tasks when needed. The application has 5 main components, namely text recognition, object recognition, face detection, chatbot, and barcode scanner. The text and object recognition, barcode scanner, face detection, and chatbot are working as proposed and intended. Several changes in the text-to-speech module and the output are yet to be implemented, which will be added in the coming months. This application is intended to work in indoor and outdoor conditions, provided there is good lighting.

Acknowledgment

This work is supported by the Department of Information Technology, SIES Graduate School of Technology. This project is also supported by the Head of the Department, Dr. K. Lakshmisudha, and Project Guide Prof. Bushra Shaikh (co-author).


References

1. S. Tosun and E. Karaarslan, "Real-Time Object Detection Application for Visually Impaired People: Third Eye," 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey
2. N. M. Tembhurne, S. V. Vaidya, A. Shiekh, S. Dravyakar, "Voice Assistant for Visually Impaired People", International Research Journal of Engineering and Technology (IRJET)
3. D. Dahiya, H. Gupta and M. K. Dutta, "A Deep Learning based Real-Time Assistive Framework for Visually Impaired," 2020 International Conference on Contemporary Computing and Applications (IC3A), Lucknow, India, 2020
4. F. Ahmed, M. S. Mahmud, and M. Yeasin, "RNN and CNN for Way-Finding and Obstacle Avoidance for Visually Impaired," 2019 2nd International Conference on Data Intelligence and Security (ICDIS), South Padre Island, TX, USA, 2019
5. S. Gianani, A. Mehta, T. Motwani, and R. Shende, "JUVO - An Aid for the Visually Impaired", 2018 International Conference on Smart City and Emerging Technology (ICSCET), Mumbai
6. R. Kukade, R. Fengse, K. Rodge, S. Ransing, V. Lomte, "Virtual Personal Assistant for the Blind", 2018 International Journal of Computer Science and Technology (IJCST), Bali, Indonesia
7. M. A. Khan Shishir, S. Rashid Fahim, F. M. Habib, and T. Farah, "Eye Assistant: Using a mobile application to help the visually impaired," 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh
8. A. Karthik, V. K. Raja and S. Prabakaran, "Voice Assistance for Visually Impaired People," 2018 International Conference on Communication, Computing and Internet of Things, Chennai, India
9. G. Singh, K. Takhtani, O. Kandale, N. Dadhwal, "A Smart Personal AI Assistant for Visually Impaired People", Vol 7, Issue 6, pg. 1450-54, International Research Journal of Engineering and Technology (IRJET)
10. V. Sharma, V. M. Singh, S. Thanneeru, "Virtual Assistant for Visually Impaired" (April 19, 2020). Available at SSRN: https://ssrn.com/abstract=3580035
11. S. A. Jakhete, P. Bagmar, A. Dorle, A. Rajurkar and P. Pimplikar, "Object Recognition App for Visually Impaired," 2019 IEEE Pune Section International Conference (PuneCon), Pune, India, 2019
12. Supersense: https://play.google.com/store/apps/details?id=com.mediate.supersense&hl=en_IN&gl=US
13. Sullivan+: https://play.google.com/store/apps/details?id=tuat.kr.sulivan&hl=en_IN&gl=US
14. Envision AI: https://play.google.com/store/apps/details?id=com.letsenvision.envisionai&hl=en_IN&gl=US
15. LetSeeApp: https://play.google.com/store/apps/details?id=com.letseeapp.letseeapp&hl=en_IN&gl=US
16. "You Only Look Once: Unified, Real-Time Object Detection": https://arxiv.org/abs/1506.02640
17. "SSD: Single Shot MultiBox Detector": https://arxiv.org/abs/1512.02325
18. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks": https://arxiv.org/abs/1506.01497
19. COCO - Common Objects in Context: https://cocodataset.org
20. "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks": https://arxiv.org/abs/1604.02878
21. "FaceNet: A Unified Embedding for Face Recognition and Clustering": https://arxiv.org/abs/1503.03832
22. OpenCV: Open Source Computer Vision Library: https://github.com/opencv/opencv
23. face_recognition: The world's simplest facial recognition API for Python and the command line: https://github.com/ageitgey/face_recognition
24. "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices": https://arxiv.org/abs/1804.07573
25. LFW Face Dataset: http://vis-www.cs.umass.edu/lfw
