Chatbot Paper
Abstract: Smartphones help us with almost every activity and task nowadays. The features and hardware of the phone can be leveraged to build apps for online payment, content consumption and creation, accessibility, and more. These devices can also be used to assist the visually challenged and guide them in their daily activities. As the visually challenged sometimes face difficulty in sensing the objects or humans in their surroundings, they require guidance or help in recognizing objects and human faces, reading text, and other activities. Hence, this Android application has been proposed to help and assist people with partial vision impairment. The application makes use of technologies like face detection, object and text recognition, a barcode scanner, and a basic voice-based chatbot which can be used to execute basic commands, implemented through deep learning, artificial intelligence, and machine learning. The application will be able to detect the number of faces, recognize the object in the camera frame of the application, read out text from newspapers, documents, etc., and open the link detected from a barcode, with all output given to the user in the form of voice.
1 Introduction

A normal person without any disabilities has no issues with daily work in their life. But, on the other hand, it is difficult for a partially blind person to carry out daily tasks. Actions like reading text and identifying objects cannot be performed by them due to their disability. Making Braille versions of every text is an expensive and tedious task. Also, recognizing objects from a distance is not possible for a visually challenged person. Although there are several applications to help and assist the visually challenged, each offers only some features, making the person install a handful of applications. So, to overcome the current issues faced by a visually challenged person, we have developed this application, which offers convenience and assistance to the visually challenged. The application offers text recognition, object recognition, and face detection to identify text, objects, and humans. It also offers a chatbot so that the visually challenged person can interact with the bot for basic information and activities.

2 Literature Review

We studied the following research papers to gather knowledge and ideas about the implementation of our project.

Tosun et al. [1] discussed the process and the algorithms involved in real-time object detection. They also compared various algorithms like YOLOv2, SSD, and Faster R-CNN in terms of accuracy. The paper explained the ML algorithms in brief. YOLOv2 provided better accuracy and ran even at low fps with a GPU processor.

Tembhurne et al. [2] studied the implementation of a voice assistant for the visually challenged. The paper discussed the various modules which can be implemented in the voice assistant, like calls, messages, TTS, and OCR. The paper also talks about using the Maps API for navigation.
*Corresponding author:[email protected]
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(http://creativecommons.org/licenses/by/4.0/).
ITM Web of Conferences 37, 01019 (2021) https://doi.org/10.1051/itmconf/20213701019
ICITSD-2021
Dahiya et al. [3] elaborate on the R-CNN algorithm in detail and compare the accuracy and computational time of R-CNN and Faster R-CNN combined with ResNet-50. The paper also discusses the data preprocessing steps required for feeding the data into the machine learning model. The framework proposed in the paper claims an accuracy of 92%.

Ahmed et al. [4] discussed using RNN (recurrent neural network) and CNN (convolutional neural network) models for obstacle avoidance and way-finding. Their work using CNN proved helpful for implementing object detection using CNN-based algorithms.

Gianani et al. [5] described real-time object detection implemented using OpenCV, with the position of the object determined using Euclidean distance. The system also guides the user to the objects through voice output. The paper explains object detection using the SSD framework and the MobileNet architecture, which achieves an accuracy of 99.61%. This system is designed to work in an indoor environment.

Kukade et al. [6] focused on Speech-to-Text, Text-to-Speech, Optical Character Recognition, and voice assistance, and proposed a system to implement the same. The paper also discussed ways of implementing it.

Shishir et al. [7] explained object recognition using the TensorFlow ML API along with its implementation. They included informative flowcharts for understanding the process behind it. They also explained the working of OCR and object recognition. This implementation provided an accuracy of over 80%.

Singh et al. [9] proposed an Android application which offers text recognition, speech recognition, image recognition, and a chatbot for the user to interact with the application. The paper proposed using Google Cloud APIs (various APIs which can be used to automate tasks) and Google Dialogflow (a natural language understanding platform on which a chatbot can be implemented) to implement the various modules instead of training deep learning models to perform these activities.

Sharma et al. [10] focus on implementing a system offering face recognition, text-to-speech, and object recognition in a web browser which can be opened on a mobile device. The paper also talks about adding a feature to add unknown faces to the database at the tap of a button for future reference. The proposed system also has a fairly simple and user-friendly UI designed specifically for the visually impaired.

Jakhete et al. [11] discussed using the Single Shot Detector (SSD) algorithm to implement object detection in an Android application. The paper lists other object recognition algorithms and mentions the steps to implement the SSD algorithm on Android.

3 Existing System

In this section, we discuss the features of certain applications available on the Play Store.

Supersense [12] – an application that assists the visually challenged; the features provided by it are object recognition, face recognition, and text recognition.

Sullivan+ [13] – this application serves the same purpose and provides object recognition to describe images, face recognition, and text recognition.

Envision AI [14] – this application also serves the same purpose and provides face recognition and object recognition.

LetSeeApp [15] – this application is also for the same purpose and provides text recognition to read visiting cards as well as credit and debit cards.

The above-mentioned applications provide more or less similar features (the links to these applications are provided in the references section).

This application aims to provide better functionality in a single app that a partially blind user can use for navigation, identification, recognition, and gaining information about the outer world. Some of its features are listed below:

• The app will contain a chatbot to which the user can ask questions about the time, weather, or other topics to obtain information, or ask it to perform certain actions the user desires.
• It will detect objects in real time and provide the necessary information to the user.
• The app will also contain a barcode scanner which will help the user get information about certain products.
• The app can also help the user detect human faces, so that the user can sense human presence in the surroundings and also the number of people in the room.
• This application will have a text reader which will be used to read text out loud to the user.

Using the app, the person can get help and guidance in day-to-day tasks and activities.

APP NAME          FR    OR    TR    CHATBOT
SUPERSENSE        YES   YES   NO    NO
SULLIVAN+         NO    YES   YES   NO
LETSEEAPP         NO    YES   YES   NO
ENVISION AI       YES   YES   NO    NO
OUR APPLICATION   YES   YES   YES   YES

Table 1 Comparison of features provided by each application
NOTE: FR - Face Recognition; OR - Object Recognition; TR - Text Recognition
NOTE: The algorithms used in the other applications are unknown to us. We have contacted their developers but have not received any responses yet, so the comparison is based on the features provided.

5 Methods

Face Detection:

Face detection is a computer technology that is used to detect human faces in images, videos, or real-time video. Face detection is a broad technology that just marks or labels the human faces identified by the application. The key difference between face detection and face recognition is that face detection just identifies the face, whereas face recognition will also label the person's name, gender, age, or other attributes. Face detection can be applied in various fields - security, biometrics, entertainment, law enforcement, etc.

Basic face detection can be achieved through OpenCV, whereas real-time face detection, or face detection under varying conditions, can be achieved using machine learning or deep learning. Face detection algorithms start by searching for human eyes in the frame, as they are the easiest feature to detect. The algorithm then searches for other features like eyebrows, nose, ears, and iris. When the algorithm finds these features in the frame, it applies additional tests and confirms the detection of the face by labelling it with a rectangular box.

Real-time face detection involves motion; hence traditional algorithms cannot be applied. So, advanced machine learning and deep learning algorithms are used to create models which can detect faces in real time in various scenarios.

Object Recognition:

Object recognition is the technique of recognizing and labelling an object detected in an image, video, or real-time feed. Object recognition is achieved using machine learning and deep learning. Object recognition algorithms take the frame from the camera as input, apply a bounding box of a specific size to the image, and check for the object in the image. If the object is found in the image, the algorithm will recognize it. There are two steps to object recognition - image classification and object localization. Image classification predicts the class of the object in an image, whereas object localization identifies one or more objects in the image and draws the bounding boxes. An object detection algorithm combines both tasks and classifies the objects in the image.

Text Recognition:

Text recognition is the technique of detecting and identifying text in printed, handwritten, or digital format. Text recognition technology converts text in these different forms to digital form. It is also called OCR (Optical Character Recognition). Several APIs exist for various platforms which can be used to implement OCR.

For recognizing typed or printed text on objects or books, the user has to open the application on their smartphone and select the required option. The application will identify the text and convert it to digital form. The text will then be read out to the user.

Chatbot:

Chatbots are AI-based computer programs that can simulate a human conversation. They are also called digital assistants, as chatbots can be used to perform actions and commands given by the user. A chatbot can process human conversation, reply to commands and queries, or solve user FAQs as well.

The key modules behind a chatbot are artificial intelligence, natural language processing, user-defined rules, and machine learning, which are required to process the commands or messages sent by the user and deliver the required feedback.

Chatbots are of two types - task-oriented and data-driven. Task-oriented chatbots are designed for a single purpose and only generate automated responses. Their interaction is specific and restricted to FAQs or basic questions. The answers to the queries are already defined in task-oriented chatbots. Hence, they can only handle and process basic queries, and they are the kind most commonly used in websites and apps for user queries.
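A task-oriented chatbot of the kind described above can be sketched as a small keyword-lookup program. This is a minimal illustration only, not the implementation used in our application; the rules and replies below are hypothetical examples.

```python
# Minimal sketch of a task-oriented chatbot: answers are predefined and
# looked up by keyword, so only basic, anticipated queries are handled.
# The rule table and replies are hypothetical examples.

RULES = {
    "hello": "Hello! How can I help you?",
    "hours": "We are open from 9 am to 5 pm.",
    "bye": "Goodbye!",
}

FALLBACK = "I did not understand that. Please rephrase."

def reply(message: str) -> str:
    """Return the predefined answer for the first known keyword found."""
    for word in message.lower().split():
        answer = RULES.get(word.strip("?!.,"))
        if answer is not None:
            return answer
    return FALLBACK

print(reply("Hello there"))              # matches the "hello" rule
print(reply("Explain quantum physics"))  # no rule matches: fallback reply
```

Because every answer must be spelled out in advance, such a bot cannot go beyond its rule table, which is exactly the limitation that motivates the data-driven chatbots discussed next.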
Data-driven chatbots or virtual assistants are more interactive, sophisticated, and advanced than task-oriented ones. These chatbots use NLP, NLU, and ML to learn from the user's queries and responses. They analyze past user interaction data and behavior to provide responses or feedback to the user's queries. Hence, data-driven chatbots become better, more efficient, and more precise over time. Amazon Alexa, Google Assistant, and Apple's Siri are examples of data-driven chatbots.

Implementation

For implementing object recognition and face detection, we have chosen the TensorFlow Lite framework (Google's open-source deep learning framework designed for on-device processing) for our proposed system. TensorFlow Lite was chosen because other frameworks like Keras (an open-source library that provides a Python interface for artificial neural networks) and PyTorch (an open-source ML library designed for NLP and computer vision) do not offer Lite versions for low-end devices like smartphones.

TF Lite (TensorFlow Lite) also offers various pre-trained models with commonly used algorithms and datasets for out-of-the-box usage in projects and applications. Several algorithms, such as You Only Look Once (YOLO) [16], Single Shot Detector (SSD) [17], and Region-based Convolutional Neural Network (R-CNN) [18], among others, can be used to implement real-time object recognition.

We chose the SSD algorithm for our project as it offers a fair trade-off between speed and accuracy over other algorithms, which favour one of these parameters over the other. The following table shows the speed and accuracy comparisons.

Table 2. Speed and Accuracy comparison among object detection algorithms.
Method | mAP | FPS | Size | Boxes | Input

The SSD algorithm also performs better when it comes to detecting objects of different shapes and sizes. This is evident from the comparison graph, which shows the difference.

Fig 1 Algorithm's performance over objects of different sizes.

We have used the TensorFlow Object Detection API model which uses SSD MobileNet v1. This model is trained on the MS-COCO [19] dataset. The COCO dataset is a massive object detection dataset which has 330,000 images, over 200,000 of them labelled, covering 80 object categories.

Real-time face detection can be implemented using algorithms like the Multi-Task Cascaded Convolutional Neural Network (MTCNN) [20] and Google FaceNet [21], or using the OpenCV Haar Cascade [22] and Dlib [23] toolkits.

The FaceNet algorithm performed best among these, with a maximum accuracy of 99.63%. So, for face detection, we chose to implement a FaceNet variant designed for low-power devices, the MobileFaceNet [24] model. The MobileFaceNet model offered better speed than the others, as is evident from the graph below.
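An SSD-style detector such as the one described above returns, for each frame, parallel arrays of bounding boxes, class indices, and confidence scores. The sketch below shows the thresholding step an application applies to such output before announcing detections; the label subset and the 0.5 threshold are illustrative assumptions, not the exact values used in our app.

```python
# Sketch of the post-processing step after an SSD model runs: the model
# emits parallel arrays of boxes, class indices, and confidence scores,
# and the app keeps only detections above a confidence threshold.
# The label list and threshold here are illustrative assumptions.

LABELS = ["person", "bicycle", "car", "dog", "chair"]  # tiny COCO-style subset

def filter_detections(boxes, classes, scores, threshold=0.5):
    """Return (label, box) pairs for detections above the threshold."""
    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if score >= threshold:
            results.append((LABELS[cls], box))
    return results

boxes = [(0.1, 0.2, 0.5, 0.6), (0.4, 0.4, 0.9, 0.9)]
classes = [0, 4]        # indices into LABELS: "person" and "chair"
scores = [0.93, 0.31]   # only the first passes the 0.5 threshold
print(filter_detections(boxes, classes, scores))
```

The surviving labels are what the app converts to speech for the user.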
For implementing the chatbot, we have used an AIML chatbot which uses Python packages like pyttsx3 (an offline Python text-to-speech (TTS) conversion library), NLTK (the Natural Language Toolkit, a package of libraries and programs written in Python for processing natural language), and ChatterBot to provide feedback to the user as per the queries asked. Implementing the chatbot requires natural language processing and artificial intelligence for it to give replies and perform actions. The chatbot will read the command from the user, detect the keywords in the command, and then perform the action as programmed by the developer.

6 Results

Object Recognition

The object recognition module has been implemented successfully:
Accuracy of 90%
Average run time of 1.3 seconds

Fig 3 Detecting objects

Text Recognition

The text recognition module has been implemented successfully:
Accuracy of 90%
Average run time of 1.4 seconds
Face Recognition

The face recognition module has been implemented successfully:
Accuracy of 85%
Average run time of 1.2 seconds

Chatbot

The voice-based chatbot has been implemented successfully.

Barcode Scanner

The barcode scanner has been integrated successfully.
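As an illustration of what the barcode module relies on after decoding, the sketch below validates the check digit of an EAN-13 product code. This is the standard EAN-13 algorithm, shown for illustration; the scanner library integrated in the app performs this internally.

```python
# EAN-13 check-digit validation: the standard integrity test a barcode
# library performs after decoding. Shown for illustration; the scanner
# library used by the app handles this internally.

def ean13_is_valid(code: str) -> bool:
    """True if the 13-digit code's weighted checksum is a multiple of 10."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    # Odd positions (1st, 3rd, ...) weigh 1; even positions weigh 3.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(ean13_is_valid("4006381333931"))  # True: checksum is a multiple of 10
print(ean13_is_valid("4006381333930"))  # False: wrong check digit
```

Only codes that pass this test are worth looking up, so a scanner can reject misreads before fetching product information for the user.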
B. Face Recognition

1. Average time taken to perform one task (tested 20 times)

Fig 11 Time taken by each application for face recognition (y-axis: time taken in seconds; apps compared: Supersense, Sullivan+, LetSeeApp, Envision AI, and our app)
2. Accuracy - providing correct output (tested 20 times)

Fig 14 Accuracy of each application for text recognition (y-axis: accuracy in percentage; apps compared: Supersense, Sullivan+, LetSeeApp, Envision AI, and our app)

7 Discussion

The working of the Android app and its modules is explained below.

Fig 16 Application working flowchart
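As an illustration of the routing the flowchart implies, the sketch below dispatches a recognized voice command to a module handler by keyword. The handlers are stubs and the keyword table is an assumption for illustration, not the app's actual wiring; in the app, the handlers correspond to modules such as Datetime and Web Browser.

```python
# Sketch of routing a recognized voice command to a module, as the
# application flowchart suggests: the command is scanned for a keyword
# and dispatched to the matching handler. Handlers are stubs here; the
# keyword table is a hypothetical example.

def handle_datetime(cmd):
    return "datetime"       # stub: the real module speaks the date/time

def handle_browser(cmd):
    return "web browser"    # stub: the real module fetches web content

ROUTES = {
    "time": handle_datetime,
    "date": handle_datetime,
    "play": handle_browser,
    "news": handle_browser,
}

def dispatch(command: str) -> str:
    """Route the command to the first module whose keyword it contains."""
    for word in command.lower().split():
        handler = ROUTES.get(word)
        if handler is not None:
            return handler(command)
    return "chatbot fallback"  # unmatched commands go to the chatbot

print(dispatch("what is the time"))  # routed to the datetime handler
```

Commands with no matching keyword fall through to the general chatbot, mirroring the fallback path in the flowchart.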
Datetime – This module is used to provide the date and time to the chatbot. The module works offline, as it works on the data received from the device on which it runs. We have implemented this in our project so that the user can ask the device for the time and date whenever needed; the data from this module is passed to another module (pyttsx3) to be spoken aloud.

Web Browser – The user can browse the web using only voice commands given to the chatbot. We have configured this module in such a way that it can be used to gain information, play music (via an API to access YouTube), provide a weather report (via an API to access The Weather Channel), and get news updates (via an API to access the Times of India); to get information on various topics, we have also linked it with the Wikipedia module.

The application is designed to capture preview frames at a resolution of 800*600px. The preview frame, if horizontal in orientation, is rotated vertically and cropped to 400*300px, which removes the background and retains only the human body. This image is then rescaled to 112*112px to be used as input for the MobileFaceNet model. On feeding the image, the model looks for a face by matching facial features; when it detects a face, it creates a bounding box and highlights it. The number of faces detected in the frame is then output to the user orally using the TTS functionality. This can be useful for the user to know the number of people in a room or a certain place. Face detection requires only the smartphone's camera access.
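The frame preprocessing described above (800*600 preview, 400*300 crop, 112*112 model input) can be sketched as follows. A nearest-neighbour resize on a plain 2-D list stands in for the app's actual image code, and the centred crop offsets are an assumption for illustration.

```python
# Sketch of the frame preprocessing described above: an 800*600 preview
# frame is cropped to 400*300 and rescaled to the 112*112 input that
# MobileFaceNet expects. Nearest-neighbour resize on a plain 2-D list
# stands in for the app's image code; the centred crop is an assumption.

def center_crop(img, out_w, out_h):
    """Crop an out_w x out_h region from the middle of img (rows of pixels)."""
    h, w = len(img), len(img[0])
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return [row[left:left + out_w] for row in img[top:top + out_h]]

def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour rescale of img to out_w x out_h."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]

frame = [[0] * 800 for _ in range(600)]       # 800*600 preview frame
cropped = center_crop(frame, 400, 300)        # 400*300 region of interest
model_in = resize_nearest(cropped, 112, 112)  # 112*112 MobileFaceNet input
print(len(model_in), len(model_in[0]))        # 112 112
```

Cropping before rescaling keeps the subject large in the 112*112 input, which is why the app discards the background first rather than shrinking the whole frame.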