Blind Assistance
SSD uses VGG16 to extract feature maps and then detects objects using the Conv4_3 layer.
For illustration, the Conv4_3 layer is drawn as 8 × 8 spatially (it is actually 38 × 38). For each
cell (also called a location), it makes 4 object predictions.
Each prediction consists of a boundary box and 21 scores, one per class (including one extra
class for "no object"); the highest score determines the class of the bounded object. Conv4_3
makes a total of 38 × 38 × 4 predictions: four predictions per cell regardless of the depth of the
feature maps. As expected, many predictions contain no object, so SSD reserves class "0" to
indicate the absence of an object.
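As a quick sanity check on these numbers, the short sketch below (a minimal illustration, not part of SSD itself) computes the prediction volume that Conv4_3 produces under the stated assumptions of 4 default boxes per cell and 21 classes:

```python
# Minimal sketch: prediction volume of SSD's Conv4_3 layer.
# Assumes 4 default boxes per cell and 21 classes (20 objects + "no object").
grid_h, grid_w = 38, 38    # spatial size of Conv4_3
boxes_per_cell = 4         # default boxes predicted per location
num_classes = 21           # 20 object classes + class "0" (no object)
box_params = 4             # boundary-box offsets per prediction

total_predictions = grid_h * grid_w * boxes_per_cell
values_per_prediction = box_params + num_classes

print(total_predictions)                          # 5776 predictions from Conv4_3
print(total_predictions * values_per_prediction)  # 144400 output values in total
```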
c. Depth Estimation:
Depth estimation, or depth extraction, refers to the techniques and algorithms that aim to
obtain a representation of the spatial structure of a scene. In simpler words, it is used to
calculate the distance between two objects. Our prototype assists blind people by issuing
warnings about the hurdles coming their way. To do this, we need to find how far apart the
obstacle and the person are in any real-time situation. After an object is detected, a
rectangular box is generated around it. If that object occupies most of the frame, then, subject
to some constraints, the approximate distance of the object from the person is calculated.
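The report does not spell out those constraints, so the sketch below illustrates one common way to approximate distance from a bounding box, the pinhole-camera relation; the assumed object width and focal length are placeholders for illustration only:

```python
# Sketch: approximate object distance from the width of its bounding box,
# using the pinhole-camera relation distance = (real_width * focal) / pixel_width.
# KNOWN_WIDTH_M and FOCAL_PX are illustrative placeholders, not calibrated values.
KNOWN_WIDTH_M = 0.5   # assumed real-world width of the object, in metres
FOCAL_PX = 600.0      # assumed camera focal length, in pixels

def approximate_distance(box_width_px: float) -> float:
    """Estimate the object's distance (metres) from its box width (pixels)."""
    return (KNOWN_WIDTH_M * FOCAL_PX) / box_width_px

# A box 300 px wide corresponds to roughly 1 metre under these assumptions.
print(f"{approximate_distance(300):.2f} m")
```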
d. Voice Assistant:
After an object is detected, it is of the utmost importance to inform the person about the
presence of that object in his or her way. For the voice generation module, pyttsx3 plays an
important role. Pyttsx3 is a Python library that converts text into speech, and it works well
with both Python 2 and 3. To get a reference to a pyttsx3.Engine instance, an application
invokes the factory function pyttsx3.init(). PyTorch, primarily a machine learning library, is
also applied to the audio domain: it helps in loading voice files in standard MP3 format and
in regulating the audio sampling rate. Thus, it can be used to manipulate properties of sound
such as frequency, wavelength, and waveform. The numerous options available for audio
synthesis can be verified by taking a look at PyTorch's functions.
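A minimal pyttsx3 usage sketch follows; pyttsx3.init(), say(), runAndWait(), and setProperty() are the library's standard calls, while the spoken message and rate value are illustrative:

```python
import pyttsx3

# Obtain a pyttsx3.Engine instance via the factory function pyttsx3.init().
engine = pyttsx3.init()

# Optionally adjust the speaking rate (words per minute); 150 is illustrative.
engine.setProperty("rate", 150)

# Queue a warning message and block until it has been spoken.
engine.say("Obstacle ahead, approximately two metres away.")
engine.runAndWait()
```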
The SSD: It consists of two parts: an SSD head and a backbone model. The backbone
model is essentially a trained image classification network acting as a feature extractor.
This is often a network trained on ImageNet, such as ResNet, that has had the final fully
connected classification layer removed. The SSD head is just one or more convolutional
layers added to the backbone, with the outputs read as bounding boxes and classifications
of objects at the spatial positions of the final layer activations [3].
As a result, we have a deep neural network that can extract semantic meaning from an
input image while keeping its spatial structure, although at a lower resolution. With a
ResNet34 backbone, for example, the network produces 256 feature maps of size 7 × 7 for
an input picture. SSD divides the image into grid cells, with each grid cell being in charge
of detecting objects in that region [1][7]; detecting an object entails predicting both its
class and its bounding box.
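To make the head concrete, the following Keras sketch (an illustration under assumed sizes, not the exact published network) adds a single SSD prediction layer on top of 7 × 7 × 256 backbone feature maps, with 4 default boxes per cell and 21 classes:

```python
import tensorflow as tf

# Sketch: one SSD head layer over a backbone feature map.
# Illustrative sizes: 7x7x256 features (as from a ResNet34-style backbone),
# 4 default boxes per cell, 21 classes. Not the exact published configuration.
num_boxes, num_classes = 4, 21
features = tf.keras.Input(shape=(7, 7, 256))

# Each 3x3 conv output location predicts, for every default box,
# 4 box offsets plus one score per class.
head = tf.keras.layers.Conv2D(
    filters=num_boxes * (4 + num_classes),
    kernel_size=3, padding="same")(features)

model = tf.keras.Model(features, head)
model.summary()  # output shape (None, 7, 7, 100) -> 7*7*4 = 196 predictions
```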
Pyttsx3: As described above, pyttsx3 converts the generated text into speech. The algorithm
works as follows: whenever an object is detected, its approximate distance is calculated, and,
with the help of the cv2 library and the cv2.putText() function, the corresponding text is
displayed on the screen. To identify the hidden text in an image, we use Python-tesseract for
character recognition.
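The sketch below shows how these two pieces fit together; it assumes pytesseract and the Tesseract binary are installed, and the input file name and label text are placeholders:

```python
import cv2
import pytesseract  # requires the Tesseract OCR binary to be installed

frame = cv2.imread("frame.jpg")   # illustrative input frame

# Overlay the detected label and approximate distance on the frame.
label = "chair: 1.2 m"
cv2.putText(frame, label, org=(10, 30),
            fontFace=cv2.FONT_HERSHEY_SIMPLEX,
            fontScale=0.8, color=(0, 255, 0), thickness=2)

# Recognize any printed text visible in the frame.
text = pytesseract.image_to_string(frame)
print(text)
```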
The proposed product successfully captures the readable material in front of the user,
identifies the text in the image, and reads it out. It also informs the user about the distance of
objects at his or her eye level and names the objects around the user. Hence, this product
helps the user gain knowledge from readable material, gives the necessary information
about the surroundings, and makes the user independent. The user-friendly wearable
device is portable and compact.
Drawbacks:
While identifying objects at a greater distance, the system may fail to localize a particular
object and announce its name, because it gets confused when many objects are present.
Yuraja Kadri:
Future Vision Technology:
This paper presents a prototype of lightweight smart glasses for visually impaired people. The
authors exhibit the working of the glasses, along with the hardware design and software
design, and implement several effective image processing and text recognition algorithms on
the new lightweight smart glass system. This system can detect and recognize text in real
time. In the near future, the authors plan to implement more useful applications in the smart
glass system, such as handwritten text recognition and image detection.
K. Vijiyakumar:
Object Detection For Visually Impaired People Using SSD Algorithm:
In this project, a real-time object detection system for visually impaired people based on the
SSD algorithm is proposed. The system retrieves the trained model from a cloud database to
perform object detection in real time. The proposed system benefits visually impaired
people, improving their quality of life, by detecting objects as well as calculating the
distance of the object.
Esra Ali Hassan and Tong Boon Tang:
Smart Glasses for the Visually Impaired People:
This project presents a new design of assistive smart glasses for visually impaired students.
The objective is to assist in multiple daily tasks by taking advantage of the wearable design
format. As a proof of concept, the paper presents one example application, i.e. text
recognition technology that can help with reading from hardcopy materials. The building
cost is kept low by using a single-board computer, the Raspberry Pi 2, as the heart of the
processing, and the Raspberry Pi 2 camera for image capturing. Experimental results
demonstrate that the prototype works as intended.
The proposed system is divided into two levels based on the SSD algorithm and TensorFlow,
and it not only recognizes objects but also localizes them. It also tells the user how far he or
she is from the object. Individuals with visual impairments may face difficulties as
technology advances step by step. This research work proposes a novel framework utilizing
AI, which makes the framework more straightforward to use, specifically for individuals
with visual impairments, and helps society. The key aspects of the proposed system are
identifying or naming the detected object, calculating the accurate distance between the user
and the objects, and the voice-over using audio commands.
SSD ARCHITECTURE:
4.2 MODULE EXPLANATION:
Picture Capturing Module: When the system is turned on, it captures images using the
camera. This has to be interfaced as input to the model trained on the COCO dataset, where
classification of pixels and features takes place. The captured frames are visible on the
screen with drawn boundaries and labels. The VideoCapture() method is used to start the
camera and capture the video.
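A minimal capture loop is sketched below; camera index 0 (the default camera) and the window name are assumptions:

```python
import cv2

# Sketch: start the camera and capture frames. Index 0 = default camera.
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()                   # grab one frame from the stream
    if not ok:
        break
    cv2.imshow("Blind Assistance", frame)    # frames shown with later overlays
    if cv2.waitKey(1) & 0xFF == ord("q"):    # press 'q' to stop
        break

cap.release()
cv2.destroyAllWindows()
```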
Picture Processing Module: OpenCV (Open-Source Computer Vision) is a Python library
focused mostly on real-time computer vision. It is mainly used to do all the computational
work connected with images. cv2 is used to perform image processing and provides methods
to detect and capture the frames and display labels. This module runs after the input is taken
from the camera.
Object Detecting Module: The algorithm accepts the picture as input, and all the
computation happens here: the picture is divided into pixels, and feature extraction is
performed by the neural network. The picture is read as a string for the next computation
and examined against the trained dataset. This is accomplished using a class list in which 90
objects are trained separately. Here we used the SSD architecture, which comes under the
TensorFlow API.
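As an illustration of running a COCO-trained SSD under the TensorFlow API, the sketch below loads a pretrained detector from TensorFlow Hub; the model handle is an illustrative choice and may differ from the trained model the system actually retrieves:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Sketch: run a COCO-trained SSD detector via TensorFlow Hub.
# The model handle is an illustrative choice, not necessarily the
# exact model used by the proposed system.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

image = tf.io.decode_jpeg(tf.io.read_file("frame.jpg"))
image = tf.expand_dims(image, axis=0)      # model expects a batch of uint8 images

outputs = detector(image)
boxes = outputs["detection_boxes"][0]      # normalized [ymin, xmin, ymax, xmax]
classes = outputs["detection_classes"][0]  # indices into the 90-class COCO list
scores = outputs["detection_scores"][0]    # confidence for each detection
```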
Distance Computation Module: To find the distance of the object, NumPy is used, a pip
package for mathematical computation. Finding the distance is approached using depth
estimation: for the detected objects visible on the screen, the depth estimation takes place by
finding mid-ranges and adjusting the estimation scale to 0-10.
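The description above is terse, so the NumPy sketch below shows one plausible reading of it: take the mid-range of each detected box, use the box height as a rough depth proxy, and map the result onto the 0-10 scale (the proxy and constants are assumptions for illustration):

```python
import numpy as np

# Sketch: map a crude depth proxy onto the 0-10 scale described above.
# The proxy and the clip range are illustrative assumptions.
def depth_score(box, frame_h):
    """box = (x1, y1, x2, y2) in pixels; returns 0 (near) .. 10 (far)."""
    x1, y1, x2, y2 = box
    mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2   # mid-range of the box
    proxy = 1.0 - (y2 - y1) / frame_h             # taller box -> nearer object
    return float(np.clip(10.0 * proxy, 0.0, 10.0)), (mid_x, mid_y)

score, center = depth_score((100, 80, 400, 440), frame_h=480)
print(score, center)   # e.g. 2.5 -> object fairly close to the camera
```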
Sound Output Module: After detecting the object and calculating the distance, the aim is to
give the result as audio using voice notes. In the output we specify the distance along with
its units, plus warning messages to alert the user. For the sound output, the pyttsx3 pip
package is used, a predefined Python module for converting text to speech.
5. PRELIMINARY ANALYSIS:
5.1 BRIEF ABOUT INPUT DATA:
COCO is a large image dataset designed for object detection, segmentation, person keypoint
detection, stuff segmentation, and caption generation. It stores its annotations in JSON
format. To use the tools developed for COCO, the dataset must be COCO-like: we can either
convert an existing dataset to the COCO format or create one ourselves from scratch (a
minimal example of the annotation format is sketched after the list below).
Object segmentation
Recognition in context
80 object categories
91 stuff categories
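For reference, a minimal COCO-style annotation file has the structure sketched below; all values are placeholders:

```python
import json

# Sketch of the COCO annotation JSON structure; all values are placeholders.
coco_annotations = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 18,
         "bbox": [10, 20, 200, 150],   # [x, y, width, height] in pixels
         "area": 30000, "iscrowd": 0}
    ],
    "categories": [
        {"id": 18, "name": "dog", "supercategory": "animal"}
    ],
}

# A dataset converted into this shape can be used with the COCO tools.
with open("annotations.json", "w") as f:
    json.dump(coco_annotations, f, indent=2)
```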
The proposed system is beneficial for visually impaired people, improving their quality of
life by detecting objects as well as calculating the distance of the object.
6. FEASIBILITY STUDY:
6.1 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements
of the system. No system developed should place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed
system must therefore have modest requirements, as only minimal or no changes are
required for implementing this system.
[2] Mallapa D. Gaurav, Shruti S. Salimath, Shruti B. Hatti, Vijayalaxmi I. Byakod, Shivaleela Kanede,
"B-Light: A reading aid for the blind people using OCR and OpenCV", International Journal of
Scientific Research Engineering & Technology, vol. 6, issue 5, May 2017, pp. 546-548.
[3] Nikhil Mishra, "Image Text to Speech Conversion using Raspberry Pi & OCR Techniques",
International Journal for Scientific Research and Development, vol. 5, issue 08, 2017, pp. 523-525.
[4] Zhiming Liu, Yudong Luo, Jose Cordero, "Finger-eye: A wearable text reading assistive system for
the blind and visually impaired", IEEE International Conference on Real-time Computing and
Robotics, 6-10 June 2016, pp. 125-128.
[5] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards Real-Time
Object Detection with Region Proposal Networks", IEEE Transactions, Dec 2016.
[6] Vicky Mohane, Chetan Gade, "Object Recognition for Blind People Using Portable Camera",
WCFTR World Conference, 2016.
[7] Samer Hijazi, Rishi Kumar, and Chris Rowen, IP Group, Cadence, "Using Convolutional Neural
Networks for Image Recognition".
[11] https://sci-hub.se/https://ieeexplore.ieee.org/document/8554625
[12] https://ieeexplore.ieee.org/document/1713189
[13] https://ieeexplore.ieee.org/document/8795205
[14] https://sci-hub.se/https://ieeexplore.ieee.org/document/9137791