Voice Assisted Object Detection For Visually Impaired

2023 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) | 979-8-3503-3439-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/CONECCT57959.2023.10234781

HITAISH KG, DR. VANI KRISHNASWAMY, VISHNU M, B MAHIMA
School of CSE, REVA University, Bengaluru, India
[email protected], [email protected], [email protected], [email protected]
Abstract— Visual impairment can have a severe negative effect on a person's freedom, employment, and daily life. Globally, at least 2.2 billion individuals experience some form of visual impairment. Reading, writing, and navigating their environment are challenging for people with visual impairment. A lower quality of life as well as feelings of loneliness and sadness may result from this. There are a variety of deep learning object detection models based on computer vision. In this paper, we integrate the MobileNet SSD architecture to develop a navigation system which provides fast and accurate object detection to assist the visually impaired. The system is a mobile-friendly architecture that is lightweight and optimized for embedded and mobile devices with constrained CPU power. We also design algorithms which provide direction and distance along with audio output. In future work, this project can be further modified and implemented in various domains.

Keywords— Visual impairment, object detection, navigation, Deep learning, Computer vision.

1. INTRODUCTION

Visual impairment is a widespread condition that affects many individuals around the world. It can range from minor to major, and it can considerably affect a person's quality of life. Individuals who are blind or partially sighted face numerous difficulties every day, such as navigating unfamiliar environments, recognizing objects, and identifying obstacles. Several technologies, including white canes, guide dogs, and assistive devices, have been developed to address these difficulties. These technologies do, however, have inherent restrictions and might not always deliver precise or timely information.

Object detection for visual impairment has the potential to address some of these limitations. By using computer vision algorithms and machine learning techniques, object detection systems can help visually impaired individuals better understand their surroundings and make informed decisions. For instance, an object detection system can identify obstacles like poles, curbs, or other hazards and warn the user to stay clear of them.

In this research paper, we review the current technologies in object detection for visual impairment and highlight the most promising approaches and techniques. We propose an offline navigation system which identifies objects and tracks their direction and distance in real time. This system uses low processing power, enabling it to be used in various mobile devices. We also make recommendations for future research directions that can enhance the efficiency and usability of object recognition systems.

The paper is structured as follows. In section 2 we present the literature review on existing systems of object detection for the visually impaired. Section 3 describes the proposed system for detecting objects using convolutional neural networks. Section 4 offers an evaluation and examination of the proposed system's performance. Finally, the paper concludes by presenting the findings and discussing future developments.

2. RELATED WORKS

The rapid progression of technology has given rise to a multitude of systems and technologies like electronic travel devices (ETDs), ultrasonic sensors and RFIDs [1,2,3]. Although these devices are capable of discerning the existence of objects in their surroundings, they lack the ability to identify the specific nature of those objects. Furthermore, these devices may exhibit a considerable degree of inaccuracy and imprecision, thereby compromising their overall effectiveness and practicality.

The authors in [4] have used convolutional neural networks, which have been found very useful and have provided promising results for object detection.

Further, the authors in [5] have used deep learning algorithms like YOLOv3 (You Only Look Once) for object detection. In this paper, the authors have used a logistic regression method to predict the class of the bounding box which encloses the object.

The authors in [6] have recognized objects using Region-based Convolutional Neural Network (R-CNN) algorithms. In this paper, guidance is provided through an application with audio output to convey the results of object detection. To train the neural network model, the authors have used TensorFlow Lite and XML. The application is built using Java in Android Studio.
Authorized licensed use limited to: The Technology Library. Downloaded on May 30,2024 at 14:39:19 UTC from IEEE Xplore. Restrictions apply.
the camera, which is a physical parameter of the camera that can be obtained from the manufacturer's specifications of the camera used. The bounding_box area in equation (1) is the area of the object in the image that is detected by the object detection algorithm. The obtained bounding box area is converted from square pixels to square meters.

3.4 Algorithm to find the position of the object

Once the objects are detected and the distance has been determined, each frame is run through the location detection technique. The identified items in the frame are categorized into top-left, top-right, bottom-left, and bottom-right orientations. Section 4 provides more information on the method.

3.5 Text to speech

The pyttsx3 tools are the primary packages utilized in this conversion. Python's pyttsx3 module is used to convert text to voice. This is the basis for the engine's ability to provide audio output to the user.

4. IMPLEMENTATION

Figure 2 shows the implementation of our model. Once the live video is converted to frames, we load the MobileNet-SSD model. The MobileNet-SSD model uses a feed-forward convolutional network to create a fixed-size set of bounding boxes and scores for the presence of object class instances in those boxes. A non-maximum suppression step is later utilized to produce the final detections [11].

Convolutional, pooling and nonlinear transformation operations are used to turn a picture into a set of features. An input image is run through the MobileNet base network for feature extraction, and a number of additional layers then perform object detection on the extracted features. Object detection datasets, namely MS COCO, are used to train the MobileNet-SSD caffe model, and the model is further fine-tuned on VOC0712. These datasets, which have many annotated photos and different object classifications, are frequently used to test and improve object detection models [9]. Some of the objects detected by VAOD are bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and tv monitor.

A collection of feature maps representing the identified features of the input image makes up the MobileNet base network's output. The location and class of items in the input image are predicted by a collection of extra layers applied to these feature maps. Convolutional layers that pull features from the input feature maps are followed by object detection layers. The prior box layer and the detection output layer are the two kinds of layers that make up the MobileNet-SSD detection head. The 8732 anchor boxes (per class) generated by the prior box layer are utilized to forecast the bounding boxes for the objects in the input image. Based on the information retrieved by the convolutional layers and the anchor boxes, the detection output layer predicts the location and kind of objects in the input image [13].
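The detection output layer described above yields, for each surviving box, a class, a confidence score, and normalised box coordinates. As a sketch of how such output is typically consumed, the snippet below assumes the common row layout [batch_id, class_id, confidence, x1, y1, x2, y2] with coordinates in [0, 1], and the standard 20-class VOC label list (plus background and aeroplane, which the standard list includes); the function name and threshold are illustrative, not from the paper:

```python
# VOC labels commonly paired with the MobileNet-SSD caffe model (index 0 is background).
LABELS = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
          "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
          "horse", "motorbike", "person", "pottedplant", "sheep",
          "sofa", "train", "tvmonitor"]

def decode_detections(rows, frame_w, frame_h, conf_threshold=0.5):
    """Turn raw detection rows into (label, confidence, pixel box) tuples.

    Each row is assumed to be [batch_id, class_id, confidence, x1, y1, x2, y2]
    with box coordinates normalised to [0, 1].
    """
    results = []
    for _, class_id, conf, x1, y1, x2, y2 in rows:
        if conf < conf_threshold:
            continue  # discard low-confidence boxes
        box = (int(x1 * frame_w), int(y1 * frame_h),
               int(x2 * frame_w), int(y2 * frame_h))
        results.append((LABELS[int(class_id)], conf, box))
    return results

# Example: one confident sofa detection and one low-confidence box that gets dropped
rows = [(0, 18, 0.53, 0.1, 0.2, 0.6, 0.9),
        (0, 15, 0.20, 0.0, 0.0, 0.3, 0.3)]
print(decode_detections(rows, 400, 225))  # -> [('sofa', 0.53, (40, 45, 240, 202))]
```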
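The distance estimation described earlier (equation (1)) converts the detected bounding-box area from camera-space pixels to real-world units using a physical camera parameter. A closely related and widely used formulation is the triangle-similarity rule of [12]: an object of known real-world width W, observed as P pixels wide by a camera of focal length F (in pixels), lies at distance D = (W * F) / P. A minimal sketch of that rule, with a hypothetical known width and focal length (the paper itself works with the bounding-box area rather than the width):

```python
def distance_to_camera(known_width_m, focal_length_px, perceived_width_px):
    """Triangle similarity: D = (W * F) / P.

    known_width_m      -- real-world width of the object in meters (assumed known)
    focal_length_px    -- camera focal length in pixels, from calibration or the
                          manufacturer's specifications
    perceived_width_px -- width of the detected bounding box in pixels
    """
    return (known_width_m * focal_length_px) / perceived_width_px

# Example: a 0.5 m wide object seen as a 350 px wide box by a 700 px focal-length camera
print(distance_to_camera(0.5, 700, 350))  # -> 1.0
```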
The non-maximum suppression step discards boxes with low confidence ratings, or it selects the boxes that have a large amount of overlap with the ground-truth box based on their confidence level. Further, it sorts the predicted bounding boxes in descending order based on their confidence scores. It then selects and includes the box with the highest confidence score in the output and determines the IoU of this box with the other boxes [14]. This confidence score, the accuracy of the detected object, is presented as a percentage value in the output.

The centroid of the detected object is computed from the top-left and bottom-right corners of the bounding box:

x_avg = (start_X + end_X) / 2 ----------(4)

y_avg = (start_Y + end_Y) / 2 ----------(5)

In equation (4), x_avg is the centroid of the object with respect to the x-axis; start_X and start_Y represent the x-coordinate and y-coordinate of the top-left corner of the bounding box respectively. In equation (5), y_avg is the centroid of the object with respect to the y-axis; end_X and end_Y represent the x-coordinate and y-coordinate of the bottom-right corner of the bounding box respectively.

The imutils.resize function scales the frame down to 400x225, so the frame's centre is taken to be at 200x112.5. The direction is determined by how far the object's centre is from the frame's centre.

Because pyttsx3 works offline, it removes the need for an internet connection. In situations when internet connectivity is erratic or non-existent, this can be helpful. It also enables more flexibility in the text-to-speech output: the engine's speech rate, volume, and voice type may all be altered according to the user.

Fig 4. Accuracy of detected object vs Deep learning models.

Figure 4 illustrates the accuracy (per cent) of various deep learning models.
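Equations (4) and (5) together with the centre test described above can be sketched as follows; the function name is illustrative, while the 400x225 frame size and the quadrant names follow the text:

```python
FRAME_W, FRAME_H = 400, 225                      # frame size after imutils.resize
CENTER_X, CENTER_Y = FRAME_W / 2, FRAME_H / 2    # frame centre: (200, 112.5)

def object_direction(start_x, start_y, end_x, end_y):
    """Classify a bounding box into one of the four quadrants of the frame.

    (start_x, start_y) and (end_x, end_y) are the top-left and bottom-right
    corners of the bounding box, as in equations (4) and (5).
    """
    x_avg = (start_x + end_x) / 2   # equation (4): centroid along x
    y_avg = (start_y + end_y) / 2   # equation (5): centroid along y
    horizontal = "left" if x_avg < CENTER_X else "right"
    vertical = "top" if y_avg < CENTER_Y else "bottom"
    return f"{vertical}-{horizontal}"

print(object_direction(20, 10, 100, 80))     # -> top-left
print(object_direction(250, 150, 380, 220))  # -> bottom-right
```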
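The suppression procedure described above (sort boxes by confidence, keep the highest-scoring one, and discard boxes whose IoU with it is too large) can be sketched as follows; the box format and the 0.5 threshold are illustrative, not values stated in the paper:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Two heavily overlapping boxes plus one separate box: the weaker duplicate is dropped
boxes = [(10, 10, 100, 100), (12, 12, 98, 102), (200, 50, 300, 150)]
scores = [0.9, 0.6, 0.8]
print(non_max_suppression(boxes, scores))  # -> [0, 2]
```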
Figure 5 illustrates the processing speeds (fps) of different deep learning architectures.

VAOD is lightweight and hence it can be used in IoT devices or any mobile application with low processing power. VAOD not only identifies objects but also guides the visually impaired by providing directions and distance. The following images show some of the results.

Fig 6. Detection of object.

Figure 6 illustrates the detection of an object (sofa) with 52.86% accuracy towards the down-right direction, at a distance of 0.41 m.

Fig 7. Detection of person.

Figure 7 illustrates the detection of people as "person" with 75.30% accuracy at a distance of 0.11 m and 99.27% accuracy at a distance of 0.12 m respectively.

Fig 8. Detection of motorbike and person.

Figure 8 illustrates the detection of a motorbike with 71.67% accuracy towards the down-left direction at a distance of 0.51 m, a person (left) with 59.42% accuracy, and a person (right) with 83.50% accuracy.

6. CONCLUSION AND FUTURE WORKS

A novel framework employing object detection, classification of objects, and direction and distance prediction has been presented to assist visually impaired people. Future work can increase the number of objects detected, and this model can be further employed in different fields like robotics, medical automation and the automobile industry.

7. REFERENCES

[1] Cardillo, E. and Caddemi, A., 2019. Insight on electronic travel aids for visually impaired people: A review on the electromagnetic technology. Electronics, 8(11), p.1281.

[2] Gbenga, D.E., Shani, A.I. and Adekunle, A.L., 2017. Smart walking stick for visually impaired people using ultrasonic sensors and Arduino. International Journal of Engineering and Technology, 9(5), pp.3435-3447.

[3] Real, S. and Araujo, A., 2019. Navigation systems for the blind and visually impaired: Past work, challenges, and open problems. Sensors, 19(15), p.3404.

[4] S. Shah, J. Bandariya, G. Jain, M. Ghevariya and S. Dastoor, "CNN based Auto-Assistance System as a Boon for Directing Visually Impaired Person," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019, pp. 235-240, doi: 10.1109/ICOEI.2019.8862699.

[5] Wong, Y.C., Lai, J.A., Ranjit, S.S.S., Syafeeza, A.R. and Hamid, N.A., 2019. Convolutional neural network for object detection system for blind people. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 11(2), pp.1-6.

[6] Afif, M., Ayachi, R., Said, Y., Pissaloux, E. and Atri, M., 2020. An evaluation of RetinaNet on indoor object detection for blind and visually impaired persons assistance navigation. Neural Processing Letters, 51, pp.2265-2279.

[7] Yee, L.R., Kamaludin, H., Safar, N.Z.M., Wahid, N., Abdullah, N. and Meidelfi, D., 2021. Intelligence Eye for Blinds and Visually Impaired by Using Region-Based Convolutional Neural Network (R-CNN). JOIV: International Journal on Informatics Visualization, 5(4), pp.409-414.

[8] S. Bhole and A. Dhok, "Deep Learning based Object Detection and Recognition Framework for the Visually-Impaired," 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2020, pp. 725-728, doi: 10.1109/ICCMC48092.2020.ICCMC-000135.

[9] Younis, A., Shixin, L., Jn, S. and Hai, Z., 2020, January. Real-time object detection using pre-trained deep learning models MobileNet-SSD. In Proceedings of 2020 the 6th International Conference on Computing and Data Engineering (pp. 44-48).

[10] Arora, A., Grover, A., Chugh, R. et al., 2019. Real Time Multi Object Detection for Blind Using Single Shot Multibox Detector. Wireless Personal Communications, 107, pp.651-661. https://doi.org/10.1007/s11277-019-06294-1.

[11] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., 2016. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14 (pp. 21-37). Springer International Publishing.

[12] Rosebrock, A., 2015, January 19. Find Distance from Camera to Object/Marker Using Python and OpenCV. Retrieved May 10, 2023, from https://pyimagesearch.com/2015/01/19/find-distance-camera-objectmarker-using-python-opencv/.

[13] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[14] Kim, K. and Lee, H.S., 2020. Probabilistic anchor assignment with IoU prediction for object detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16 (pp. 355-371). Springer International Publishing.