Demo Research Paper
Shilpa Vatkar
Department of Electronics and
Telecommunication Engineering
K. J. Somaiya College of Engineering
Mumbai, India.
2023 3rd Asian Conference on Innovation in Technology (ASIANCON) | 979-8-3503-0228-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ASIANCON58793.2023.10269899
Abstract— Visually impaired individuals face numerous challenges in their daily lives, including identifying and navigating their surroundings independently. Object detection techniques based on computer vision have shown promising results in helping the visually impaired by detecting and classifying objects in real time. In this paper, we present a real-time object detection and audio feedback system that helps visually impaired users identify and navigate their surroundings. The proposed system uses the YOLO_v3 algorithm with the MS COCO dataset to detect and classify objects in real time and provide corresponding audio feedback. We used the gTTS (Google Text to Speech) API to generate the audio feedback. The audio feedback is generated using audio processing techniques and deep learning algorithms. We evaluated the system on MS COCO images and achieved an average detection accuracy of 90%. The proposed system provides a practical and effective solution for enhancing accessibility and independence for visually impaired individuals, and demonstrates the potential of using advanced deep learning algorithms and datasets for real-time object detection and audio feedback systems.
Keywords— Real-time object detection, Audio feedback system, YOLO_v3 algorithm, MS COCO dataset, gTTS (Google Text to Speech) API, Deep learning
I. INTRODUCTION
From an early age, humans are taught by their parents to distinguish between different things, including themselves as individuals. Our visual system is remarkably precise and can handle multiple tasks even when we are not consciously aware of it. However, when dealing with large amounts of data, we require a more accurate system to correctly identify and locate multiple objects at the same time. This is where machines come into play. By training computers with improved algorithms, we can enable them to detect multiple objects within an image with a high level of accuracy and precision. Object detection is a particularly challenging task in computer vision because it involves fully understanding images. In simpler terms, an object tracker attempts to determine whether an object is present in multiple frames and assigns a label to each identified object [1]. This process encounters various challenges, such as complex images, loss of information, and the transformation of a three-dimensional world into a two-dimensional image. To achieve accurate object detection, our focus should be not only on classifying objects but also on accurately determining the positions of different objects, which can vary from one image to another [2]. This research proposes a system that can help people who are visually impaired detect and identify objects in their environment in real time. To achieve this, we use an object detection algorithm called YOLO_v3 and a dataset called MS COCO. Our system generates an audio description of the object, including its location and category, and plays it through a speaker or headphones using the gTTS (Google Text to Speech) API. By providing audio feedback, this system aims to give visually impaired individuals an additional way to detect and identify objects in their environment.
II. LITERATURE REVIEW
Object detection and recognition have been important topics of research for many years. With the advancement of deep learning techniques, object detection has become more accurate and efficient. The YOLO (You Only Look Once) algorithm has emerged as a popular method for real-time object detection due to its speed and accuracy [3]. Interest in developing assistive technologies for visually impaired individuals has grown. These technologies aim to enhance their independence and mobility by providing them with additional means of detecting and identifying objects in their environment. Deep learning-based object detection systems have shown promising results in this regard. The Microsoft Common Objects in Context (MS COCO) dataset is widely used in deep learning-based object detection research. It is a large-scale dataset that contains over 330,000 images with more than 2.5 million object instances labelled in 80 different categories [4]. Ramesh et al. proposed a real-time object detection system for visually impaired individuals using deep learning. Their system uses a YOLO-based object detection algorithm and provides audio feedback to the user in real time [5]. Saha et al. proposed an object detection and audio feedback system for visually impaired individuals that uses deep learning techniques. They used the YOLO algorithm for object detection and gTTS (Google Text-to-Speech) for audio feedback [6]. Li et al. proposed a deep reinforcement learning-based object detection and obstacle avoidance system for visually impaired individuals. Their system uses a combination of object detection and obstacle avoidance techniques to enable visually impaired individuals to navigate through complex environments [7]. One of the most commonly used object detection algorithms for real-time
Authorized licensed use limited to: Somaiya University. Downloaded on May 24,2024 at 11:15:19 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Architecture of YOLO_v3
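As a rough illustration of what the architecture in Fig. 1 produces, the sketch below computes the size of the YOLOv3 output tensors, following the published YOLOv3 design [3]: three detection scales with strides 32, 16, and 8, three anchor boxes per grid cell, and one objectness score plus four box coordinates and 80 MS COCO class scores per box. The 416x416 input size and the function names are assumptions made for this example; the paper does not state the input resolution used.

```python
# Sketch of YOLOv3 output sizes. The 416x416 input size is an assumption;
# the strides, anchors per scale, and class count follow the YOLOv3 paper [3].
STRIDES = (32, 16, 8)      # downsampling factor at each of the three detection scales
ANCHORS_PER_SCALE = 3      # anchor boxes predicted per grid cell
NUM_CLASSES = 80           # MS COCO categories


def yolo_v3_output_shapes(input_size=416, num_classes=NUM_CLASSES):
    """Return the (grid, grid, anchors, values) shape of each detection scale."""
    if input_size % 32 != 0:
        raise ValueError("input_size must be a multiple of 32")
    values_per_box = 4 + 1 + num_classes  # box coordinates, objectness, class scores
    return [
        (input_size // s, input_size // s, ANCHORS_PER_SCALE, values_per_box)
        for s in STRIDES
    ]


def total_boxes(shapes):
    """Count candidate boxes across all scales, before thresholding and NMS."""
    return sum(h * w * a for h, w, a, _ in shapes)


shapes = yolo_v3_output_shapes()
print(shapes)               # [(13, 13, 3, 85), (26, 26, 3, 85), (52, 52, 3, 85)]
print(total_boxes(shapes))  # 10647
```

Under these assumptions the network proposes 10,647 candidate boxes per frame, which is why confidence thresholding and non-maximum suppression are needed before any detection is reported to the user.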
Fig. 12. Terminal Output
Multiple Object:
With multiple-object detection, the system gives an accuracy between 0.78 and 1, that is, 78% to 100%.
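The step from several scored detections to one spoken message can be sketched as follows. This is a minimal illustration, not the authors' implementation: the detection dictionary layout, the 0.5 confidence threshold, and the split of the frame into left, center, and right thirds are all assumptions chosen for the example.

```python
def horizontal_position(box, frame_width):
    """Classify a box as left, center, or right from its horizontal midpoint."""
    x, _, w, _ = box                    # box is (x, y, width, height) in pixels
    mid = (x + w / 2) / frame_width     # midpoint as a fraction of the frame width
    if mid < 1 / 3:
        return "on the left"
    if mid > 2 / 3:
        return "on the right"
    return "in the center"


def describe(detections, frame_width, min_confidence=0.5):
    """Build one sentence naming each confident detection and where it is."""
    # Keep only detections at or above the (assumed) confidence threshold.
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    # Announce the most confident detections first.
    kept.sort(key=lambda d: d["confidence"], reverse=True)
    if not kept:
        return "No objects detected."
    phrases = [
        "{} {}".format(d["label"], horizontal_position(d["box"], frame_width))
        for d in kept
    ]
    return "Detected: " + ", ".join(phrases) + "."


# Hypothetical detections for a 640-pixel-wide frame.
sample = [
    {"label": "chair", "confidence": 0.78, "box": (280, 200, 100, 150)},
    {"label": "person", "confidence": 1.00, "box": (40, 60, 120, 300)},
    {"label": "bottle", "confidence": 0.30, "box": (560, 220, 40, 90)},
]
sentence = describe(sample, frame_width=640)
print(sentence)  # Detected: person on the left, chair in the center.
```

The resulting string could then be handed to gTTS, for example `gTTS(text=sentence, lang="en").save("feedback.mp3")`, which requires the `gtts` package and network access and is therefore left out of the runnable sketch.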
Fig. 7. Video Frame Output
Fig. 18. Terminal Output
Fig. 19. Precision Curve of YOLO_v3
Fig. 20. Recall Curve of YOLO_v3
We tested the system on MS COCO images containing various objects and measured its accuracy and speed. The system achieved object detection accuracy between 0.64 and 1, that is, 64% to 100%. It was able to detect and classify objects in real time on a laptop with a GPU. The audio feedback generated by the gTTS API was clear and understandable, providing visually impaired individuals with a reliable means of detecting and identifying objects in their environment. Overall, this real-time object detection and audio feedback system showed high accuracy and speed, making it a good tool for assisting visually impaired individuals in navigating their environment.
VI. CONCLUSION AND FUTURE SCOPE
In conclusion, our research has shown the effectiveness of utilizing deep learning techniques, specifically CNN and YOLO_v3, to develop an object detection system for visually impaired individuals. The system has shown excellent accuracy in identifying and categorizing single, multiple, and remote objects in a short amount of time using a laptop webcam. It can also detect multiple objects in a frame and accurately determine their positions. We have used the MS COCO dataset. We have also successfully combined our object detection system with the gTTS API to provide real-time audio feedback to visually impaired individuals, enhancing their ability to navigate and interact with their environment. Although this work has shown the benefits of using deep learning and audio feedback for object detection, there are still areas for improvement. For example, our system currently relies on a camera as the input device, limiting its use in low-light environments. We can improve the detection model's precision by expanding the dataset to include more images in different lighting conditions and orientations. Extra features, such as color recognition and distance measurement, may also be added to the object detection technique.
REFERENCES
[1] S. Cherian and C. Singh, "Real time implementation of object tracking through webcam," International Journal of Research in Engineering and Technology, pp. 128-132, 2014.
[2] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212-3232, 2019.
[3] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[4] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, ... and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, Springer, Cham, 2014, pp. 740-755.
[5] N. Ramesh, V. R. Anand, and R. V. Babu, "Real-time object detection for visually impaired using deep learning," in 2018 International Conference on Communication and Signal Processing (ICCSP), IEEE, 2018, pp. 0214-0218.
[6] S. Saha, A. Nag, and P. P. Roy, "Object detection and audio feedback system for the visually impaired using deep learning," International Journal of Computer Vision and Image Processing, vol. 9, no. 3, pp. 1-14, 2019.
[7] H. Li, X. Chen, X. Liang, Z. Li, and S. Liu, "Deep reinforcement learning-based object detection and obstacle avoidance for visually impaired," Sensors, vol. 19, no. 20, 4483, 2019.
[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9] Y. Gao and W. Wu, "Real-time object detection for visually impaired people with YOLO," in Proceedings of the 2nd International Conference on Control Science and Systems Engineering, 2021.
[10] N. R. Kuncham and K. H. Prasad, "Real-time object detection for visually impaired people using YOLOv3," in Proceedings of the 6th International Conference on Inventive Computation Technologies, 2021.
[11] S. Shin and S. Kwon, "Real-time object detection with audio feedback for the visually impaired using YOLOv3," in Proceedings of the 15th International Conference on Advanced Technologies, 2020.
[12] S. Ghosal, P. Banerjee, and S. Chakraborty, "Real-time object detection and audio feedback system for the visually impaired using Faster R-CNN," in Proceedings of the International Conference on Computer Vision and Image Processing, 2019.
[13] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D
[14] M. S. Bhuyan, S. Chakravarty, S. Das, and P. K. Bora, "Real-time text detection and audio feedback system for the visually impaired," Multimedia Tools and Applications, vol. 78, no. 17, pp. 24479-24499, 2019.
[15] Y. Noh, C. Kim, and I. Hwang, "Object detection and identification for visually impaired using deep learning and audio feedback system," 2018.
[16] S. Saha, S. Pal, and J. Mukherjee, "An assistive device for visually impaired people for object detection and audio feedback," 2019.
[17] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, Springer, Cham, Sep. 2014, pp. 740-755.
[18] http://cocodataset.org/#home
[19] J. Du, "Understanding of object detection based on CNN family and YOLO," Journal of Physics: Conference Series, vol. 1004, no. 1, p. 012029, IOP Publishing, Apr. 2018.