Assistive Technology For Visually Impaired Using Tensor Flow Object Detection in Raspberry Pi and Coral USB Accelerator
Amit Ghosh§, Shamsul Arefeen Al Mahmud∗, Thajid Ibna Rouf Uday†, Dewan Md. Farid‡
§‡ Department of Computer Science & Engineering, † Department of Electrical & Electronic Engineering,
United International University, United City, Madani Avenue, Badda, Dhaka 1212, Bangladesh
∗ Department of Electrical Engineering & Automation, Aalto University, Finland.
Abstract—Assistive Technology (AT) has become an interesting field of research in the present era. According to the World Health Organisation (WHO - https://www.who.int), there are approximately 285 million visually impaired people around the world. To address this issue, many researchers are employing new technologies, e.g. Machine Learning (ML), Computer Vision (CV), Image Processing, etc. This paper aims to develop an assistive technology based on Computer Vision, Machine Learning and Tensor Flow to support visually impaired people. The proposed system allows users to navigate independently using real-time object detection and identification. A hardware implementation is built to test the performance of the system, and the performance is tracked using a monitoring server. The system is developed on a Raspberry Pi 4 and a dedicated server with NVIDIA Titan X graphics, where a Google Coral USB accelerator is used to boost processing power.

Keywords—Assistive Technology; Computer Vision; Object Detection; Tensor Flow; Visually Impaired

I. INTRODUCTION

Assistive Technology (AT) is used for the betterment of persons with disabilities, e.g., blindness and deafness. Visually impaired people face numerous difficulties, such as working on their own and recognising changes in their surroundings. In recent years, researchers have been amalgamating emerging technologies such as Artificial Intelligence (AI) and Deep Learning to assist visually impaired people. Physical and social barriers, along with limited accessibility, make everyday life burdensome for a visually impaired person. The rates of blindness and visual impairment are among the highest in Asia, which accounts for around 63% of the world's visually impaired people [1]. Several assistive devices, e.g. earphones, smart canes, goggles, etc., have been introduced to aid blind people in recent decades. Assistive technologies can enhance the quality of life of a blind person by introducing specific devices, services, systems, processes, and environmental modifications [2]. Recent studies observe the impact of state-of-the-art assistive technologies such as canes, goggles, and wearables on the visually impaired person [3]. Auditory vision can be a solution for the visually impaired person, as many studies illustrate that blind persons have a better sense of sound than sighted persons.

Recently, researchers have been looking for ways to integrate different technologies with voice commands to assist a blind person in navigating and understanding his/her environment. Smartphones are the most common device in assistive technology development, and a voice assistant on a smartphone is the most common option for the visually impaired person. Such technologies have limitations, as the smartphone needs to be pointed at a specific object to learn about it. Voice assistant applications are not perfect and cannot answer all necessary queries. There are many applications of assistive technology for visually impaired people, such as navigation on the road, navigation assistance in a vehicle, etc. In intelligent transport systems (ITS), assistive technologies can be merged to help visually impaired people navigate through the road [4], [5]. Ghosh et al. [6] presented a video-based system to count vehicles and measure their speed, on top of which an assistive system for a blind person can easily be implemented. Babir et al. [7] discussed the advantages of radio-over-fiber vehicle communication systems, where real-time data can easily be transferred through the internet; this can be a useful factor in assistive technology for collecting real-time data from the environment.

Computer Vision (CV), a branch of Artificial Intelligence (AI) and Machine Learning (ML), is gaining popularity for processing digital images. In this paper, a wearable Computer Vision system based on an Internet of Things (IoT) embedded platform is proposed, where image classification is integrated with voice output to assist the visually impaired person. The system is developed and tested in Dhaka, Bangladesh. Visual data are collected through a camera and then processed with Tensor Flow, an open-source framework developed by Google. Image classification, object detection, speech recognition, and voice commands for navigation are the main strengths of this work.

The remainder of this paper is organised as follows. Section II discusses the related works to understand recent developments in this area. Section III presents the system overview along with the data processing technique. Section IV presents the experimental tests and results. Finally, Section V presents the conclusion with future works.

II. RELATED WORK

Computer Vision (CV) is creating an innovative step towards assisting visually impaired people.
Leo et al. [8] discussed the use of Computer Vision (CV) to develop multiple assistive technologies for blind or visually disabled people. Software solutions are becoming more common in assistive technology. Tapu et al. [9] presented the difficulties in pointing smartphones towards an object to describe the environment. Digital image processing is stepping forward in the development of assistive systems for visually disabled people, and numerous works have been done using image processing or computer vision technology. Roberto and Dan developed a laser-based virtual cane on top of computer vision, which can assist in navigation [10]. Rajalakshmi et al. [11] discussed an assistive system using object detection with TensorFlow and ultrasonic sensors. This work is similar to [11]; however, in the proposed method, computer vision-based goggles are implemented without any sensors, which reduces the cost. Moreover, we have considered that the user may also have hearing problems and have integrated a bone conduction earphone with the system. The whole system is based on the Raspberry Pi, a well-known embedded platform for development projects. Goel et al. [12] used the Raspberry Pi to develop readers for blind people. The vOICe vision technology developed a visual experience system that can provide an image-to-sound description to its users [13]; it uses a camera on top of goggles to capture images and map them according to the objects in the image. Navigation is a hurdle for the visually impaired person. Ross [14] developed a navigation system for blind people, and the system was tested for crossing a road using an assistive wearable. This work applies CV by integrating voice commands, speech recognition and object detection. The uniqueness of this work is the inclusion of Tensor Flow for image processing, object detection, expression recognition, real-time identification, text-to-speech conversion and a navigation system. A high-performance processing unit is used to process the images with proper efficiency.
III. METHODOLOGY

A. System Overview

The proposed system is a wearable goggle with earphones based on an IoT embedded platform using the Raspberry Pi 4. It has multiple segments, including image collection, image processing, object detection, expression recognition, real-time identification, text-to-speech conversion, and a navigation system. The camera is an essential device in this system and is responsible for collecting images of the real-time environment. The entire system is built around the goggles, which carry the camera and earphones, and everything works on the images collected by the camera. The collected images are processed through Tensor Flow to identify the different objects in them.
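As a concrete illustration of this capture-and-detect step, the following is a minimal sketch, assuming a goggle-mounted camera readable through OpenCV, the standard tflite_runtime API with the Coral Edge TPU delegate (libedgetpu.so.1), and hypothetical model and label file names; the paper does not specify which detection model was deployed.

```python
# Minimal sketch: real-time object detection on Raspberry Pi 4 + Coral USB
# Accelerator. Model and label file names are hypothetical placeholders.
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

MODEL = "ssd_mobilenet_v2_coco_edgetpu.tflite"  # assumed Edge TPU model
LABELS = "coco_labels.txt"                      # assumed label map

with open(LABELS) as f:
    labels = [line.strip() for line in f]

# Delegate inference to the Coral USB Accelerator.
interpreter = Interpreter(
    model_path=MODEL,
    experimental_delegates=[load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()
_, height, width, _ = inp["shape"]

cap = cv2.VideoCapture(0)  # goggle-mounted camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Convert to RGB and resize to the model's expected input size.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (width, height))
    interpreter.set_tensor(inp["index"], np.expand_dims(resized, 0))
    interpreter.invoke()
    # Standard SSD post-processed outputs: boxes, class ids, scores.
    boxes = interpreter.get_tensor(out[0]["index"])[0]
    classes = interpreter.get_tensor(out[1]["index"])[0]
    scores = interpreter.get_tensor(out[2]["index"])[0]
    detections = [(labels[int(c)], float(s), b)
                  for b, c, s in zip(boxes, classes, scores) if s > 0.5]
    # `detections` feeds the describe/announce/search/identify modes below.
```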
Fig. 2: Flow chart of the proposed system with different modes.

As shown in Fig. 2, there are four modes of operation after the image data is received. In describe mode, the system uses local or cloud processing to explain the image to the user. In announcement mode, the system narrates the objects in the images in real time. In search mode, object search and direction finding are performed using voice recognition; to identify an object's exact position and distance from the user, the system calculates the object's angle from the user's current location, as sketched below.
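The paper does not name its speech engine or camera optics; this sketch of the search-mode flow assumes the `speech_recognition` package (with its free Google web recogniser) for the voice query, and a pinhole-camera approximation that maps a detection's bounding-box centre to a bearing angle using an assumed horizontal field of view of 62.2 degrees (the Raspberry Pi camera module's nominal value).

```python
# Sketch of search mode: take a spoken object name, then estimate the
# bearing of the matching detection. Speech engine and FOV are assumptions.
import math
import speech_recognition as sr

HFOV_DEG = 62.2  # assumed horizontal field of view of the Pi camera

def listen_for_target() -> str:
    """Capture a spoken query, e.g. 'bottle', via the microphone."""
    recogniser = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recogniser.listen(source)
    return recogniser.recognize_google(audio).lower()

def bearing_of(box) -> float:
    """Angle (degrees) of a detection left (-) / right (+) of straight ahead.

    `box` is an SSD box [ymin, xmin, ymax, xmax] in normalised coordinates;
    a pinhole-camera model maps its horizontal centre to a bearing.
    """
    x_center = (box[1] + box[3]) / 2.0  # 0.0 = left edge, 1.0 = right edge
    offset = (x_center - 0.5) * 2.0     # -1.0 .. +1.0 across the frame
    half_fov = math.radians(HFOV_DEG / 2.0)
    return math.degrees(math.atan(offset * math.tan(half_fov)))

def search(detections, target: str) -> str:
    """Return a spoken-direction hint for the first detection matching target."""
    for label, score, box in detections:
        if target in label:
            angle = bearing_of(box)
            side = "left" if angle < 0 else "right"
            return f"{label} about {abs(angle):.0f} degrees to your {side}"
    return f"no {target} in view"
```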
In the last mode, the system performs real-time identification and expression recognition. It has an algorithm to detect new faces and save them in a database for future reference.
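The face-handling algorithm is not detailed in the paper; one plausible minimal sketch uses OpenCV's bundled Haar cascade for detection and a plain on-disk folder as the face database. The folder layout and the store-everything policy are assumptions; a full implementation would first match each face against known entries before saving.

```python
# Sketch: detect faces in a frame and archive the crops for later
# identification. The folder-based "database" is an assumed simplification.
import os
import time
import cv2

FACE_DB = "face_db"  # assumed on-disk face database
os.makedirs(FACE_DB, exist_ok=True)

# Frontal-face Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def store_new_faces(frame) -> int:
    """Detect faces and save each crop with a timestamped file name."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for i, (x, y, w, h) in enumerate(faces):
        crop = frame[y:y + h, x:x + w]
        path = os.path.join(FACE_DB, f"face_{int(time.time())}_{i}.png")
        cv2.imwrite(path, crop)
    return len(faces)
```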
After all of these steps, the system generates text and processes it for text-to-speech conversion. The speech output is then sent to the user through a bone conduction earphone.
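The paper does not name a text-to-speech engine; as one hedged example, the offline pyttsx3 wrapper (which drives eSpeak on Raspberry Pi OS) can convert the generated text into the audio routed to the bone conduction earphone.

```python
# Sketch: speak the generated description through the default audio output
# (routed to the bone conduction earphone). pyttsx3 is an assumed choice.
import pyttsx3

engine = pyttsx3.init()          # uses eSpeak on Raspberry Pi OS
engine.setProperty("rate", 150)  # a slightly slower, clearer speaking rate

def announce(text: str) -> None:
    """Queue a sentence, e.g. 'bottle about 10 degrees to your left'."""
    engine.say(text)
    engine.runAndWait()
```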
A prototype is built to test the system and evaluate the performance. Table I shows the prototype system in detail.

Fig. 1: Image of the prototype system, which contains a camera, goggles, Coral USB accelerator, Raspberry Pi 4 and headphone.

TABLE I: Prototype system details.

Specifications       Description
Camera               Fixed focus, 720p/30fps
Goggles              Ordinary glasses
Processing Unit      Google Coral
Embedded Platform    Raspberry Pi 4
IV. EXPERIMENTAL TEST AND RESULTS

… the users. The experiment was conducted in different places, including a university classroom, a university cafeteria, a university lobby, an office room and an office meeting room. The experimental environments were selected based on room size. The university location was United International University, Bangladesh, and the office location was ANTT Robotics Limited, Bangladesh. From the experiment, we collected accuracy data to understand how conveniently our users can identify objects and roam around these places independently. As mentioned above, the experiment was conducted in different places with the help of two visually impaired persons. From the experiments, we collected image frames from the camera and calculated the accuracy of the overall system. Table V contains the data from our experiment. The accuracy represents the precision of identifying objects in the environment; it determines how precisely a visually impaired person can recognise objects within the coverage area of the system. The Data column is the number of frames collected during the experiment. The system was tested in different room environments: the classroom and the office room were quite similar in size, the university cafeteria and the office meeting room were a bit larger than the classroom, and the university lobby was the largest environment. From Table V, the system performance is better when the room is small, and it degrades as the size increases. The system accuracy is 89% and 87% for the classroom and the office room respectively, whereas in the university lobby the accuracy is the lowest (70%).

TABLE V: Result on system accuracy.

Location               Data    Minutes    Accuracy
University Classroom    460      80         89 %
University Cafeteria   1090      10         85 %
University Lobby        300      12         70 %
Office Room             633      15         87 %
Office Meeting Room     522       7         81 %
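The paper reports accuracy per location but does not spell out the computation; a minimal sketch of one plausible per-frame evaluation, assuming each frame's announced objects were manually checked against the scene, is:

```python
# Sketch: per-location accuracy as the fraction of collected frames in which
# the announced objects matched a manual ground-truth check (assumed protocol).
def accuracy(correct_frames: int, total_frames: int) -> float:
    """Percentage of frames with correctly identified objects."""
    return 100.0 * correct_frames / total_frames

# Example with the University Classroom row of Table V (460 frames, 89%):
# 409 of 460 frames judged correct gives approximately 88.9%.
print(f"{accuracy(409, 460):.1f} %")
```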
V. CONCLUSIONS AND FUTURE WORK

The proposed system achieved an acceptable accuracy level for standard-size room environments: the accuracy is near 90% in the classroom and the office room. A bone conduction earphone is included in the system, as visually impaired people may also have problems with their hearing. The system can be made more efficient by selecting a proper high-performance camera and adding more processing power for labelling and image processing. As future work, we would like to use a wide-angle camera to cover more objects in a single frame, and to connect the system to the internet more efficiently so that images can be collected in real time with a powerful processing unit.

ACKNOWLEDGMENT

We appreciate the support received from the a2i Innovation Fund of Innov-A-Thon 2018 (Ideabank ID No.: 12502) from a2i-Access to Information Program – II, Information and Communication Technology (ICT) Division (https://ictd.gov.bd), Government of the People's Republic of Bangladesh. We would like to thank the "a2i (Access to Information) Innovation Lab" (https://a2i.gov.bd/innovation-lab/).

REFERENCES

[1] Y.-C. Tham, S.-H. Lim, Y. Shi, M.-L. Chee, Y. F. Zheng, J. Chua, S.-M. Saw, P. Foster, T. Aung, T. Y. Wong et al., "Trends of visual impairment and blindness in the Singapore Chinese population over a decade," Scientific Reports, vol. 8, no. 1, pp. 1–7, 2018.
[2] M. A. Hersh and M. A. Johnson, "On modelling assistive technology systems – Part I: Modelling framework," Technology and Disability, vol. 20, no. 3, pp. 193–215, 2008.
[3] A. Bhowmick and S. M. Hazarika, "An insight into assistive technology for the visually impaired and blind people: state-of-the-art and future trends," Journal on Multimodal User Interfaces, vol. 11, no. 2, pp. 149–172, 2017.
[4] I. Alam, D. M. Farid, and R. J. F. Rossetti, "The prediction of traffic flow with regression analysis," in International Conference on Emerging Technology in Data Mining and Information Security (IEMIS), Kolkata, India, February 2018, pp. 1–10.
[5] I. Alam, M. F. Ahmed, M. Alam, J. Ulisses, D. M. Farid, S. Shatabda, and R. J. F. Rossetti, "Pattern mining from historical traffic big data," in IEEE Technologies for Smart Cities (TENSYMP), and IEEE Xplore Digital Archive, Cochin, Kerala, India, July 2017, pp. 1–5.
[6] A. Ghosh, M. S. Sabuj, H. H. Sonet, S. Shatabda, and D. M. Farid, "An adaptive video-based vehicle detection, classification, counting, and speed-measurement system for real-time traffic data collection," in The IEEE Region 10 Symposium (TENSYMP), Symposium Theme: Technological Innovation for Humanity, Kolkata, India, June 2019, pp. 1–6.
[7] M. R. N. Babir, S. A. Al Mahmud, and T. Mostary, "Efficient M-QAM digital radio over fiber system for vehicular ad-hoc network," in 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). IEEE, 2019, pp. 34–38.
[8] M. Leo, G. Medioni, M. Trivedi, T. Kanade, and G. M. Farinella, "Computer vision for assistive technologies," Computer Vision and Image Understanding, vol. 154, pp. 1–15, 2017.
[9] R. Tapu, B. Mocanu, A. Bursuc, and T. Zaharia, "A smartphone-based obstacle detection and classification system for assisting visually impaired people," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 444–451.
[10] S. Sivan and G. Darsan, "Computer vision based assistive technology for blind and visually impaired people," in Proceedings of the 7th International Conference on Computing Communication and Networking Technologies, 2016, pp. 1–8.
[11] M. R. Rajalakshmi, M. K. Vishnupriya, M. M. Sathyapriya, and M. G. Vishvaardhini, "Smart navigation system for the visually impaired using TensorFlow."
[12] A. Goel, A. Sehrawat, A. Patil, P. Chougule, and S. Khatavkar, "Raspberry Pi based reader for blind people," International Research Journal of Engineering and Technology, vol. 5, no. 6, pp. 1639–1642, 2018.
[13] M. Auvray, S. Hanneton, and J. K. O'Regan, "Learning to perceive with a visuo–auditory substitution system: localisation and object recognition with 'The vOICe'," Perception, vol. 36, no. 3, pp. 416–430, 2007.
[14] D. A. Ross, "Implementing assistive technology on wearable computers," IEEE Intelligent Systems, vol. 16, no. 3, pp. 47–53, 2001.