An AI-Based Visual Aid With Integrated Reading
Assistant for the Completely Blind
Muiz Ahmed Khan, Pias Paul, Mahmudur Rashid, Student Member, IEEE, Mainul Hossain, Member, IEEE,
and Md Atiqur Rahman Ahad, Senior Member, IEEE
Abstract—Blindness prevents a person from gaining knowledge of the surrounding environment and makes unassisted navigation, object recognition, obstacle avoidance, and reading tasks a major challenge. In this work, we propose a novel visual aid system for the completely blind. Because of its low cost, compact size, and ease of integration, the Raspberry Pi 3 Model B+ has been used to demonstrate the functionality of the proposed prototype. The design incorporates a camera and sensors for obstacle avoidance and advanced image processing algorithms for object detection. The distance between the user and the obstacle is measured by the camera as well as ultrasonic sensors. The system includes an integrated reading assistant, in the form of an image-to-text converter, followed by auditory feedback. The entire setup is lightweight and portable and can be mounted onto a regular pair of eyeglasses without any additional cost and complexity. Experiments are carried out with 60 completely blind individuals to evaluate the performance of the proposed device with respect to the traditional white cane. The evaluations are performed in controlled environments that mimic real-world scenarios encountered by a blind person. Results show that the proposed device, as compared with the white cane, enables greater accessibility, comfort, and ease of navigation for the visually impaired.
Index Terms—Blind people, completely blind, electronic
navigation aid, Raspberry Pi, visual aid, visually impaired people,
wearable system.
I. INTRODUCTION
BLINDNESS or loss of vision is one of the most common disabilities worldwide. Blindness, whether caused by natural means or some form of accident, has grown over the past decades. Partially blind people experience cloudy vision, seeing only shadows, and suffer from poor night vision or tunnel vision. A completely blind person, on the other hand, has no vision
Manuscript received March 3, 2020; revised July 23, 2020; accepted September 2020. This article was recommended by Associate Editor Z. Ya. (Corresponding author: Mainul Hossain.)
Muiz Ahmed Khan, Pias Paul, and Mahmudur Rashid are with the Department of Electrical and Computer Engineering, North South University, Dhaka 1229, Bangladesh.
Mainul Hossain is with the Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka 1000, Bangladesh.
Md Atiqur Rahman Ahad is with the Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka 1000, Bangladesh, and also with the Department of Intelligent Media, Osaka University, Suita, Japan.
Digital Object Identifier 10.1109/THMS.2020.3027534
at all. Recent statistics from the World Health Organization estimate the number of visually impaired or blind people to be about 2.2 billion [1]. A white cane is traditionally used by blind people to help them navigate their surroundings, although the white cane does not provide information about moving obstacles that are approaching from a distance. Moreover, white canes are unable to detect raised obstacles that are above knee level. Trained guide dogs are another option that can assist the blind. However, trained dogs are expensive and not readily available. Recent studies have proposed several types [2]-[9] of wearable or hand-held electronic travel aids (ETAs). Most of these devices integrate various sensors to map the surroundings and provide voice or sound alarms through headphones. The quality of the auditory signal, delivered in real time, affects the reliability of these gadgets. Many ETAs currently available in the market do not include a real-time reading assistant and suffer from a poor user interface, high cost, limited portability, and lack of hands-free access. These devices are, therefore, not widely popular among the blind and require further improvement in design, performance, and reliability for use in both indoor and outdoor settings.
In this article, we propose a novel visual aid system for completely blind individuals. The unique features, which define the novelty of the proposed design, include the following.
1) Hands-free, wearable, low-power, and compact design, mountable on a pair of eyeglasses, for indoor and outdoor navigation with an integrated reading assistant.
2) Complex algorithm processing with a low-end configuration.
3) Real-time, camera-based, accurate distance measurement, which simplifies the design and lowers the cost by reducing the number of required sensors.
The proposed setup, in its current form, can detect both stationary and moving objects in real time and provide auditory feedback to the blind. In addition, the device comes with an in-built reading assistant that is capable of reading text from any document. This article discusses the design, construction, and performance evaluation of the proposed visual aid system and is organized as follows. Section II summarizes the existing literature on blind navigation aids, highlighting their benefits and challenges. Section III presents the design and the working principle of the prototype, while Section IV discusses the experimental setup for performance evaluation. Section V summarizes the results using appropriate statistical analysis. Finally, Section VI concludes the article.
II. RELEVANT WORK
The electronic aids for the visually impaired can be categorized into three subcategories: ETAs, electronic orientation aids, and positional locator devices. ETAs provide object detection, warning, and avoidance for safe navigation [10]-[12]. ETAs work in a few steps: sensors collect data from the environment, which are then processed through a computing device to detect an obstacle or object and give the user feedback corresponding to the identified object. Ultrasonic sensors can detect an object within 300 cm by generating a 40 kHz signal and receiving the echo reflected from the object in front of them. The distance is calculated based on the pulse count and time-of-flight (TOF). Smart glasses [2], [9] and boots [12], mounted with ultrasonic sensors, have already been proposed as aids to the visually impaired. A new approach by Katzschmann et al. [13] uses an array of infrared TOF distance sensors facing in different directions. Villanueva and Farcy [14] combine a white cane with a near-IR LED and a photodiode to emit and detect IR pulses reflected from obstacles, respectively. Cameras [15], [16] and binocular vision sensors [17] have also been used to capture visual data for the blind.
Different devices and techniques are used for processing the collected data. The Raspberry Pi 3 Model B+, with open computer vision (OpenCV) software, has been used to process the images captured from the camera [18]. Platforms such as Google Tango [3] have also been used. Cloud-enabled computation enables the use of wearable devices [2], and a field-programmable gate array is another option to process the gathered data [19]. The preprocessing of captured images is done to reduce noise and distortion. Images are manually processed using a Gaussian filter, grayscale conversion, binary image conversion, edge detection, and cropping [20]. The processed image is then fed to the Tesseract optical character recognition (OCR) engine to extract the text from it [21]. The stereo image quality assessment in [17] employs a novel technique to select the best image out of many; the best image is then fed to a convolutional neural network (CNN), which is trained on big data and runs on a cloud device. The audio feedback in most devices is provided through a headset or a speaker. The audio is either a synthetic voice [20] from a text-to-speech synthesis system [22] or a voice user interface [23] generating a beep sound. Vibrations and tactile feedback are also used in some systems.
Ando et al. [24] introduced a haptic device, similar to the white cane, with an embedded smart sensing strategy and an active handle, which detects an obstacle and produces vibration mimicking a real sensation on the cane handle. Another traditional white-cane-like system, GuideCane [13], rolls on wheels and has steering servo motors to guide the wheels by sensing obstacles with ultrasonic sensors. The drawback of this system is that the user must always hold the device by hand, whereas many systems that provide a hands-free experience are readily available. NavGuide [12] and NavCane [25] are assistive devices that use multiple sensors to detect obstacles up to knee level. Both NavGuide and NavCane are equipped with wet-floor sensors. NavCane can be integrated into white cane systems and offers a global positioning system (GPS) with a mobile communication module.
A context-aware navigation framework is demonstrated by Xiao et al. [4], which provides visual cues and distance sensing along with location-context information using GPS. The platform can also access geographic information systems, transportation databases, and social media with the help of Wi-Fi communication through the Internet. Lan et al. [26] proposed a smart glass system that can detect and recognize road signs, such as public toilets, restaurants, and bus stops, in cities in real time. This system is lightweight, portable, and flexible. However, reading out road signage alone may not carry enough information for a blind user to be comfortable in an outdoor environment. Since public signs can differ between cities, if a sign is not registered in the database of the system, the system will not be able to recognize it. Hoang et al. [20] designed an assistive system using a mobile Kinect and a matrix of electrodes for obstacle detection and warning. However, the system has a complex configuration and an uncomfortable setup because the sensors are always placed inside the mouth during navigation. Furthermore, it is expensive and has less portability.
Islam et al. [27] presented a comprehensive review of sensor-based walking assistants for the visually impaired. The authors identified key features that are essential for an ideal walking assistant. These include a low-cost, simple, and lightweight design with reliable indoor and outdoor coverage. Based on feedback from several blind user groups, software developers, and engineers, Dakopoulos and Bourbakis [10] also identified 14 structural and operational features that describe an ideal ETA for the blind.
Despite numerous efforts, many existing systems do not incorporate all features to the same satisfactory level and are often limited by cost and complexity. Our main contribution here was to build a simple, low-cost, portable, and hands-free ETA prototype for the blind, with text-to-speech conversion capabilities, for basic, everyday indoor and outdoor use. While the proposed system, in its present form, lacks advanced features, such as the detection of wet floors and ascending staircases, reading of road signs, use of GPS, or a mobile communication module, the flexible design presents opportunities for future improvements and enhancements.
III. DESIGN OF THE PROPOSED DEVICE
We propose a visual aid for completely blind individuals, with an integrated reading assistant. The setup is mounted on a pair of eyeglasses and can provide real-time auditory feedback to the user through a headphone. A camera and sensors are used for distance measurement between the obstacle and the user. The schematic view in Fig. 1 presents the hardware setup of the proposed device, while Fig. 2 shows a photograph of the actual device prototype.

For the object detection part, multiple techniques have been adopted. For instance, the TensorFlow object detection application programming interface (API) and frameworks and libraries, such as OpenCV and the Haar cascade classifier, are used for detecting faces and eyes and for implementing distance measurement.
Fig. 1. Hardware configuration of the proposed system. The visual assistant takes the image as input, processes it through the Raspberry Pi processor, and issues audio feedback through a headphone.
Fig. 2. Proposed prototype. Raspberry Pi with the camera module and ultrasonic sensors mounted on a regular pair of eyeglasses.
Tesseract, which is a free OCR engine for various operating systems, is used to extract text from an image. In addition, eSpeak, a compact open-source speech synthesizer (text-to-speech), is used for auditory feedback on the object type and the distance between the object and the user. For obstacles within 40-45 inches of the user, the ultrasonic transducer (HC-SR04) sets off a voice alarm, while the eSpeak speech synthesizer uses audio feedback to inform the user about his or her distance from the obstacle, thereby alerting the blind person and avoiding any potential accident.
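A minimal sketch of the proximity alarm just described, assuming the espeak command-line binary is installed; the helper names, the threshold constant, and the wording of the alert are illustrative, and the distance value would come from the HC-SR04 reading discussed in Section III-A.

```python
import subprocess

ALERT_RANGE_INCHES = 45   # obstacles within roughly 40-45 inches trigger the alarm

def announce(distance_inches):
    # speak the warning through the espeak binary
    message = "Obstacle ahead at {:.0f} inches".format(distance_inches)
    subprocess.run(["espeak", message])

def check_obstacle(distance_inches):
    if distance_inches <= ALERT_RANGE_INCHES:
        announce(distance_inches)

check_obstacle(38.0)   # example value; a real reading would come from the HC-SR04
```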
The Raspberry Pi 3 Model B+ was chosen as the functional device owing to its low cost and high portability. Also, unlike many existing systems, it offers multiprocessing capability. To detect obstacles and generate an alarm, the TensorFlow object detection API has been used. The API was constructed using robust deep learning algorithms that require massive computing power. The Raspberry Pi 3 Model B+ offers a 1.2 GHz quad-core ARM Cortex-A53 processor that can output video at a full 1080p resolution with the desired detail and accuracy. In addition, it has 40 general-purpose input/output (GPIO) pins, which were used, in the proposed design, to configure the distance measurement by the ultrasonic sensors.
A. Data Acquisition
Fig. 3 shows how the Raspberry Pi 3 Model B+ is connected to the other components in the system. Data are acquired in two ways. Information carrying red, green, and blue (RGB) data was acquired using the Raspberry Pi camera module V2, which has a high-quality, 8-megapixel Sony IMX219 image sensor.
Fig. 3. Basic hardware setup: Raspberry Pi 3 Model B+ and associated modules with the camera and ultrasonic sensors.
The camera sensor, featuring a fixed-focus lens, has been custom designed to fit onboard the Raspberry Pi. It can capture 3280 × 2464 pixel static images and supports 1080p, 720p, and 640 × 480 pixel video. It is attached to the Pi module through small sockets, using the dedicated camera serial interface. The RGB data are retrieved by our program in real time, and objects already known to the system can be recognized from every video frame.
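A minimal sketch of grabbing RGB frames from the camera module V2 with the picamera package for per-frame recognition; the resolution, frame rate, and loop structure are illustrative assumptions rather than the authors' exact capture settings.

```python
import picamera
import picamera.array

with picamera.PiCamera(resolution=(640, 480), framerate=30) as camera:
    with picamera.array.PiRGBArray(camera, size=(640, 480)) as stream:
        for _ in camera.capture_continuous(stream, format="rgb", use_video_port=True):
            frame = stream.array          # H x W x 3 RGB array handed to the detector
            # ... run object detection on `frame` here ...
            stream.truncate(0)            # reset the buffer for the next frame
            break                         # a single frame is enough for this sketch
```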
To acquire data from the ultrasonic rangefinder, an HC-SR04 was mounted below the camera, as shown in Fig. 3. The four pins of the ultrasound module were connected to the Raspberry Pi's GPIO ports: VCC was connected to pin 2 (VCC), GND to a ground pin, TRIG to pin 12 (GPIO18), and ECHO to pin 18 (GPIO24). The ultrasonic sensor output (ECHO) always gives output LOW (0 V) unless it has been triggered, in which case it gives output HIGH (5 V). Therefore, one GPIO pin was set as an output to trigger the sensor and one as an input to detect the ECHO voltage change. The HC-SR04 sensor requires a short 10 μs pulse to trigger the module. This causes the sensor to start generating eight ultrasound bursts, at 40 kHz, to obtain an echo response. So, to create the trigger pulse, the trigger pin is set HIGH for 10 μs and then set to LOW again. The sensor sets ECHO to HIGH for the time it takes the pulse to travel the distance and the reflected signal to travel back. Once a signal is received, the value changes from LOW (0) to HIGH (1) and remains HIGH for the duration of the echo pulse. From the difference between the two recorded time stamps, the distance between the ultrasound source and the reflecting object can be calculated. The speed of sound depends on the medium it is traveling through and the temperature of that medium. In our proposed system, 343 m/s, which is the speed of sound at sea level, has been used.
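The trigger-and-echo timing described above can be sketched with the RPi.GPIO library as follows. The TRIG and ECHO assignments follow the wiring given in the text (GPIO18 and GPIO24); the function name and the temperature-independent 343 m/s constant are simplifying assumptions.

```python
import time
import RPi.GPIO as GPIO

TRIG = 18   # BCM numbering (GPIO18, board pin 12)
ECHO = 24   # BCM numbering (GPIO24, board pin 18)

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm():
    # 10 us trigger pulse starts the eight 40 kHz ultrasound bursts
    GPIO.output(TRIG, True)
    time.sleep(10e-6)
    GPIO.output(TRIG, False)

    # time stamps of the rising and falling edges of ECHO
    while GPIO.input(ECHO) == 0:
        pulse_start = time.time()
    while GPIO.input(ECHO) == 1:
        pulse_end = time.time()

    elapsed = pulse_end - pulse_start
    # sound travels to the obstacle and back, hence the division by 2
    return (elapsed * 34300) / 2.0   # 343 m/s = 34300 cm/s

print("Distance: {:.1f} cm".format(read_distance_cm()))
GPIO.cleanup()
```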
B. Feature Extraction
The TensorFlow object detection API is used to extract features (objects) from images captured from the live video stream. The TensorFlow object detection API is an open-source framework, built on top of TensorFlow, which makes it easy to integrate, train, and create models that perform well in different scenarios. TensorFlow represents deep learning networks as the core of the object detection computations. The foundation of TensorFlow is the graph object, which contains a network of nodes. GraphDef objects can be created with the protobuf library to save the network.
Fig. 4. Complete workflow of the proposed system. The hardware interfaces collect data from the environment. The software interfaces process the collected data and generate an output response through the audio interface. The Raspberry Pi 3 B+ is the central processing unit of the system.
For the proposed design, a pretrained model, called single-shot detection (SSD)Lite-MobileNet, from the TensorFlow detection model zoo, has been used. The model zoo is Google's collection of pretrained object detection models trained on different datasets, such as the common objects in context (COCO) dataset [28]. This model was particularly chosen for the proposed prototype because it does not require high-end processing capabilities, making it compatible with the low processing power of the Raspberry Pi. To recognize objects from the live video stream, no further training is required, since the models have already been trained on different types of objects. An image has an infinite set of possible object locations, and detecting objects can be challenging because most of these potential locations contain background rather than actual objects. The SSD models use one-stage object detection, which directly predicts object bounding boxes for an image. This is a simpler and faster architecture, although its accuracy is comparatively lower than that of state-of-the-art object detection models having two or more stages.
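A hedged sketch of loading a frozen SSDLite-MobileNet graph from the TF1-style detection model zoo and running it on a single frame. The file path is illustrative, and the tensor names are the standard ones exported by the TensorFlow object detection API, not something confirmed from the authors' code.

```python
import numpy as np
import tensorflow as tf

PATH_TO_FROZEN_GRAPH = "ssdlite_mobilenet_v2_coco/frozen_inference_graph.pb"  # illustrative path

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())        # GraphDef serialized with protobuf
    tf.compat.v1.import_graph_def(graph_def, name="")

with tf.compat.v1.Session(graph=detection_graph) as sess:
    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a camera frame
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": frame[np.newaxis, ...]})
    print(scores[0][:5])   # confidence of the five highest-scoring detections
```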
C. Workflow of the System
Fig. 4 shows the complete workflow of the proposed system, with the hardware and software interfaces. Every frame of the video is processed through a standard convolutional network to build a feature representation of the original image or frame. This backbone network is pretrained on ImageNet in the SSD model, as an image classifier, to learn how to extract features from an image. The model then defines a collection of aspect ratios for bounding boxes at each grid cell location. For each bounding box, it predicts the offsets for the bounding box coordinates and dimensions. Along with this, the distance measurement is processed using both the depth information and the ultrasonic sensor. In addition, the reading assistant works without interrupting any of the prior processes. All three features run in the software interface with the help of the modules from the hardware interface.
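One way to read Fig. 4 is as three loops that run side by side so that the reading assistant does not block detection or distance sensing. The sketch below shows that structure with placeholder worker functions; the concurrency model is an assumption for illustration, not the authors' implementation.

```python
import threading
import time

def detect_objects():
    time.sleep(0.1)       # placeholder for the per-frame TensorFlow detection step

def measure_distance():
    time.sleep(0.1)       # placeholder for the HC-SR04 reading and voice alert

def read_text_on_demand():
    time.sleep(0.1)       # placeholder for the Tesseract/eSpeak reading assistant

def run_forever(task):
    while True:
        task()

workers = [threading.Thread(target=run_forever, args=(f,), daemon=True)
           for f in (detect_objects, measure_distance, read_text_on_demand)]
for t in workers:
    t.start()
time.sleep(1.0)           # in the real device these loops run until shutdown
```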
D. Object Detection
The human brain focuses on regions of interest and salient objects, recognizing the most important and informative parts of an image [29]. By extracting these visual attributes [30], deep learning techniques can mimic human brains and detect salient objects in images, video frames [31], and even optical remote sensing imagery [32]. A pixelwise, nonparametric moving object detection method [33] can extract spatial and temporal features and detect moving objects with intricate backgrounds from the video frame. Many other techniques for object detection and tracking from video frames, such as object-level RGB-D video segmentation, are also commonly used [34].
For object detection, every object must be localized within a bounding box in each frame of a video input. A "region proposal system," or Regions + CNN (R-CNN), can be used [35], where, after the final convolutional layers, a regression layer is added to obtain four variables: x0, y0, width, and height of the object. This process must train a support vector machine for each class, to classify between object and background, while proposing the region in each image. In addition, a linear regression classifier needs to be trained, which will output a correction factor. To eliminate unnecessary bounding boxes from each class, the intersection-over-union method must be applied to filter out the actual location of an object in each image. Methods used in Faster R-CNN dedicatedly provide region proposals, followed by a high-quality classifier to classify these proposals [35]. These methods are very accurate but come at a large computational cost. Furthermore, because of the low frame rate, these methods are not fit to be used on embedded devices.
Object detection can also be done by combining the two tasks into one network, i.e., by having a network that produces proposals instead of having a set of predefined boxes to look for objects. The computation already made during classification could be reused to localize the objects. This is achieved by using the convolutional feature maps from the later layers of a network, upon which convolutional filters can be run to predict class scores and bounding box offsets at once. The SSD detector [36] uses multiple layers that provide a finer accuracy on objects with different scales. As the layers go deeper, the bigger objects become more visible. SSD is fast enough to infer objects in real-time video. In SSDLite, MobileNetV2 [37] is used as the backbone, and depthwise separable convolutions are used for the SSD layers. The SSDLite models make predictions on a fixed-sized grid. Each cell in this grid is responsible for detecting objects in a location of the original input image and produces two tensors as outputs that contain the bounding box predictions for the different classes. SSDLite has several different grids ranging in size from 19 × 19 to 1 × 1 cells. The number of bounding boxes per grid cell is 3 for the largest grid and 6 for the others, making a total of 1917 boxes.
For the designed prototype, Google's object detection API, with the COCO dataset, has been used, which has about 300 000 images of the 90 most commonly found objects. The API provides five different models, making a tradeoff between the speed of execution and the accuracy in placing bounding boxes. SSDLite-MobileNet, whose architecture is shown in Fig. 5, is chosen as the object detection algorithm since it requires less processing power.
Fig. 5. SSDLite-MobileNet architecture.
SSD is designed to be independent of the base network, and so it can run on top of MobileNet [35]. With SSDLite on top of MobileNet, we were able to get around 30 frames per second (fps), which is enough to evaluate the system in real-time test cases. In places where online access is either limited or absent, the proposed device can operate offline as well. In SSDLite-MobileNet, the "classifier head" of MobileNet, which made the predictions for the whole network, is replaced with the SSD network. As shown in Fig. 5, the output of the base network is typically a 7 × 7 pixel image, which is fed into the replaced SSD network for further feature extraction. The replaced SSD network takes not only the output of the base network but also the outputs of several previous layers. The MobileNet layers convert the pixels from the input image into features that describe the contents of the image and pass these along to the other layers.

A new family of object detectors, such as Poly-YOLO [38], DETR [39], YOLACT [40], and YOLACT++ [41], introduced instance segmentation along with object detection. Despite these efforts, many object detection methods still struggle with medium and large-sized objects. Researchers have, therefore, focused on proposing better anchor boxes to scale up the performance of an object detector with regard to the perception, size, and shape of the object. Recent detectors offer a smaller parameter size while significantly improving mean average precision. However, large input frame sizes limit their use in systems with low processing power.
For object detection, MobileNetV2 is used as the base network, along with SSD, since it is desirable to know both high-level as well as low-level features by reading the previous layers. Since object detection is more complicated than classification, SSD adds many additional convolution layers on top of the base network. To detect objects in live feeds, we used a Pi camera. Basically, our script sets paths to the model and label maps, loads the model into memory, initializes the Pi camera, and then begins performing object detection on each video frame from the Pi camera. Once the script initializes, which can take up to a maximum of 30 s, a live video stream begins.
Fig. 6. Workflow of the reading assistant. The Raspberry Pi gets a single frame from the camera module and runs it through the Tesseract OCR engine. The text output is then converted to audio.
Common objects inside the view of the user will then be identified, and a rectangle is drawn around each of them. With the SSDLite model and the Raspberry Pi 3 Model B+, a frame rate higher than 1 fps can be achieved, which is fast enough for most real-time object detection applications.
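A sketch of the per-frame loop described above: detections above a confidence threshold are drawn as labeled rectangles and the frame rate is tracked. The run_detector stub stands in for the SSDLite inference call shown earlier; the threshold and drawing style are illustrative.

```python
import time
import cv2
import numpy as np

def run_detector(frame):
    # stub standing in for SSDLite inference: normalized [ymin, xmin, ymax, xmax]
    # boxes, one score per box, and one class name per box
    return np.array([[0.1, 0.1, 0.6, 0.5]]), np.array([0.97]), ["cell phone"]

def annotate(frame, threshold=0.5):
    h, w = frame.shape[:2]
    boxes, scores, names = run_detector(frame)
    for (ymin, xmin, ymax, xmax), score, name in zip(boxes, scores, names):
        if score < threshold:
            continue                      # ignore low-confidence boxes
        top_left = (int(xmin * w), int(ymin * h))
        bottom_right = (int(xmax * w), int(ymax * h))
        cv2.rectangle(frame, top_left, bottom_right, (255, 255, 255), 2)
        cv2.putText(frame, "{}: {:.0f}%".format(name, 100 * score), top_left,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
    return frame

start = time.time()
annotate(np.zeros((480, 640, 3), dtype=np.uint8))   # one dummy frame
elapsed = max(time.time() - start, 1e-6)
print("{:.1f} fps".format(1.0 / elapsed))
```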
E. Reading Assistant
The proposed system integrates an intelligent reader that allows the user to read text from any document. An open-source library, Tesseract version 4, which includes a highly accurate deep learning-based model for text recognition, is used for the reader. Tesseract has Unicode (UTF-8) support and can recognize many languages, along with various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF. The underlying engine uses a long short-term memory (LSTM) network. LSTM is a type of recurrent neural network, a combination of unfolded layers that use cell states at each time step to predict letters from an image. The captured image is divided into horizontal boxes, and at each time step, the horizontal boxes are analyzed against the ground truth value to predict the output letter. LSTM uses gate layers to update the cell state, at each time step, using several activation functions. Therefore, the time required to recognize text can be optimized.
Fig. 6 shows the working principle of the reading assistant. An image is captured from the live video feed without interrupting the object detection process. In the background, the Tesseract API extracts the text from the image and saves it in a temporary text file. The system then reads out the text from the text file using the text-to-speech engine eSpeak. The accuracy of the Tesseract OCR engine depends on ambient lighting and background, and it usually works well with a white background and in brightly illuminated places.
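The Tesseract-to-eSpeak path of Fig. 6 can be sketched as below, assuming pytesseract and the espeak binary; loading an image file stands in for grabbing a frame from the live feed, and the temporary-file step mirrors the description above.

```python
import subprocess
import tempfile
import cv2
import pytesseract

frame = cv2.imread("captured_page.jpg")              # stands in for a frame from the live feed
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)       # Tesseract works better on a clean grayscale image

text = pytesseract.image_to_string(gray)             # extract the text in the background

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(text)                                  # temporary text file, as in Fig. 6
    tmp_path = tmp.name

subprocess.run(["espeak", "-f", tmp_path])           # eSpeak reads the file aloud
```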
IV. SYSTEM EVALUATION AND EXPERIMENTS
A. Evaluation of Object Detection
Our model (SSDLite) is pretrained on the ImageNet dataset for image classification.
Fig. 7. Single object detection. The object detection algorithm can detect the cell phone with 97% confidence.
Fig. 8. Detecting multiple objects, with various confidence levels, from a single frame (white boxes are added for visibility for readers).
The model draws a bounding box on an object and tries to predict the object type based on the trained data from the network. It directly predicts the probability that each class is present in each bounding box using the softmax activation function and the cross-entropy loss function. The model also has a background object class that it uses when classifying different objects. However, there can be a large number of bounding boxes detected in one frame containing only background classes. To avoid this problem, the model uses hard negative mining to sample negative predictions, or downsampling of the convolutional feature maps, to filter out the extra bounding boxes.
Fig. 7 shows the detection of a single object from a video stream. Although most of the image contains background, the model is still able to filter out the other bounding boxes and detect the desired object in the frame with 97% confidence. The device can also detect multiple objects, with different confidence levels, from one video frame, as shown in Fig. 8.
TABLE I. Performance of single and multiple object detection: 22 test cases listing the actual object(s), the predicted object(s), and the failure case(s), if any.
Our model can easily identify up to four or five objects simultaneously from a single video frame. The confidence level indicates the percentage of times the system can detect an object without any failure.
Table I summarizes the results from single and multiple object detection for 22 unique cases, consisting of either a single item or a combination of items commonly found in indoor and outdoor setups. The system can identify single items with near 100% accuracy and zero failure cases. Where multiple objects are in the frame, the proposed system can recognize each known object within the view. For any object situated in the range of 15-20 m from the user, the object can be recognized with at least 80% accuracy. The camera identifies objects based on their ground truth values (in %), as shown in Figs. 7 and 8. However, to make the device more reliable, the ultrasonic sensor is also used to measure the distance between the object and the user. Whenever there are multiple objects in front of the user, the system generates feedback for the object that is closest to the user; an object with a higher ground truth value has a higher priority. The pretrained model, however, is subject to failure due to variations in the shape and color of the object as well as changes in ambient lighting conditions.
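The feedback priority described above (announce the closest object, with higher-confidence detections taking precedence) could look roughly like the following; the data structure and the tie-breaking rule are illustrative assumptions.

```python
from typing import List, NamedTuple

class Detection(NamedTuple):
    label: str
    confidence: float    # detector confidence in [0, 1]
    distance_cm: float   # from the ultrasonic sensor or camera estimate

def pick_object_to_announce(detections: List[Detection]) -> Detection:
    # closest object first; for near-equal distances, prefer the higher confidence
    return min(detections, key=lambda d: (round(d.distance_cm), -d.confidence))

dets = [Detection("chair", 0.88, 140.0), Detection("person", 0.97, 90.0)]
print(pick_object_to_announce(dets).label)   # -> "person"
```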
B. Evaluation of Distance Measurement
Fig. 9 shows the device measuring the distance between a computer mouse and the blind person using the ultrasonic sensor. If the distance measured by the sensor is less than 40 cm, the user gets a voice alert saying that the object is within 40 cm. The sensor can measure distances within a range of 2-120 cm using sonar waves.
Fig. 9. Measuring the distance of a mouse from the prototype device using the ultrasonic sensor.
Fig. 10. Face detection and distance measurement from a single video frame.
Fig. 10 demonstrates the case where the combination of the camera and the ultrasonic sensor is used to identify a person's face and determine how far the person is from the blind user. The integration of the camera with the ultrasonic sensor, therefore, allows simultaneous object detection and distance measurement, which adds novelty to our proposed design. We have used the Haar cascade algorithm [42] to detect a face from a single video frame. It can also be modified and used for other objects. The bounding boxes, which appear while recognizing an object, consist of a rectangle. The width w, height h, and the coordinates of the rectangular box (x0, y0) can be adjusted as required.

Fig. 11 demonstrates how the distance between the object and the blind user can be simultaneously measured by both the camera and the ultrasonic sensor. The dotted line (6 m) represents the distance measured by the camera, and the solid line (5.6 m) represents the distance calculated from the ultrasonic sensor. The width w and height h of the bounding box are defined in the XML file with feature vectors, and they vary depending on the distance between the camera and the object. In addition to the camera, the use of the ultrasonic sensor makes object detection more reliable.
Fig. 11. Demonstration of the distance measurement using the camera and the ultrasonic sensor.
TABLE II. Distance measurement between object and user: for five test cases, the actual distance is compared with the distances measured by the ultrasonic sensor and by the camera.
The following equation, which can be derived by considering the formation of the image as light passes through the camera lens [43], is used to calculate the distance between the object and the user:
$$\text{distance (inches)} = \frac{2 \times 3.14 \times 180}{w + h \times 360} \times 1000 + 3 \qquad (1)$$
The actual distance between the object and the user is measured with a measuring tape and compared with that measured by the camera and the ultrasonic sensor. Since the camera can detect a person's face, the object used in this case is a human face, as shown in Fig. 10. Table II summarizes the results. The distance measured by the ultrasonic sensor is more accurate than that measured by the camera. Also, the ultrasonic sensor can respond in real time, so it can be used to measure the distance between the blind user and a moving object. The camera, with a higher processing power and more fps, has a shorter response time. Although the camera takes slightly more time to process, both the camera and the ultrasonic sensor can generate feedback at the same time.
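A sketch of the camera-based estimate in (1), assuming the bounding-box formulation used with [43]: w and h are the pixel width and height of the Haar-cascade face box, and the constants come from that formulation rather than from calibrating the authors' camera.

```python
import cv2

def distance_from_box_inches(w, h):
    # bounding-box distance heuristic of (1); w and h are in pixels
    return (2 * 3.14 * 180) / (w + h * 360) * 1000 + 3

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.jpg")                          # illustrative video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    print("Face at roughly {:.1f} inches".format(distance_from_box_inches(w, h)))
```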
C. Evaluation of Reading Assistant
The integrated reading assistant in our prototype is tested under different ambient lighting conditions for various combinations of text size, font, color, and background. The OCR engine performs better in an environment with more light, as it can easily extract the text from the captured image. When comparing text with differently colored backgrounds, it has been shown that a well-illuminated background yields better performance for the reading assistant. As given in Table III, the performance of the reading assistant is tested under three different illuminations: bright, slightly dark, and dark, using green- and black-colored text written on white pages.
TABLE III
PERFORMANCE OF THE READING ASSISTANT

Case  Text                     Text Color  Paper (Background) Color  Ambient Lighting  Performance
1     What can I do for you?   Black       White                     Bright            Reads accurately
2     What can I do for you?   Black       White                     Slightly dark     Reads accurately
3     What can I do for you?   Black       White                     Dark              Does not read accurately
4     I am doing well.         Green       White                     Bright            Reads accurately
5     I am doing well.         Green       White                     Slightly dark     Does not read accurately
6     I am doing well.         Green       White                     Dark              Does not read accurately
Fig. 12. Typical outdoor test environment (the detected object is labeled "Child Sitting on Chair").
When the text color is black, the device performs accurately in bright and even in slightly dark environments, but under the dark condition, it fails to read the full sentence. For the green-colored text, the reading assistant has no issues in the brightly lit environment but fails to perform accurately in slightly dark and dark conditions.
D. Experimental Setup
The usability and performance of the prototype device are primarily tested in controlled indoor settings that mimic real-life scenarios. Although the proposed device functioned well in a typical outdoor setting, as shown in Fig. 12, the systematic study and conclusions, discussed in the following sections, are based on the indoor setup only.

A total of 60 completely blind individuals (male: 30 and female: 30) volunteered to participate in the controlled experiments. The influence of gender or age on the proposed system is beyond the scope of our current work and has, therefore, not been investigated here. However, since gender-based blindness studies [44], [45] have shown blindness to be more prevalent among women than men, it is important to have female blind users represented in significant numbers in the testing and evaluation of any visual aid. Dividing the 60 human subjects into 30 males and 30 females, to be studied separately, could, therefore, prove useful for conducting a gender-based evaluation study of the proposed system in future endeavors.
Fig. 13. Testing the prototype in an indoor setting.
A short training session, over a period of 2 hours, is conducted to familiarize the blind participants with the prototype device. During the training, the evaluation and scoring criteria were discussed in detail.
The indoor environment, as shown in Fig. 13, consisted of six stationary obstacles of different heights and a moving person (not shown in Fig. 13). The positions of the stationary objects were shuffled to create ten different indoor test setups, which were assigned at random to each user. A blind individual walks from point A to point B, along the path AB (~15 m in length), first with our proposed blind assistant, mounted on a pair of eyeglasses, and then with a traditional white cane. For both the device and the white cane, the time taken to complete the walk was recorded for each participant. Based on the time, the corresponding velocity for each participant is calculated. The results from the indoor setting, as shown in Fig. 13, are summarized and discussed in Section V.
E. Assessment Criterion
Blind participants were instructed to rate the device based on its comfort level or ease of use, mobility, and preference compared with the more commonly used traditional white cane. Ratings were given on a scale of 0-5, and the user experiences for comfort, mobility, and preference over the white cane are divided into the following three categories based on the scores:
1) worst (score: 0-2);
2) moderate (score: 3);
3) good (score: 4 and 5).
The preferability score also reflects the likelihood that the user would recommend the device to someone else. For example, a score of 3 for preferability means that the user is only slightly impressed with the overall performance of the device, while a score of 1 means that the blind person highly discourages the use of the device. The accuracy of the reading assistant was also scored on a scale of 0-5, with 0 being the least accurate and 5 being the most. The total score, from each user, is calculated by summing the individual scores for comfort, mobility, preferability, and accuracy of the reading assistant. In the best-case scenario, each category gets a score of 5, with a total score of 20.
Fig. 14. Velocity of blind participants walking from point A to point B in Fig. 13.
TABLE IV. Average velocity (m/s) achieved with the blind assistant and with the white cane, for the male and female participant groups.
TABLE V. Parameters and values used for the t-test: mean velocity (m/s), standard deviation (m/s), and sample size (60 each) for the blind assistant and the white cane.
Depending on the total score, the proposed blind assistant is labeled as "not helpful" (total score: 0-8), "helpful" (total score: 9-15), or "very helpful" (total score: 16-20). These labels were set after an extensive discussion with the blind participants prior to conducting the experiments. Almost all the blind users were participating in such a study for the first time, with no prior experience of using any form of ETA. Therefore, it was necessary to set a scoring and evaluation criterion that could be easily adopted without the need for advanced training and extensive guidelines.
V. RESULTS AND DISCUSSION
Fig. 14 plots the velocity at which each blind user completes the walk from point A to point B, as shown in Fig. 13. For each user, the speed achieved using the blind assistant and the white cane is plotted. The plots for male and female users are shown separately.
Fig. 15. User ratings for the proposed device tested in the indoor setup of Fig. 13.
Table IV lists the average velocities for the 30 male and 30 female participants. It is evident from the table that, on average, the blind assistant provides slightly faster navigation than the white cane, for both genders. To compare the performance of our proposed blind assistant with that of the white cane, a t-test is performed with a sample size of 60, using the following statistic:

$$t = \frac{\bar{x}_b - \bar{x}_w}{\sqrt{\dfrac{s_b^2}{n_b} + \dfrac{s_w^2}{n_w}}} \qquad (2)$$

where $\bar{x}_b$, $s_b$, and $n_b$ are the mean, standard deviation, and sample size, respectively, for the experiment with the blind assistant. The corresponding values for the white cane are denoted by $\bar{x}_w$, $s_w$, and $n_w$. Table V lists the values used in the t-test.
With a t-value equal to 4.9411, the two-tailed P value is less than 0.0001. Therefore, by conventional criteria and at a 95% confidence level, the difference in velocity between the blind assistant and the white cane can be considered statistically significant.
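The t statistic in (2) can be computed directly from summary statistics of the kind listed in Table V; in the sketch below all numbers are placeholders rather than the paper's measured values.

```python
import math

def two_sample_t(mean_b, sd_b, n_b, mean_w, sd_w, n_w):
    # t statistic of (2): difference of means over the combined standard error
    return (mean_b - mean_w) / math.sqrt(sd_b ** 2 / n_b + sd_w ** 2 / n_w)

# placeholder summary statistics (not the paper's data)
t = two_sample_t(mean_b=0.27, sd_b=0.02, n_b=60,   # blind assistant
                 mean_w=0.25, sd_w=0.02, n_w=60)   # white cane
print("t = {:.3f}".format(t))
```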
The user ratings are plotted in Fig. 15, which shows the individual scores for comfort, mobility, preference, and accuracy of the reading assistant, on a scale of 0-5, for each of the 60 users. In addition, the total score, rated on a scale of 0-20, is also shown. The average of all scores is 14.5, which deems our proposed device "helpful" based on the criterion defined in Section IV-E. Since we only used a prototype to conduct the experiments, the comfort level was slightly compromised. However, the mobility and preference of the proposed device over the white cane gained high scores. The pretrained model, which was used, could be retrained with more objects for better performance. The reading assistant performed well under brightly illuminated settings. One major limitation of the reading assistant, as pointed out by the users, is that it was unable to read texts containing tables and pictures.
TABLE VI. Estimated cost (USD) of the proposed device compared with existing visual aids (Lan et al. [26], Jiang et al. [17], Rajesh et al. [46]) and the white cane.

A cost analysis was done against similar state-of-the-art assistive navigation devices. Table VI compares the cost of our blind assistant with some of the existing platforms. The total cost of making the proposed device is roughly US $68, whereas some existing devices, with a similar performance, are considerably more expensive. Service dogs, another viable alternative, can cost up to US $4000 and require high maintenance. Although white canes are cheaper, they are unable to detect moving objects and do not include a reading assistant.
VI. CONCLUSION
This research article introduces a novel visual aid system, in the form of a pair of eyeglasses, for the completely blind. The key features of the proposed device include the following.
1) A hands-free, wearable, low-power, low-cost, and compact design for indoor and outdoor navigation.
2) Complex algorithm processing using the low-end processing power of the Raspberry Pi 3 Model B+.
3) Dual capabilities for object detection and distance measurement using a combination of camera and ultrasound.
4) An integrated reading assistant, offering image-to-text conversion capabilities, enabling the blind to read text from any document.
A detailed discussion of the software and hardware aspects of the proposed blind assistant has been given. A total of 60 completely blind users have rated the performance of the device in well-controlled indoor settings that represent real-world situations. Although the current setup lacks advanced functions, such as wet-floor and staircase detection or the use of GPS and a mobile communication module, the flexibility of the design leaves room for future improvements and enhancements. In addition, with advanced machine learning algorithms and a more improved user interface, the system can be further developed and tested in more complex outdoor environments.
REFERENCES
[1] Blindness and vision impairment, World Health Organization, Geneva, Switzerland, Oct. 2019. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment
[2] J. Bai, S. Lian, Z. Liu, K. Wang, and D. Liu, "Virtual-blind-road following-based wearable navigation device for blind people," IEEE Trans. Consum. Electron., vol. 64, no. 1, pp. 136-143, Feb. 2018.
[3] B. Li et al., "Vision-based mobile indoor assistive navigation aid for blind people," IEEE Trans. Mobile Comput., vol. 18, no. 3, pp. 702-714, Mar. 2019.
[4] J. Xiao, S. L. Joseph, X. Zhang, B. Li, X. Li, and J. Zhang, "An assistive navigation framework for the visually impaired," IEEE Trans. Human-Mach. Syst., vol. 45, no. 5, pp. 635-640, Oct. 2015.
[5] A. Karmel, A. Sharma, M. Pandya, and D. Garg, "IoT based assistive device for deaf, dumb and blind people," Procedia Comput. Sci., vol. 165, pp. 259-269, Nov. 2019.
[6] C. Ye and X. Qian, "3-D object recognition of a robotic navigation aid for the visually impaired," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 2, pp. 441-450, Feb. 2018.
[7] Y. Liu, N. R. Stiles, and M. Meister, "Augmented reality powers a cognitive assistant for the blind," eLife, vol. 7, Nov. 2018, Art. no. e37841.
[8] A. Adebiyi et al., "Assessment of feedback modalities for wearable visual aids in blind mobility," PLoS One, vol. 12, no. 2, Feb. 2017, Art. no. e0170531.
[9] J. Bai, S. Lian, Z. Liu, K. Wang, and D. Liu, "Smart guiding glasses for visually impaired people in indoor environment," IEEE Trans. Consum. Electron., vol. 63, no. 3, pp. 258-266, Aug. 2017.
[10] D. Dakopoulos and N. G. Bourbakis, "Wearable obstacle avoidance electronic travel aids for blind: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 1, pp. 25-35, Jan. 2010.
[11] E. E. Pissaloux, R. Velazquez, and F. Maingreaud, "A new framework for cognitive mobility of visually impaired users in using tactile device," IEEE Trans. Human-Mach. Syst., vol. 47, no. 6, pp. 1040-1051, Dec. 2017.
[12] K. Patil, Q. Jawadwala, and F. C. Shu, "Design and construction of electronic aid for visually impaired people," IEEE Trans. Human-Mach. Syst., vol. 48, no. 2, pp. 172-182, Apr. 2018.
[13] R. K. Katzschmann, B. Araki, and D. Rus, "Safe local navigation for visually impaired users with time-of-flight and haptic feedback devices," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 3, pp. 583-593, Mar. 2018.
[14] J. Villanueva and R. Farcy, "Optical device indicating a safe free path to blind people," IEEE Trans. Instrum. Meas., vol. 61, no. 1, pp. 170-177, Jan. 2012.
[15] X. Yang, S. Yuan, and Y. Tian, "Assistive clothing pattern recognition for visually impaired people," IEEE Trans. Human-Mach. Syst., vol. 44, no. 2, pp. 234-243, Apr. 2014.
[16] S. L. Joseph et al., "Being aware of the world: Toward using social media to support the blind with navigation," IEEE Trans. Human-Mach. Syst., vol. 45, no. 3, pp. 399-405, Jun. 2015.
[17] B. Jiang, J. Yang, Z. Lv, and H. Song, "Wearable vision assistance system based on binocular sensors for visually impaired users," IEEE Internet Things J., vol. 6, no. 2, pp. 1375-1383, Apr. 2019.
[18] L. Tepelea, I. Buciu, C. Grava, I. Gavrilut, and A. Gacsadi, "A vision module for visually impaired people by using Raspberry PI platform," in Proc. 15th Int. Conf. Eng. Modern Electr. Syst. (EMES), Oradea, Romania, 2019, pp. 208-212.
[19] I. Dunai, G. Peris-Fajarnes, E. Lluna, and B. Defez, "Sensory navigation device for blind people," J. Navig., vol. 66, no. 3, pp. 349-362, May 2013.
[20] V.-N. Hoang, T.-H. Nguyen, T.-L. Le, T.-H. Tran, T.-P. Vuong, and N. Vuillerme, "Obstacle detection and warning system for visually impaired people based on electrode matrix and mobile Kinect," Vietnam J. Comput. Sci., vol. 4, no. 2, Jul. 2016.
[21] C. Patel, A. Patel, and D. Patel, "Optical character recognition by open source OCR tool Tesseract: A case study," Int. J. Comput. Appl., vol. 55, no. 10, pp. 50-56, Oct. 2012.
[22] A. Chalamandaris, S. Karabetsos, P. Tsiakoulis, and S. Raptis, "A unit selection text-to-speech synthesis system optimized for use with screen readers," IEEE Trans. Consum. Electron., vol. 56, no. 3, pp. 1890-1897, Aug. 2010.
[23] C. Keefer, Y. Liu, and N. Bourbakis, "The development and evaluation of an eyes-free interaction model for mobile reading devices," IEEE Trans. Human-Mach. Syst., vol. 43, no. 1, pp. 76-91, Jan. 2013.
[24] B. Ando, S. Baglio, V. Marletta, and A. Valastro, "A haptic solution to assist visually impaired in mobility tasks," IEEE Trans. Human-Mach. Syst., vol. 45, no. 5, pp. 641-646, Oct. 2015.
[25] V. V. Meshram, K. Patil, V. A. Meshram, and F. C. Shu, "An astute assistive device for mobility and object recognition for visually impaired people," IEEE Trans. Human-Mach. Syst., vol. 49, no. 5, pp. 449-460, Oct. 2019.
[26] F. Lan, G. Zhai, and W. Lin, "Lightweight smart glass system with audio aid for visually impaired people," in Proc. IEEE Region 10 Conf. (TENCON), Macau, China, 2015.
[27] M. M. Islam, M. S. Sadi, K. Z. Zamli, and M. M. Ahmed, "Developing walking assistants for visually impaired people: A review," IEEE Sens. J., vol. 19, no. 8, pp. 2814-2828, Apr. 2019.
[28] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," Feb. 2015. [Online]. Available: http://arxiv.org/abs/1405.0312
[29] J. Han et al., "Representing and retrieving video shots in human-centric brain imaging space," IEEE Trans. Image Process., vol. 22, no. 7, pp. 2723-2736, Jul. 2013.
[30] J. Han, K. N. Ngan, M. Li, and H. J. Zhang, "Unsupervised extraction of visual attention objects in color images," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 141-145, Jan. 2006.
[31] D. Zhang, D. Meng, and J. Han, "Co-saliency detection via a self-paced multiple-instance learning framework," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 5, pp. 865-878, May 2017.
[32] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405-7415, Dec. 2016.
[33] Y. Yang, Q. Zhang, P. Wang, X. Hu, and N. Wu, "Moving object detection for dynamic background scenes based on spatiotemporal model," Adv. Multimedia, vol. 2017, 2017, Art. no. 5179013.
[34] Q. Xie, O. Remil, Y. Guo, M. Wang, M. Wei, and J. Wang, "Object detection and tracking under occlusion for object-level RGB-D video segmentation," IEEE Trans. Multimedia, vol. 20, no. 3, pp. 580-592, Mar. 2018.
[35] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, Jun. 2017.
[36] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vision, vol. 9905, Sep. 2016, pp. 21-37.
[37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, 2018, pp. 4510-4520.
[38] P. Hurtik, V. Molek, J. Hula, M. Vajgl, P. Vlasanek, and T. Nejezchleba, "Poly-YOLO: Higher speed, more precise detection and instance segmentation for YOLOv3," May 2020. [Online]. Available: http://arxiv.org/abs/2005.13243
[39] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," May 2020. [Online]. Available: http://arxiv.org/abs/2005.12872
[40] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT: Real-time instance segmentation," in Proc. IEEE/CVF Int. Conf. Comput. Vision, Seoul, South Korea, 2019.
[41] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT++: Better real-time instance segmentation," Dec. 2019. [Online]. Available: https://arxiv.org/abs/1912.06218
[42] R. Padilla, C. C. Filho, and M. Costa, "Evaluation of Haar cascade classifiers designed for face detection," Int. J. Comput. Electr. Autom. Control Inf. Eng., vol. 6, no. 4, Apr. 2012.
[43] L. Xiaoming, Q. Tian, C. Wanchun, and Y. Ningjun, "Real-time distance measurement using a modified camera," in Proc. IEEE Sensors Appl. Symp., Limerick, Ireland, 2010.
[44] L. Doyal and R. G. Das-Bhaumik, "Sex, gender and blindness: A new framework for equity," BMJ Open Ophthalmol., vol. 3, no. 1, Sep. 2018, Art. no. e000135.
[45] M. Prasad, S. Malhotra, M. Kalaivani, P. Vashist, and S. K. Gupta, "Gender differences in blindness, cataract blindness and cataract surgical coverage in India: A systematic review and meta-analysis," Br. J. Ophthalmol., vol. 104, no. 2, pp. 220-224, Jan. 2020.
[46] M. Rajesh et al., "Text recognition and face detection aid for visually impaired person using Raspberry PI," in Proc. Int. Conf. Circuit, Power Comput. Technol. (ICCPCT), Kollam, India, 2017, pp. 1-5.