A Vision Module for Visually Impaired People Using the Raspberry Pi Platform

Abstract— The paper describes a vision-based platform for real-life indoor and outdoor object detection intended to guide visually impaired people. The application is developed in Python using functions from the OpenCV library and is ultimately ported to the Raspberry Pi 3 Model B+ platform. Template matching is selected as the detection method. More precisely, a multi-scale approach is proposed to reduce the processing time and to extend the detection distance range for accurate traffic sign recognition in indoor/outdoor environments. The experimental part addresses finding the optimum values for the template and source image dimensions, as well as for the scaling factor.

Keywords—vision module, visually impaired people, Raspberry Pi platform, template matching, OpenCV.

I. INTRODUCTION

Visually impaired people represent a significant population segment, currently estimated at tens of millions around the globe [1]. Their integration into society is an important and constant objective, and a great effort has been made to provide an adequate health care system. Various guidance techniques have been developed to assist visually impaired people in living a normal life. Often, these systems are designed only for specific tasks [2][3]. Nevertheless, they can greatly contribute to the mobility and safety of such people.

The development of state-of-the-art guidance systems for visually impaired people is closely related to advanced methods in image processing and computer vision, as well as to the speed of the devices and processing units. Regardless of the technology involved, the application needs to operate in real time with quick actions and decisions, as speed might be critical for taking action [4]. Basically, picking the best solution is a trade-off between the performance of the software component and the hardware capabilities, so optimum parameter tuning is required.

During the indoor or outdoor movement of a visually impaired person, one of the main objectives of the assistive system is to automatically detect and recognize objects or obstacles, followed by an acoustic alert [5][6].

The integrated guidance system for visually impaired people developed in [7] includes three basic platforms:

• A multicore Android platform with its own CPU (Central Processing Unit) and GPU (Graphics Processing Unit), allowing computationally demanding tasks.

• An Arduino platform aiming to detect obstacles at three levels: head level, body level and leg level. The acquired information is transmitted to the smartphone via Bluetooth.

• A Raspberry Pi platform, built around an ARM (Advanced RISC Machine) processor using RISC (Reduced Instruction Set Computing) technology. Object detection and recognition is performed using a camera as the source, and the software is based on OpenCV.

The vision module for image processing proposed in this paper is an integrated part of the platform dedicated to guiding visually impaired people. Moreover, the proposed module can also be used off the shelf, independently of the integrated platform.

The proposed vision-based guidance system is designed, developed, validated through experiments and iteratively optimized. The module follows the principle of building a high-performance yet cost-effective device with practical usage. It uses disruptive technology and allows for updates and the inclusion of new functions.

II. VISION MODULE FOR VISUALLY IMPAIRED PEOPLE

The visual scene is captured at various sampling rates. Each acquired image is processed, and the processing output triggers an acoustic alert message to the person, depending on the detected object type. Regardless of the image processing functions and tasks, the processing framework includes the following blocks:

• A block responsible for image acquisition, able to accomplish some basic preprocessing steps if required, according to the module objectives.

• The main block for image processing, detection and object recognition.

• The acoustic alert block, which notifies the visually impaired person about the detected object.
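The module's acquisition → processing → acoustic-alert cycle can be sketched in Python, the paper's implementation language. This is a hedged illustration only: every function name and the stub detection rule below are hypothetical stand-ins for the camera driver, the template matcher and the audio back end of the real module.

```python
import numpy as np

# Sketch of the three-block loop (acquisition -> processing -> alert);
# all names are hypothetical stand-ins, not the paper's actual code.

def acquire_frame(rng):
    """Image-acquisition block: returns a synthetic grayscale frame
    in place of a real camera capture."""
    return rng.random((480, 640))

def detect_object(frame, threshold=0.75):
    """Processing block stub: reports a detection when the mean frame
    intensity exceeds an arbitrary threshold (a placeholder for the
    template-matching score test)."""
    return "stop sign" if frame.mean() > threshold else None

def acoustic_alert(label):
    """Alert block stub: the real module plays a ~1.2-1.5 s audio
    message; here we only build the message text."""
    return f"ALERT: {label} ahead"

def run_once(rng):
    """One iteration of the acquisition -> processing -> alert loop."""
    frame = acquire_frame(rng)
    label = detect_object(frame)
    return acoustic_alert(label) if label else None
```

In the real module, `run_once` would repeat for the entire movement of the person, with the summed block times fixing the achievable frame rate.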
The process image acquisition → image processing → acoustic notification is looped for the entire movement of the person in the indoor or outdoor environment. Summing the three processing times gives the overall processing time, which determines the acquisition rate for the input image frames. The process needs to be fast enough so that potential obstacles can be avoided in time [8].

A. Template Matching or Image Correlation

Template matching, or correlation between two images, can be classified into feature-based and intensity-based methods. For the intensity-based methods, different metrics or procedures can be applied, such as: Euclidean distance, Sum of Absolute Differences (SAD), Mean Absolute Differences (MAD), Sum of Squared Differences (SSD) or Normalized Cross-Correlation (NCC).

Let us consider an input test (source) image Φ(m,n): R²→R with dimension M×N, a template image or correlation kernel K(p,q): R²→R with dimension P×Q (p∈[1,P], q∈[1,Q]) and an actual (current) image region Λ(p,q): R²→R, taken from the test image Φ(m,n), with dimension P×Q (p∈[1,P], q∈[1,Q]). The image Λ(p,q) will be compared to the template image K(p,q).

By comparing the two images, the matching degree between the template image K(p,q) and an actual image region Λ(p,q) from the test image Φ(m,n) is obtained by computing the normalized cross-correlation coefficient, which indicates how well the pattern matches the contents of that region (compared image) [9]:

Corr(i,j) = \frac{\sum_{p=1}^{P}\sum_{q=1}^{Q}\left(K(p,q)-\bar{K}\right)\left(\Lambda(p,q)-\bar{\Lambda}\right)}{\sqrt{\sum_{p=1}^{P}\sum_{q=1}^{Q}\left(K(p,q)-\bar{K}\right)^{2}\cdot\sum_{p=1}^{P}\sum_{q=1}^{Q}\left(\Lambda(p,q)-\bar{\Lambda}\right)^{2}}}   (1)

where K̄ is the mean intensity of the template image and Λ̄ is the mean intensity of the compared region of the test image.

The test image is scanned pixel by pixel so that the template image completely overlaps successive regions of the test image, and the matching degree is computed at each position, as can be seen in Fig. 1. This results in the correlation image Corr(i,j), or target image.

The correlation coefficient has the value Corr(i,j) = 1 if the two images are absolutely identical and Corr(i,j) = 0 if they are completely uncorrelated. If the two images are negatively correlated, the correlation coefficient has the value Corr(i,j) = −1, for example if one image is the negative of the other.

Fig. 1. Template matching or image correlation: the template image (correlation kernel) K(p,q) is compared, pixel by pixel, with each region Λ(p,q) of the test (source) image Φ(m,n), producing the correlation image Corr(i,j).

Typically, the correlation coefficient is used to compare two images of the same object (or scene) taken at different times. In a real-life scenario there is no perfect match (100 % matching score), as the source and template might suffer slight geometrical distortion due to perspective [10].

Higher values of the correlation coefficient represent a better match between the two images (the template image and the compared regions in the test image). Technically, for each image, the maximum coefficient (or the coefficients above a threshold value) is searched for, and the corresponding position of the detected object, if any, is output. The choice of the threshold value used for comparison depends on the application and is often between 0.35 and 0.75. The correlation coefficient is computed with relation (1).

Unfortunately, the template matching approach has some downsides that limit its performance in real-life applications:

• Template matching is not robust against rotation and scaling transformations, which may result in poor performance.

• The computing time for obtaining a correlation coefficient depends on the source image's dimensions and on the template image's dimensions, increasing proportionally with them. On the other hand, the template image should be large enough to contain relevant information.

B. The Raspberry Pi 3 Model B+ Platform

The Raspberry Pi 3 board we used is the latest version, with the following specifications:

• Broadcom BCM2837B0 processor, Cortex-A53 (ARMv8) 64-bit SoC at 1.4 GHz.

• 1 GB LPDDR2 SDRAM.

• 2.4 GHz and 5 GHz IEEE 802.11.b/g/n/ac wireless LAN, Bluetooth 4.2, BLE.

• Gigabit Ethernet port.

• Extended 40-pin GPIO header.

• CSI camera port for connecting a Raspberry Pi camera.

• Power-over-Ethernet (PoE) support.

For image acquisition we used a Camera Module v2 with an 8-megapixel Sony IMX219 image sensor with a fixed-focus lens. It supports 3280×2464 pixel static images and 1080p30, 720p60 and 640×480p90 video frames.

III. IMAGE PROCESSING APPLICATION OF THE VISION MODULE

In designing the image processing module on the Raspberry Pi platform, we aimed to develop a module suited to fast and accurate processing. To assess its performance, it is necessary to test the module in real-life conditions and tune its parameters accordingly.

The image processing method is applied to a specific object detection task, more precisely traffic sign recognition. We used the OpenCV function cv2.matchTemplate from the Python version of the library.
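For clarity, the normalized cross-correlation of relation (1) can be implemented directly. The NumPy sketch below is a naive, illustrative version of the sliding-window search; the actual module relies on cv2.matchTemplate (whose TM_CCOEFF_NORMED method computes the same score far more efficiently), and the function names here are ours, not the paper's.

```python
import numpy as np

def ncc(K, L):
    """Normalized cross-correlation coefficient of relation (1) between
    a template K and an equally sized image region L (i.e. Lambda)."""
    Kc = K - K.mean()          # K(p,q) - K_bar
    Lc = L - L.mean()          # Lambda(p,q) - Lambda_bar
    denom = np.sqrt((Kc ** 2).sum() * (Lc ** 2).sum())
    return float((Kc * Lc).sum() / denom)

def match_template(src, K):
    """Slide the P x Q template over the M x N source image, computing
    relation (1) at every position; return the correlation image
    Corr(i, j) and the position of the best match."""
    M, N = src.shape
    P, Q = K.shape
    corr = np.empty((M - P + 1, N - Q + 1))
    for i in range(M - P + 1):
        for j in range(N - Q + 1):
            corr[i, j] = ncc(K, src[i:i + P, j:j + Q])
    best = np.unravel_index(np.argmax(corr), corr.shape)
    return corr, best
```

As the text states, an identical region yields Corr = 1 and the negative of the template yields Corr = −1; in practice a detection is declared only where the coefficient exceeds the chosen threshold (0.35–0.75).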
The module addressed the following specifications in its design:

• The required time between two consecutive video frames. We want a low processing time for each template; since the approach is applied at several scales, the summed processing time should be small enough to permit real-time decisions.

• The maximum and minimum distances at which the matching leads to an acceptable matching score for each template. The detection distance interval should be as large as possible to extend the distance range of the module. Moreover, the module should be insensitive to changing external conditions, including illumination, scaling, or geometrical deformation. It also needs to accept various traffic sign shapes.

To address the template scale variation issue, which might cause poor matching, we employed a multiscale approach:

• The experiments were run for different source image resolutions, i.e. 1440×1920 pixels, 960×1280 pixels and 480×640 pixels. The template is then searched at each location in the source image.

• In the multiscale approach, each acquired video frame is downsampled with various resolution factors, i.e. 5, 3 and 1. For instance, if the source image has an initial resolution of 960×1280 pixels, downsampling with factor 3 leads to three lower-resolution images of the following sizes: 960×1280 pixels, 720×960 pixels and 480×640 pixels. The template is then compared against each such scaled version of the source image.

Another considered aspect is to analyze how the internal module parameters are correlated with the size of the traffic signs and the source image resolution.

For an overall evaluation we need to take into account the processing time of each step, starting with the image acquisition and module communication and ending with the trigger action that emits the acoustic alert.

Once a traffic sign is identified, an acoustic message is sent to the user via headphones. In its simplest form, the audio message is approximately 1.2÷1.5 s long. This time interval is taken here as a reference.

IV. TESTING THE VISION MODULE

To assess the performance of the proposed module for traffic sign recognition, the following objectives were evaluated: the processing time for template matching, and the minimum and maximum detection distances subject to an accuracy threshold of 90 %. More precisely, we only considered the distance range in which the module was able to recognize the traffic signs with an accuracy exceeding this matching threshold.

The detection threshold for the correlation value stays the same for each template and for each scaled source image version.

A. Indoor Experimental Results

The sign templates used to test the vision module are illustrated in Fig. 2. The template sizes are as follows: 110×110 pixels, 76×76 pixels, 50×50 pixels and 32×32 pixels. For indoor testing (laboratory case), the real signs in the source images measure 20 cm × 20 cm.

Fig. 2. Template images used in indoor testing.

Table I shows the results of the experiments for various source images. We kept the same illumination conditions during the experiments. The processing times correspond to a single template when the template matching correlation is employed.

TABLE I. INDOOR EXPERIMENTAL RESULTS

Size of template [pixels] | Size of source image [pixels] | Scale factor | Processing time [s] | Detection distance range [m]
110×110 | 1440×1920 | 5 | 4.7 | 1.3÷3.1
110×110 | 1440×1920 | 3 | 2.8 | 1.5÷3.1
110×110 | 1440×1920 | 1 | 1.9 | 2.2÷3.1
110×110 | 960×1280 | 5 | 2.0 | 0.3÷2.3
110×110 | 960×1280 | 3 | 1.3 | 0.5÷2.3
110×110 | 960×1280 | 1 | 0.9 | 1.4÷2.3
110×110 | 480×640 | 5 | 0.4 | 0.3÷1.0
110×110 | 480×640 | 3 | 0.3 | 0.4÷1.0
110×110 | 480×640 | 1 | 0.2 | 0.6÷1.0
76×76 | 1440×1920 | 5 | 4.2 | 1.6÷4.0
76×76 | 1440×1920 | 3 | 2.5 | 1.7÷4.0
76×76 | 1440×1920 | 1 | 1.6 | 2.8÷4.0
76×76 | 960×1280 | 5 | 1.8 | 0.8÷3.0
76×76 | 960×1280 | 3 | 1.1 | 1.2÷3.0
76×76 | 960×1280 | 1 | 0.8 | 2.1÷3.0
76×76 | 480×640 | 5 | 0.5 | 0.5÷1.5
76×76 | 480×640 | 3 | 0.3 | 0.5÷1.5
76×76 | 480×640 | 1 | 0.2 | 0.5÷1.5
50×50 | 1440×1920 | 5 | 3.2 | 1.0÷5.5
50×50 | 1440×1920 | 3 | 1.8 | 2.5÷5.5
50×50 | 1440×1920 | 1 | 1.2 | 4.0÷5.5
50×50 | 960×1280 | 5 | 1.6 | 0.5÷4.0
50×50 | 960×1280 | 3 | 1.0 | 1.0÷4.0
50×50 | 960×1280 | 1 | 0.6 | 2.5÷4.0
50×50 | 480×640 | 5 | 0.4 | 0.3÷2.0
50×50 | 480×640 | 3 | 0.3 | 0.5÷2.0
50×50 | 480×640 | 1 | 0.2 | 1.0÷2.0
32×32 | 1440×1920 | 5 | 3.3 | 2.5÷>5.7
32×32 | 1440×1920 | 3 | 2.0 | 3.5÷>5.7
32×32 | 1440×1920 | 1 | 1.3 | >5.7
32×32 | 960×1280 | 5 | 1.4 | 1.1÷5.7
32×32 | 960×1280 | 3 | 0.8 | 3.0÷5.7
32×32 | 960×1280 | 1 | 0.5 | 4.0÷5.7
32×32 | 480×640 | 5 | 0.4 | 0.5÷2.5
32×32 | 480×640 | 3 | 0.2 | 0.8÷2.5
32×32 | 480×640 | 1 | 0.1 | 2.0÷2.5
The following conclusions can be drawn:

• The maximum distance increases proportionally with an increase in the source image size (in pixels).

• The maximum distance increases proportionally with a decrease in the template image size (in pixels).

• The minimum distance shrinks with the scaling factor.

In Table I, red values indicate results unacceptable for a real-life scenario, while blue values are reasonable. The values marked in green are the optimum values, as they satisfy both the low processing time and the reasonable detection distance constraints.

B. Outdoor Experimental Results

To validate the performance of the vision module for real-life cases, we ran experiments in outdoor conditions, taking samples on the street.

Fig. 3 shows the templates for the outdoor experiments, with the following dimensions: 110×110 pixels, 76×76 pixels, 50×50 pixels and 32×32 pixels.

The real signs have geometrical dimensions of 60 cm × 60 cm. Therefore, for outdoor testing, the maximum distances are 50 % longer than the distances associated with the indoor cases.

Fig. 3. Template images used in outdoor testing.

Table II tabulates the results of the outdoor testing for various template and source sizes.

TABLE II. OUTDOOR EXPERIMENTAL RESULTS

Size of template [pixels] | Size of source image [pixels] | Scale factor | Processing time [s] | Detection distance range [m]
50×50 | 960×1280 | 5 | 1.6 | 1.5÷6.5
50×50 | 960×1280 | 3 | 1.0 | 2.0÷6.5
50×50 | 960×1280 | 1 | 0.6 | 2.5÷6.5
32×32 | 960×1280 | 5 | 1.4 | 1.8÷9.0
32×32 | 960×1280 | 3 | 0.8 | 4.0÷9.0
32×32 | 960×1280 | 1 | 0.5 | 4.5÷9.0
32×32 | 480×640 | 5 | 0.4 | 1.5÷5.0
32×32 | 480×640 | 3 | 0.2 | 1.8÷5.0
32×32 | 480×640 | 1 | 0.1 | 2.5÷5.0

The outdoor results partially confirm the parameters obtained in the indoor experiments. The two sets are aligned, taking into account that the template matching method is sensitive to the uncontrolled outdoor illumination conditions.

V. CONCLUSIONS

A vision-based guidance module has been proposed to help visually impaired people. The vision module is able to detect and recognize traffic signs with high accuracy and at a reasonable detection distance range. The application is developed on a Raspberry Pi 3 Model B+ platform.

The experiments indicated that the template matching functions from the OpenCV library can be successfully applied in a multiscale approach, with good results for both indoor and outdoor cases. The optimum values for the parameters (time and detection range) have been found; these values depend on the scaling factors and the image dimensions.

The accuracy of the vision module could be improved by adapting its parameters to the changing illumination conditions of real-life environments.

REFERENCES

[1] M.P. Arakeri, N.S. Keerthana, M. Madhura, A. Sankar, T. Munnavar, "Assistive Technology for the Visually Impaired Using Computer Vision", International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, pp. 1725-1730, Sept. 2018.
[2] E.A. Hassan, T.B. Tang, "Smart Glasses for the Visually Impaired People", 15th International Conference on Computers Helping People with Special Needs (ICCHP), Linz, Austria, pp. 579-582, Jul. 2016.
[3] R. Ani, E. Maria, J.J. Joyce, V. Sakkaravarthy, M.A. Raja, "Smart Specs: Voice Assisted Text Reading System for Visually Impaired Persons Using TTS Method", IEEE International Conference on Innovations in Green Energy and Healthcare Technologies (IGEHT), Coimbatore, India, Mar. 2017.
[4] V. Tiponuţ, D. Ianchis, Z. Haraszy, "Assisted Movement of Visually Impaired in Outdoor Environments", Proceedings of the WSEAS International Conference on Systems, Rodos, Greece, pp. 386-391, 2009.
[5] M. Trent, A. Abdelgawad, K. Yelamarthi, "A Smart Wearable Navigation System for Visually Impaired", 2nd EAI International Conference on Smart Objects and Technologies for Social Good (GOODTECHS), Venice, Italy, pp. 333-341, Dec. 2016.
[6] J.S. Cha, D.K. Lim, Y.-N. Shin, "Design and Implementation of a Voice Based Navigation for Visually Impaired Persons", International Journal of Bio-Science and Bio-Technology, Vol. 5, No. 3, pp. 61-68, June 2013.
[7] L. Ţepelea, V. Tiponuţ, P. Szolgay, A. Gacsádi, "Multicore Portable System for Assisting Visually Impaired People", 14th International Workshop on Cellular Nanoscale Networks and their Applications, University of Notre Dame, USA, pp. 1-2, July 29-31, 2014.
[8] S. Khade, Y.H. Dandawate, "Hardware Implementation of Obstacle Detection for Assisting Visually Impaired People in an Unfamiliar Environment by Using Raspberry Pi", Smart Trends in Information Technology and Computer Communications (SMARTCOM 2016), vol. 628, Jaipur, India, pp. 889-895, Aug. 2016.
[9] R.C. Gonzalez, R.E. Woods, S.L. Eddins, "Digital Image Processing Using MATLAB", Pearson Education, 2004.
[10] L. Ţepelea, A. Gacsádi, I. Gavriluţ, V. Tiponuţ, "A CNN Based Correlation Algorithm to Assist Visually Impaired Persons", IEEE Proceedings of the International Symposium on Signals, Circuits and Systems (ISSCS 2011), Iaşi, Romania, pp. 169-172, July 2011.