Journal of Sensors
Volume 2016, Article ID 3754918, 13 pages
http://dx.doi.org/10.1155/2016/3754918
Research Article
Real-Time Obstacle Detection System in Indoor Environment for
the Visually Impaired Using Microsoft Kinect Sensor
Copyright © 2016 Huy-Hieu Pham et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
Any mobility aid for visually impaired people should be able to accurately detect and warn about nearby obstacles. In this paper, we present a method for a support system that detects obstacles in indoor environments based on the Kinect sensor and 3D image processing. Color-depth data of the scene in front of the user are collected using the Kinect with the support of OpenNI, the standard framework for 3D sensing, and processed with the Point Cloud Library (PCL) to extract accurate 3D information about the obstacles. Experiments have been performed on a dataset covering multiple indoor scenarios and different lighting conditions. The results show that our system is able to accurately detect four types of obstacle: walls, doors, stairs, and a residual class that covers loose obstacles on the floor. Specifically, walls and loose obstacles on the floor are detected in practically all cases, whereas doors are detected in 90.69% of 43 positive image samples. For stair detection, upstairs are correctly detected in 97.33% of 75 positive images, while the detection rate for downstairs is lower, at 89.47% of 38 positive images. Our method further allows the computation of the distance between the user and the obstacles.
1. Introduction

In 2014, the World Health Organization estimated that 285 million people were visually impaired in the world: 39 million are blind and 246 million have low vision [1]. Furthermore, about 90% of the world's visually impaired live in low-income settings and 82% of people living with blindness are aged 50 and above. Generally, these individuals face important difficulties with independent mobility, which relies on sensing the near-field environment, including obstacles and potential paths in the vicinity, for the purpose of moving through it [2]. The recent advances of computer science now allow the development of innovative solutions to assist visually impaired people. Various types of assistive devices have been developed to provide blind users with means of learning or getting to know the environment. A recent literature review of existing electronic aids for visually impaired individuals has identified more than 140 products, systems, and assistive devices while providing details on 21 commercially available systems [3]. A large number of these systems are based on the Global Positioning System (GPS), which unfortunately prevents them from being effectively and efficiently employed in indoor environments. Indeed, these systems are not able to provide local information on the obstacles that are encountered, due to the inaccurate nature and the susceptibility to loss of the GPS signal. Other types of mobility and navigational aids are based on sonar and provide information about the surroundings by means of auditory cues [4–6]. They use short pulses of ultrasound to detect objects, but this approach has some disadvantages: different surfaces differ in how well they reflect ultrasound, and ultrasonic aids are subject to interference from other sources of ultrasound. Finally, another type of assistive device for blind and visually impaired people has been developed based on the stereo vision technique, as in [7].
With the advances of computer vision algorithms, intelligent vision systems have received growing interest. Computer vision-based assistive technology for visually impaired people has been studied and developed extensively. These systems can improve the mobility of a person with impaired vision by reducing risks and avoiding dangers. With the advance of imaging techniques, such as the RGB-D cameras of the Microsoft Kinect [8] and the ASUS Xtion Pro Live [9], it has become practical to capture RGB sequences as well as depth maps in real time. Depth maps provide additional information on object shape and distance compared to traditional RGB cameras. Some existing systems use an RGB-D camera and translate visual images into corresponding sounds through stereo headphones [10, 11]. However, these systems can distract the blind user's hearing, which could limit their efficient use in daily life.

In this paper, we present a Microsoft Kinect-based method specifically dedicated to the detection of obstacles in indoor environments, based on 3D image processing of color-depth information (Figure 5). Precisely, our system was designed to obtain reliable and accurate data from the surrounding environment and to detect and warn about nearby obstacles such as walls, doors, stairs, and undefined obstacles on the floor, with the ultimate goal of assisting visually impaired people in their mobility. This paper is indeed part of our long-term research on low vision assistance devices, whose main objective is to design and evaluate a complete prototype of an assistive device that can help the visually impaired in their mobility (Figure 2). To achieve this goal, we rely on various themes explored in the literature, including obstacle detection using computer vision, embedded system design, and sensory substitution technology. The main novelty of our work is unifying these components into a single prototype; this paper presents the results of the image processing module.

2. Related Work

In the last decades, obstacle detection has received great interest. Interestingly, the majority of the existing systems have been developed for mobile robots [12, 13]. In this section, we will only focus on the works related to assistive technology to help visually impaired people. Wearable systems have been developed based on various technologies such as laser, sonar, or stereo camera vision for environment sensing, using audio or tactile stimuli for user feedback. For instance, Benjamin et al. [14] have developed a laser cane for the blind called the C-5 Laser Cane. This device is based on optical triangulation to detect obstacles up to a range of 3.5 m ahead. It requires environment scanning and provides information on one nearest obstacle at a time by means of acoustic feedback. Molton et al. [15] have used a stereo-based system for the detection of the ground plane and the obstacles.

With RGB-D sensor-based computer vision technologies, scientists are finding many uses for these devices that have already led to advances in the medical field. For instance, Costa et al. [16] used low-cost RGB-D sensors to reconstruct the human body. Other systems based on RGB-D devices (e.g., Microsoft Kinect or ASUS Xtion Pro) are able to detect and recognize human activities [17, 18]. Other researchers have developed methods for detecting falls in the homes of older adults using the Microsoft Kinect. For instance, Mundher and Jiaofei [19] have developed a real-time fall detection system using a mobile robot and the Kinect sensor. The Kinect sensor is used with a mobile robot system to follow a person and detect when the target person has fallen. This system can also send an SMS notification and make an emergency call when a fall is detected. Stone and Skubic [20] have also presented a method for detecting falls in the homes of older adults using an environmentally mounted depth-imaging sensor. RGB-D sensor-based assistive technology can improve the ability of blind and visually impaired people to travel independently. Numerous electronic mobility or navigation assistant devices have been developed based on converting RGB-D information into an audible signal or into tactile stimuli for visually impaired persons. For instance, Khan et al. [21] have developed a real-time human and obstacle detection system for a blind or visually impaired user using an Xtion Pro Live RGB-D sensor. The prototype system includes an Xtion Pro Live sensor, a laptop for processing and transducing the data, and a set of headphones for providing feedback to the user. Tang et al. [22] presented an RGB-D sensor based computer vision device to improve the performance of visual prostheses. First, a patch-based method is employed to generate a dense depth map with region-based representations. The patch-based method generates both a surface-based RGB and depth (RGB-D) segmentation instead of just 3D point clouds. Therefore, it carries more meaningful information and it is easier to convey the information to the visually impaired person. Then, they applied a smart sampling method to transduce the important/highlighted information and/or remove background information before presenting it to visually impaired people. Lee and Medioni [23] have conceived a wearable navigation aid for the visually impaired, which includes an RGB-D camera and a tactile vest interface device. Park and Howard [24] presented a real-time haptic telepresence robotic system for the visually impaired to reach specific objects using an RGB-D sensor. In addition, Tamjidi et al. [25] developed a smart cane with an SR4000 3D camera for pose estimation and obstacle detection in an indoor environment. More recently, Yus et al. [26] have proposed a new stair detection and modelling method that provides information about the location, orientation, and number of steps of the staircase. Aladren et al. [27] have also developed a robust system for visually impaired people based on visual and range information. This system is able to detect and classify the main structural elements of the scene.

3. The Proposed System

3.1. Overview of the Proposed System. Figure 1 illustrates the overall structure of our proposed system. The proposed system uses a personal computer (PC) for processing color-depth images captured from an RGB-D camera. An obstacle detection method determines the presence of obstacles and warns the visually impaired user through feedback devices such as auditory, tactile, or vibration interfaces.
Figure 1: Overall structure of the proposed system: the RGB-D camera provides RGB-D and accelerometer data to the data processing and obstacle detection module, which passes obstacle information to a sensory substitution device providing audio feedback or vibration.

Figure 2: Prototype of the obstacle detection and warning system for visually impaired people (backpack, mobile Kinect, and tactile-visual substitution device).
In this paper, we focus on obstacle detection using information coming from RGB-D cameras. To warn the visually impaired user, we use a sensory substitution device called the Tongue Display Unit (TDU) [32]. In [33], we presented a complete system for obstacle detection and warning for visually impaired people based on an electrode matrix and a mobile Kinect. However, that system detects only loose obstacles on the floor. We extend this work by proposing a new approach for detecting different types of obstacles for visually impaired people.

3.2. User Requirements Analysis. In order to define the obstacles, we conducted a survey with ten blind students at the Nguyen Dinh Chieu school in Vietnam. The results of this preliminary study indicated that there are many obstacles in an indoor environment, such as moving objects, walls, doors, stairs, pillars, rubbish bins, and flower pots, that blind students have to avoid. In this study, we defined four frequent types of obstacle that the blind students face in typical indoor environments of their school: (1) doors, (2) stairs, (3) walls, and (4) a residual class that covers loose obstacles (see some examples in Figure 3).

3.3. Obstacles Detection. In this section, we describe the obstacle detection process illustrated in Figure 4. The process is divided into five consecutive steps, of which the first one is dedicated to data acquisition. In this step, color-depth and accelerometer data of the scene in front of the user are acquired using an RGB-D camera. These data are then used to reconstruct a point cloud in the second step. The third step subsequently filters the obtained point cloud, which is fed to the segmentation step. The main goal of the segmentation step is to identify the floor plane. Finally, in the obstacle detection step, we identify the types of obstacle based on their characteristics.
Figure 4: The obstacle detection process: acquisition (Step 1), reconstruction (Step 2), filtering with Voxel Grid and Pass Through filters (Step 3), segmentation (Step 4), and obstacle detection (Step 5), covering stair, wall, door, and loose obstacle detection.

Step 1 (data acquisition). Various types of RGB-D camera, such as the Microsoft Kinect and the ASUS Xtion Pro, can be used in our work. However, in this work, we use the Kinect sensor (Figure 5). The Kinect is a low-cost 3D camera that is able to work with various hardware models. It is also supported by various frameworks and drivers. It should be noted, however, that the fundamental part of our system does not need to be changed if we want to use another type of RGB-D camera in the future.

The RGB-D camera captures both RGB images and depth maps at a resolution of 640 × 480 pixels at 30 frames per second. The effective depth range of the Kinect RGB-D camera is from 0.4 to 3.5 m. The Kinect color stream supports a speed of 30 frames per second (fps) at a resolution of 640 × 480 pixels [34]. Figure 6 shows an illustration of the viewable range of the Kinect camera. In this step, in order to capture the color and depth information of the scene, we use OpenNI, the standard framework for 3D sensing.
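As a rough illustration of this acquisition step, the sketch below grabs colored point clouds from the Kinect through PCL's OpenNI-based grabber. It is a minimal example rather than the authors' implementation: the callback name and the global cloud holder are our own choices, and error handling and threading are omitted.

```cpp
// Minimal sketch: acquiring RGB-D frames as colored point clouds via OpenNI,
// using PCL's OpenNIGrabber (illustrative only, not the paper's code).
#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <boost/function.hpp>
#include <boost/bind.hpp>

// Most recent frame delivered by the sensor (~30 fps at 640 x 480).
pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr latest_cloud;

void cloudCallback(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
{
    latest_cloud = cloud;  // hand the frame over to the processing pipeline
}

int main()
{
    pcl::OpenNIGrabber grabber;  // wraps the Kinect through the OpenNI driver

    boost::function<void(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> f =
        boost::bind(&cloudCallback, _1);
    grabber.registerCallback(f);

    grabber.start();
    // ... run the obstacle detection pipeline on latest_cloud ...
    grabber.stop();
    return 0;
}
```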
Figure 5: The Microsoft Kinect sensor: body with microphone array, base, and tilt motor (tilt range of about ±27°).

Figure 6: Viewable range of the Kinect camera (field of view of about 57° horizontally and 43° vertically).

Figure 7: The reference system of the Kinect (K_rf) and the reference system centered at the user's feet (H_rf).
Step 2 (reconstruction). Color and depth are combined to create a Point Cloud, a data structure used to represent a collection of multidimensional points and commonly used to represent three-dimensional data. In a 3D Point Cloud, the points usually represent the X, Y, and Z geometric coordinates of an underlying sampled surface. We use the parameters provided by Burrus [35] in order to calibrate the color and depth data. Once the Point Cloud is created, it is defined in the reference system of the Kinect, indicated by K_rf in Figure 7. This is a disadvantage because we have to determine the location of obstacles in the reference system centered at the user's feet, indicated by H_rf. Therefore, we apply geometric transformations, including translation, rotation, and reflection, to bring the Point Cloud from the reference system of the Kinect, K_rf, to H_rf. The orientation information computed from the accelerometer data and the Point Cloud data is used to perform this task.
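A minimal sketch of such a frame change is given below, assuming that the accelerometer supplies the gravity direction expressed in the Kinect frame and that the z-axis of H_rf points upwards; the function name and the sensor height parameter are illustrative, not taken from the paper.

```cpp
// Sketch: moving a cloud from the Kinect frame (K_rf) to a frame centered at
// the user's feet (H_rf). Axis conventions and names are assumptions.
#include <pcl/common/transforms.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <Eigen/Geometry>

pcl::PointCloud<pcl::PointXYZRGBA>::Ptr
toUserFrame(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud_kinect,
            const Eigen::Vector3f& gravity_in_kinect,  // from the accelerometer
            float sensor_height_m)                     // Kinect height above the feet
{
    // Rotation aligning the measured gravity direction with the downward
    // axis of the user frame (here assumed to be -z of H_rf).
    Eigen::Quaternionf q = Eigen::Quaternionf::FromTwoVectors(
        gravity_in_kinect.normalized(), Eigen::Vector3f(0.0f, 0.0f, -1.0f));

    Eigen::Affine3f kinect_to_user = Eigen::Affine3f::Identity();
    kinect_to_user.rotate(q);
    // The Kinect origin sits sensor_height_m above the user's feet.
    kinect_to_user.pretranslate(Eigen::Vector3f(0.0f, 0.0f, sensor_height_m));

    pcl::PointCloud<pcl::PointXYZRGBA>::Ptr cloud_user(
        new pcl::PointCloud<pcl::PointXYZRGBA>);
    pcl::transformPointCloud(*cloud_kinect, *cloud_user, kinect_to_user);
    return cloud_user;
}
```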
Step 3 (filtering). The execution time of the program depends on the number of points in the Point Cloud. We thus need to reduce the number of points to ensure that the system is able to respond fast enough. We use a Voxel Grid Filter to downsample the Point Cloud, and then a Pass Through Filter removes all points located farther than 75 cm from the user along the x-axis (see Figure 8).
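The two filters can be chained with PCL as in the sketch below; the 3 cm voxel leaf size is an assumed value (the paper does not state it), while the ±75 cm pass-through limits follow the text.

```cpp
// Sketch: downsampling with a Voxel Grid filter, then cropping the cloud to
// +/-75 cm around the user along the x-axis with a Pass Through filter.
#include <pcl/filters/voxel_grid.h>
#include <pcl/filters/passthrough.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

typedef pcl::PointCloud<pcl::PointXYZRGBA> CloudT;

CloudT::Ptr filterCloud(const CloudT::ConstPtr& input)
{
    // 1) Voxel Grid: replace all points inside each 3 cm voxel by their
    //    centroid (assumed leaf size) to reduce the number of points.
    CloudT::Ptr downsampled(new CloudT);
    pcl::VoxelGrid<pcl::PointXYZRGBA> voxel;
    voxel.setInputCloud(input);
    voxel.setLeafSize(0.03f, 0.03f, 0.03f);
    voxel.filter(*downsampled);

    // 2) Pass Through: keep only points with -0.75 m <= x <= 0.75 m,
    //    i.e. the corridor directly in front of the user.
    CloudT::Ptr cropped(new CloudT);
    pcl::PassThrough<pcl::PointXYZRGBA> pass;
    pass.setInputCloud(downsampled);
    pass.setFilterFieldName("x");
    pass.setFilterLimits(-0.75f, 0.75f);
    pass.filter(*cropped);
    return cropped;
}
```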
Figure 8: The Pass Through Filter removes all points located beyond ±75 cm from the user along the x-axis.
Step 4 (segmentation). The next step is Plane Segmentation. The Random Sample Consensus (RANSAC) algorithm [36] is used for plane detection in the Point Cloud data. RANSAC is an iterative method to estimate the parameters of a model from data that contain outliers. In the present work, RANSAC chooses a set of points which satisfy the equation of the plane, ax + by + cz + d = 0, combined with a parallel condition between the floor plane and the xy-plane (see Figure 7). An example of a floor image is illustrated in Figure 9 and the detected floor plane is shown in Figure 10.
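A sketch of such a constrained RANSAC plane fit with PCL is given below; the 10° angular tolerance and the 2 cm inlier threshold are assumed values, and the z-axis is assumed to be the vertical axis of the user frame (so that the floor is parallel to the xy-plane).

```cpp
// Sketch: RANSAC segmentation of a plane roughly parallel to the xy-plane
// (the floor). Thresholds are assumed values, not the paper's parameters.
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/sample_consensus/model_types.h>
#include <pcl/sample_consensus/method_types.h>
#include <pcl/common/angles.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/PointIndices.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

bool segmentFloor(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud,
                  pcl::PointIndices& floor_inliers,
                  pcl::ModelCoefficients& plane)  // coefficients a, b, c, d
{
    pcl::SACSegmentation<pcl::PointXYZRGBA> seg;
    seg.setOptimizeCoefficients(true);
    // Accept only planes whose normal is close to the z-axis, i.e. planes
    // parallel to the xy-plane, within a 10 degree tolerance.
    seg.setModelType(pcl::SACMODEL_PERPENDICULAR_PLANE);
    seg.setAxis(Eigen::Vector3f(0.0f, 0.0f, 1.0f));
    seg.setEpsAngle(pcl::deg2rad(10.0));
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.02);  // 2 cm inlier threshold
    seg.setInputCloud(cloud);
    seg.segment(floor_inliers, plane);
    return !floor_inliers.indices.empty();
}
```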
Step 5 (obstacle detection). Consider the following.

(a) Obstacles on the Floor Detection. After performing the floor detection, Euclidean Cluster Extraction is used to determine the clusters on the floor plane. Each cluster is a set of points in the Point Cloud; within a cluster, the distance from each point to the others is smaller than a threshold, and each cluster represents one obstacle [37]. An example of loose obstacle detection is illustrated in Figure 11. In addition, some classes in the PCL library help us to provide the obstacle's size, so that each obstacle can be approximated accordingly.
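The clustering step could look like the PCL sketch below, applied to the points remaining above the detected floor; the 5 cm cluster tolerance and the cluster size limits are assumed values rather than the paper's parameters.

```cpp
// Sketch: grouping the points above the floor into one cluster per obstacle
// with Euclidean Cluster Extraction. Parameter values are assumptions.
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/search/kdtree.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <vector>

std::vector<pcl::PointIndices>
extractObstacleClusters(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& above_floor)
{
    pcl::search::KdTree<pcl::PointXYZRGBA>::Ptr tree(
        new pcl::search::KdTree<pcl::PointXYZRGBA>);
    tree->setInputCloud(above_floor);

    std::vector<pcl::PointIndices> clusters;
    pcl::EuclideanClusterExtraction<pcl::PointXYZRGBA> ec;
    ec.setClusterTolerance(0.05);  // points closer than 5 cm join the same cluster
    ec.setMinClusterSize(100);     // discard clusters too small to be obstacles
    ec.setMaxClusterSize(25000);
    ec.setSearchMethod(tree);
    ec.setInputCloud(above_floor);
    ec.extract(clusters);          // one pcl::PointIndices per obstacle
    return clusters;
}
```

The extent of each cluster, and therefore an approximate obstacle size as well as its distance to the user, can then be derived, for instance with pcl::getMinMax3D over the cluster's points.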
For the door detection, each candidate plane V_k is described by the plane equation

ax + by + cz + d = 0. (1)

Herein, (a, b, c) is the normal n_k of plane V_k.

(5) Determine the angle between n_k and the normal of the ground plane. This angle should approximate 90 degrees since doors are perpendicular to the floor.

(6) Determine the dimensions of each plane.

(7) Check for each plane V_k ∈ V whether its width satisfies the conditions; remove V_k if this is not the case.
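The orientation and width tests of steps (5)-(7) amount to a simple geometric check per candidate plane, sketched below; the 10° perpendicularity tolerance and the door width range are assumed values, since the paper's exact thresholds are not given here.

```cpp
// Sketch: orientation and width tests for one door candidate plane V_k.
// Thresholds are illustrative assumptions.
#include <Eigen/Core>
#include <cmath>

bool isDoorCandidate(const Eigen::Vector3f& plane_normal,   // n_k = (a, b, c)
                     const Eigen::Vector3f& ground_normal,  // normal of the floor plane
                     float plane_width_m)                    // width of plane V_k
{
    const float kPi = 3.14159265f;

    // Step (5): the angle between n_k and the ground normal should be ~90 deg.
    float cos_angle =
        std::fabs(plane_normal.normalized().dot(ground_normal.normalized()));
    float angle_deg = std::acos(cos_angle) * 180.0f / kPi;
    bool perpendicular = (angle_deg > 80.0f);  // within 10 degrees of perpendicular

    // Steps (6)-(7): keep the plane only if its width is door-like.
    bool door_sized = (plane_width_m > 0.7f && plane_width_m < 1.2f);

    return perpendicular && door_sized;
}
```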
(c) Staircase Detection. A staircase consists of at least 3 equally spaced steps, as illustrated in Figure 13. The authors from Monash University [38] have developed an algorithm to detect the steps of a staircase with high performance. This algorithm is able to provide the end user with information such as the presence of a staircase (both upstairs and downstairs). The value of H_tol can be changed; in our program, we choose H_tol = (1/2) · H_step to make sure not to miss a step.
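To illustrate the step-spacing idea and the H_tol = H_step/2 tolerance, the sketch below counts consecutive height gaps between horizontal plane candidates that match an expected step height; it is a simplification of the plane-based algorithm of [38], with illustrative names and no claim to match its details.

```cpp
// Sketch: counting consecutive, equally spaced step candidates from the
// heights of detected horizontal planes (sorted, ground plane first).
#include <vector>
#include <cmath>
#include <cstddef>

int countEquallySpacedSteps(const std::vector<float>& plane_heights_m, float h_step_m)
{
    const float h_tol = 0.5f * h_step_m;  // H_tol = 1/2 * H_step, as in the text
    int steps = 0;
    for (std::size_t i = 1; i < plane_heights_m.size(); ++i) {
        float gap = plane_heights_m[i] - plane_heights_m[i - 1];
        if (std::fabs(gap - h_step_m) <= h_tol)
            ++steps;                      // this gap looks like one stair step
        else
            break;                        // spacing broken: stop counting
    }
    return steps;  // a staircase needs at least 3 equally spaced steps
}
```

A staircase would then be reported only when at least 3 such equally spaced steps are found.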
Figure 11: The rubbish bin is detected, together with the distance between it and the user.

(a) Some obstacles on the floor; (b) the obstacles' size after approximation.

Figure 13: Model of a staircase (step height h1 measured from the ground plane).

Figure 14: Some examples of ground plane detection: (a) color image of the scene; (b) result of ground plane detection.
In order to evaluate the performance of the door algorithm, we use a standard measure widely used for classification or identification evaluation, namely, Precision [39]. This is defined as follows:

Precision = TP / (TP + FP), (2)

where TP and FP denote the number of true positives and false positives, respectively. The doors are detected in 90.69% (39 out of 43) of the positive images (Figures 18 and 19). The results are summarized in Table 1. We can further compare our results with the performance of some other systems (see Table 2).

Table 1: Performance of the door detection.
Number of images | TP | FP | Precision | Detection time per image
43 | 39 | 4 | 0.90 | 0.410 s

With our data, we see that the program also operates well when the camera approaches the door at an angle of approximately 45 degrees. Figures 20 and 21 show some examples of this case.

In another experiment, we tested our approach on a dataset with 75 images of upstairs. We also used the standard measure proposed by [39] for evaluating the upstairs algorithm. The results of this experiment are presented in Table 3.
Table 2: The performance of some other approaches.

Table 3: Performance of the upstairs detection.
The results show that our approach can be used in low-light environments. This feature can overcome the limitations of the monocular or stereo vision techniques [7, 8, 16].

The execution time of the intermediate processing steps is negligible (about 0.04 s for the floor segmentation and 0.009 s for the normal estimation). The detection time per image (see Tables 2 and 3) includes all the steps.
[12] R. Mojtahedzadeh, Robot obstacle avoidance using the Kinect [M.S. thesis], Royal Institute of Technology, Stockholm, Sweden, 2011, http://www.researchgate.net/publication/257541372 Robot Obstacle Avoidance using the Kinect.

[13] B. Peasley and S. Birchfield, "Real-time obstacle detection and avoidance in the presence of specular surfaces using an active 3D sensor," in Proceedings of the IEEE Workshop on Robot Vision (WoRV '13), pp. 197–202, Clearwater Beach, Fla, USA, January 2013.

[14] J. M. Benjamin, N. A. Ali, and A. F. Schepis, "A laser cane for the blind," Proceedings of the San Diego Biomedical Symposium, vol. 12, pp. 53–57, 1973.

[15] N. Molton, S. Se, J. M. Brady, D. Lee, and P. Probert, "A stereo vision-based aid for the visually impaired," Image and Vision Computing, vol. 16, no. 4, pp. 251–263, 1998.

[16] P. Costa, H. Zolfagharnasab, J. P. Monteiro, J. S. Cardoso, and H. P. Oliveira, "3D reconstruction of body parts using RGB-D sensors: challenges from a biomedical perspective," in Proceedings of the 5th International Conference on 3D Body Scanning Technologies, Lugano, Switzerland, October 2014.

[17] J. Sung, C. Ponce, B. Selman, and A. Saxena, "Unstructured human activity detection from RGBD images," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '12), pp. 842–849, IEEE, Saint Paul, Minn, USA, May 2012.

[18] W. Niu, J. Long, D. Han, and Y. F. Wang, "Human activity detection and recognition for video surveillance," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 1, pp. 719–722, Taipei, Taiwan, June 2004.

[19] Z. A. Mundher and Z. Jiaofei, "A real-time fall detection system in elderly care using mobile robot and kinect sensor," International Journal of Materials, Mechanics and Manufacturing, vol. 2, no. 2, pp. 133–138, 2014.

[20] E. E. Stone and M. Skubic, "Fall detection in homes of older adults using the Microsoft Kinect," IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 1, pp. 290–301, 2015.

[21] A. Khan, F. Moideen, W. Khoo, Z. Zhu, and J. Lopez, "KinDetect: Kinect detecting objects," in Proceedings of the 13th International Conference on Computers Helping People with Special Needs, Linz, Austria, July 2012, K. Miesenberger, A. Karshmer, P. Penaz, and W. Zagler, Eds., vol. 7383, pp. 588–595, Springer, Berlin, Germany, 2012.

[22] H. Tang, M. Vincent, T. Ro, and Z. Zhu, "From RGB-D to low resolution tactile: smart sampling and early testing," in Proceedings of the IEEE Workshop on Multimodal and Alternative Perception for Visually Impaired People in Conjunction with IEEE International Conference on Multimedia and Expo Workshops (MAP4VIP-ICMEW '13), pp. 1–6, IEEE, San Jose, Calif, USA, July 2013.

[23] Y. H. Lee and G. Medioni, "A RGB-D camera based navigation for the visually impaired," in Proceedings of the RSS 2011 RGB-D: Advanced Reasoning with Depth Camera Workshop, Los Angeles, Calif, USA, June 2011.

[24] C. H. Park and A. M. Howard, "Real-time haptic rendering and haptic telepresence robotic system for the visually impaired," in Proceedings of the IEEE World Haptics Conference (WHC '13), pp. 229–234, Daejeon, South Korea, April 2013.

[25] A. H. Tamjidi, C. Ye, and S. Hong, "6-DOF pose estimation of a portable navigation aid for the visually impaired," in Proceedings of the IEEE International Symposium on Robotic and Sensors Environments, pp. 178–183, Washington, DC, USA, October 2013.

[26] A. P. Yus, G. L. Nicolas, and J. J. Guerrero, "Detection and modelling of staircases using a wearable depth sensor," in Computer Vision—ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part III, vol. 8927 of Lecture Notes in Computer Science, pp. 449–463, Springer, 2015.

[27] A. Aladren, G. Lopez-Nicolas, L. Puig, and J. J. Guerrero, "Navigation assistance for the visually impaired using RGB-D sensor with range expansion," IEEE Systems Journal, 2015.

[28] A. C. Murillo, J. Košecká, J. J. Guerrero, and C. Sagüés, "Visual door detection integrating appearance and shape cues," Robotics and Autonomous Systems, vol. 56, no. 6, pp. 512–521, 2008.

[29] R. Munoz-Salinas, E. Aguirre, M. Garcia-Silvente, and A. Gonzalez, "Door-detection using computer vision and fuzzy logic," in Proceedings of the 6th WSEAS International Conference on Mathematical Methods & Computational Techniques in Electrical Engineering, December 2004.

[30] Y. Tian, X. Yang, and A. Arditi, "Computer vision-based door detection for accessibility of unfamiliar environments to blind persons," Machine Vision and Applications, vol. 24, no. 3, pp. 521–535, 2013.

[31] C. Juenemann, A. Corbin, and J. Li, Robust Door Detection, Stanford Department of Electrical Engineering, Stanford University, 2010.

[32] T. H. Nguyen, T. H. Nguyen, T. L. Le, T. T. H. Tran, N. Vuillerme, and T. P. Vuong, "A wearable assistive device for the blind using tongue-placed electrotactile display: design and verification," in Proceedings of the 2nd International Conference on Control, Automation and Information Sciences (ICCAIS '13), pp. 42–47, IEEE, Nha Trang, Vietnam, November 2013.

[33] V. Hoang, T. Nguyen, T. Le, T. H. Tran, T. Vuong, and N. Vuillerme, "Obstacle detection and warning for visually impaired people based on electrode matrix and mobile Kinect," in Proceedings of the 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS '15), pp. 54–59, Ho Chi Minh City, Vietnam, September 2015.

[34] A. Jana, Kinect for Windows SDK Programming Guide, Packt Publishing, Birmingham, UK, 2012.

[35] N. Burrus, "Calibrating the depth and color camera," 2014, http://nicolas.burrus.name/index.php/Research/KinectCalibration.

[36] D. Bernabei, F. Ganovelli, M. Di Benedetto, M. Dellepiane, and R. Scopigno, "A low-cost time-critical obstacle avoidance system for the visually impaired," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '11), Guimarães, Portugal, September 2011.

[37] M. Vlaminck, L. Jovanov, P. V. Hese, B. Goossens, W. Philips, and A. Pizurica, "Obstacle detection for pedestrians with a visual impairment based on 3D imaging," in Proceedings of the International Conference on 3D Imaging (IC3D '13), pp. 1–7, IEEE, Liège, Belgium, December 2013.

[38] T. J. J. Tang, W. L. D. Lui, and W. H. Li, "Plane-based detection of staircases using inverse depth," in Proceedings of the Australasian Conference on Robotics and Automation, Wellington, New Zealand, 2012.

[39] P. B. Thawari and N. J. Janwe, "CBIR based on color and texture," International Journal of Information Technology and Knowledge Management, vol. 4, no. 1, pp. 129–132, 2011.