Fusion of Stereo and Optical Flow Data Using Occupancy Grids
Abstract— In this paper, we propose a real-time method to detect obstacles using theoretical models of the ground plane, first in a 3D point cloud given by a stereo camera, and then in an optical flow field given by one of the stereo pair's cameras. The idea of our method is to combine two partial occupancy grids from both sensor modalities within an occupancy grid framework. The two methods do not have the same range, precision and resolution. For example, the stereo method is precise for close objects but cannot see further than 7 m (with our lenses), while the optical flow method can see considerably further but has lower accuracy. Experiments carried out on the CyCab mobile robot and on a tractor demonstrate that we can combine the advantages of both algorithms to build local occupancy grids from incomplete data (optical flow from a monocular camera cannot give depth information without time integration).

I. INTRODUCTION

This work takes place in the general context of mobile robots navigating in open and dynamic environments. Computer vision for ITS (Intelligent Transport Systems) is an active research area [6]. One of the key issues of ITS is the ability to avoid obstacles, which requires a method to perceive them.

In this article we address the problem of sensing obstacles through their motion (optical flow) in an image sequence. The perceived motion can be caused either by the obstacle itself or by the motion of the camera (which is the motion of the robot when the camera is fixed on it). We also use a stereo camera to improve the short-range performance of the resulting sensor.

Many methods have been developed to find moving objects in an image sequence. Most of them use a fixed camera and background subtraction (for example [13] and [5]). Recently, in [15], a new approach to obstacle avoidance has been developed, based on ground detection by finding planes in images. The weak point of this method is that the robot must be in a static environment.

Model-based approaches using ego-motion have been demonstrated in [9], [14]. The first detects the ground plane by virtually rotating the camera and visually estimating the ego-motion. The second uses dense stereo and optical flow to find moving objects and the robot ego-motion. These two methods have a large computational cost, as several successive calculations (stereo, optical flow, ego-motion, ...) are required.

In this paper, we demonstrate that without knowing the motion of the camera, we can model the motion of the ground plane and determine the location of the obstacles. Moreover, we show that the stereo data improves the accuracy of short-range obstacle detection.

One key point of this method is that we never explicitly compute the optical flow of the image. Optical flow computation is very expensive in terms of CPU time, inaccurate and sensitive to noise. The survey by Barron et al. [1] shows that the accuracy of optical flow computation is linked to its computational cost. We model the expected optical flow (which is easy and quick to compute) to get rid of the inherent noise and of the time-consuming optical flow step. As a consequence, we are able to demonstrate robust and real-time obstacle detection.

II. MODEL-BASED OBSTACLE DETECTION (STEREO AND OPTICAL FLOW)

The two algorithms we use in the next two parts are model-based approaches. In a first step, we model our expected observation in the sensor space (in this article we focus on the ground plane model).

Let n ∈ N be the dimension of the observation space and m ∈ N the number of parameters. The model F_{n,m,p} with m parameters is a function (R^n × R^m) → R^p defined by the relation:

F_{n,m,p}(Z, P) = 0, with P ∈ R^m and Z ∈ R^n

Given a model F_{n,m,p} of our observations, we try to extract the parameter set P ∈ R^m from a set of observed data (Z_i)_{i=1...k} by minimising the error in the system:

F_{n,m,p}(Z_i, P) = 0, ∀i ∈ {1...k}

This minimisation could be done with a Least Mean Squares (LMS) method. However, for better outlier rejection, we use here a Least Median of Squares technique, with the minimisation performed by the Nelder-Mead simplex search [12].
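The paper does not give an implementation of this fit; the following Python snippet is a minimal sketch of it (not the authors' code). It minimises the median of the squared model residuals with SciPy's Nelder-Mead simplex. The `residual` callback, the `fit_model_lmeds` name and the plane parameterisation shown in the usage comment are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_model_lmeds(residual, observations, p0):
    """Least Median of Squares fit of a model F(Z, P) = 0.

    residual(z, p) -> array of model residuals for one observation z.
    observations   -> iterable of observed vectors Z_i.
    p0             -> initial parameter guess (length m).
    """
    def median_of_squared_residuals(p):
        sq = [float(np.sum(np.square(residual(z, p)))) for z in observations]
        return np.median(sq)          # robust to a large fraction of outliers

    # Nelder-Mead simplex search, as in [12]; it is derivative-free,
    # so the non-smooth median objective is not a problem.
    result = minimize(median_of_squared_residuals, p0, method="Nelder-Mead")
    return result.x

# Hypothetical usage: fitting a ground plane z = a*x + b*y + c to stereo points.
# plane_residual = lambda pt, p: pt[2] - (p[0] * pt[0] + p[1] * pt[1] + p[2])
# params = fit_model_lmeds(plane_residual, point_cloud, p0=np.zeros(3))
```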
Fig. 1. (a) and (b) are respectively the left and right images from a synthetic image sequence, (c) is the disparity map computed with the SVS software, and (d) is the corresponding stereo point cloud, together with the plane fitted to the data.

Fig. 2. Example of an occupancy grid generated from the stereo data and the plane fitting process. The viewpoint is from the bottom centre of the image.

Once the ground plane has been fitted to the stereo point cloud, we compute the distance of each point from the ground plane using the following equation:
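A standard choice here is the signed point-to-plane distance: for a plane a x + b y + c z + d = 0, the distance of a point (x, y, z) is (a x + b y + c z + d) / sqrt(a² + b² + c²). The exact equation and threshold used in the paper may differ; the sketch below is an illustration under that assumption, with a hypothetical 20 cm obstacle threshold.

```python
import numpy as np

def distance_to_plane(points, plane):
    """Signed distance of 3D points to the plane a*x + b*y + c*z + d = 0.

    points : (N, 3) array of stereo points.
    plane  : (a, b, c, d) plane parameters.
    """
    a, b, c, d = plane
    normal = np.array([a, b, c])
    return (points @ normal + d) / np.linalg.norm(normal)

# Hypothetical usage: points further than 20 cm from the ground are obstacle candidates.
# obstacle_mask = np.abs(distance_to_plane(cloud, plane_params)) > 0.20
```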
The optical flow of the ground plane is modelled by the following quadratic model, where Z = (u, v, f_u, f_v) groups a pixel position (u, v) and the observed flow vector f = (f_u, f_v) at that pixel, and P = (p_1, ..., p_8) is the parameter set:

G_{4,8,2}(Z, P) = \begin{pmatrix} (p_1 - f_u) + p_2 u + p_3 v + p_4 u^2 + p_5 u v \\ (p_6 - f_v) + p_7 u + p_8 v + p_4 u v + p_5 v^2 \end{pmatrix}
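As a sketch (not the authors' code), the residual of this model for one observation follows directly from the two components above and can be plugged into the Least Median of Squares fit of the previous section; the function names are illustrative.

```python
import numpy as np

def ground_flow_residual(z, p):
    """Residual of the quadratic ground-plane flow model G_{4,8,2}(Z, P).

    z = (u, v, fu, fv): pixel position and observed flow vector.
    p = (p1, ..., p8) : ground-plane flow parameters.
    """
    u, v, fu, fv = z
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    return np.array([
        (p1 - fu) + p2 * u + p3 * v + p4 * u**2 + p5 * u * v,
        (p6 - fv) + p7 * u + p8 * v + p4 * u * v + p5 * v**2,
    ])

# Hypothetical usage with the fit sketched earlier:
# P = fit_model_lmeds(ground_flow_residual, flow_observations, p0=np.zeros(8))
```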
Using this method for optical flow requires an accurate flow estimate on the ground plane in order to evaluate its parameters. Indeed, the part of the flow field we want to model is the ground plane, so we need a good accuracy on its optical flow. Moreover, we want to respect our real-time constraint. Thus we need a method that performs accurate optical flow computation in real time.

We studied the characteristics of the various optical flow methods described in [1], but none was really appropriate (either inaccurate or slow). We therefore used the recent optical flow computation methods developed by Bruhn, Weickert et al. (in [2], [3], [4], ...). They are the most accurate real-time methods we found, and they give good information on uniform surfaces (they use a global constraint, as in the Horn and Schunck technique [8], to compute the flow field on uniform surfaces).

Even with these state-of-the-art techniques, the optical flow we compute on the ground plane is inaccurate. Indeed, the ground plane is often poorly textured (asphalt on the road) and covers a large part of the image. We can see in figure 3 that the optical flow of the ground plane is inaccurate.

Fig. 3. Figure (a) is an image from a video sequence where the camera is translating and the pedestrian is moving in front of the robot. Figure (b) is the corresponding optical flow computed with [3]. Note the incorrect optical flow vectors on the ground plane.

The expected flow of the ground plane can instead be modelled directly. Let H be the homography matrix associated with the ground plane and Ḣ its time derivative; for a pixel with homogeneous coordinates (u, v, 1) lying on the ground plane, we have:

\begin{pmatrix} \dot{u} \\ \dot{v} \\ \dot{w} \end{pmatrix} = \dot{H} H^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}    (2)

Now we can obtain the optical flow vector f for the pixel at Euclidean coordinates (u, v) by the formula:

f(u, v) = \begin{pmatrix} \dot{u} - u \dot{w} \\ \dot{v} - v \dot{w} \end{pmatrix}    (3)

Finally, from equations (2) and (3) we can express the theoretical optical flow vector for each pixel in the image (under the assumption that each pixel lies in the ground plane).

The homography matrix and its derivative are evaluated from the position of the camera (c_x, c_y, c_z), its orientation φ, and the motion of the camera (given by the odometry of the robot): v (linear velocity) and ω (angular velocity).

H = \begin{pmatrix} h_{1,1} & h_{1,2} & h_{1,3} \\ h_{2,1} & h_{2,2} & h_{2,3} \\ h_{3,1} & h_{3,2} & h_{3,3} \end{pmatrix}

with:

h_{1,1} = u_0 cos φ
h_{1,2} = −α_u
h_{1,3} = α_u + u_0 (−cos φ + c_z sin φ)
h_{2,1} = −α_u sin φ + v_0 cos φ
h_{2,2} = 0
h_{2,3} = α_v (sin φ + c_z cos φ) + v_0 (−cos φ + c_z sin φ)
h_{3,1} = cos φ
h_{3,2} = 0
h_{3,3} = −cos φ + c_z sin φ

We have also:

Ḣ = \begin{pmatrix} \dot{h}_{1,1} & \dot{h}_{1,2} & \dot{h}_{1,3} \\ \dot{h}_{2,1} & \dot{h}_{2,2} & \dot{h}_{2,3} \\ \dot{h}_{3,1} & \dot{h}_{3,2} & \dot{h}_{3,3} \end{pmatrix}
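A minimal sketch of how equations (2) and (3) turn H and Ḣ into the expected flow at a pixel (illustration only; the function names are assumptions):

```python
import numpy as np

def theoretical_flow(H, H_dot, u, v):
    """Expected ground-plane flow at pixel (u, v), following eqs. (2) and (3).

    H     : 3x3 ground-plane homography.
    H_dot : 3x3 time derivative of H.
    """
    udot, vdot, wdot = H_dot @ np.linalg.solve(H, np.array([u, v, 1.0]))  # eq. (2)
    return np.array([udot - u * wdot, vdot - v * wdot])                   # eq. (3)

# Hypothetical usage: build the expected flow field over the whole image.
# flow = np.array([[theoretical_flow(H, H_dot, u, v) for u in range(width)]
#                  for v in range(height)])
```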
Fig. 4. Example of a theoretical optical flow field for a moving robot with a velocity of 2 m/s and a rotation speed of 0.5 rad/s.

Once the theoretical optical flow field is computed, we can match it against the observed data. We use two consecutive images and try to match each pixel in the previous image to the corresponding theoretical pixel in the current image. The matching is done by computing an SSD (Sum of Squared Differences) measure. An example of the SSD matching can be seen in figure 5.

Fig. 5. Result of the optical flow matching. Images (a) and (b) are two consecutive frames of a video sequence. Subfigure (c) is the result of the SSD matching. The dashed area is the one where the model does not apply. The lighter the area, the better the match.
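A sketch of the per-pixel SSD comparison between the previous image and the current image displaced by the theoretical flow; the window size and the obstacle threshold shown in the comment are illustrative assumptions, and image borders are not handled.

```python
import numpy as np

def ssd_match(prev_img, curr_img, u, v, flow, half_win=3):
    """SSD between a window around (u, v) in the previous image and the
    window around the theoretically displaced pixel in the current image.

    prev_img, curr_img : 2D grayscale arrays (float).
    flow               : expected flow (du, dv) at (u, v), e.g. from theoretical_flow().
    """
    du, dv = int(round(flow[0])), int(round(flow[1]))
    w = half_win
    patch_prev = prev_img[v - w:v + w + 1, u - w:u + w + 1]
    patch_curr = curr_img[v + dv - w:v + dv + w + 1, u + du - w:u + du + w + 1]
    return float(np.sum((patch_prev - patch_curr) ** 2))

# A high SSD means the pixel does not move like the ground plane,
# i.e. it is a potential obstacle (the threshold value is an assumption):
# is_obstacle = ssd_match(I0, I1, u, v, theoretical_flow(H, H_dot, u, v)) > 1500.0
```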
III. VISION-ORIENTED DATA FUSION

In this section we propose a method to cope with the 3D occupancy grid memory and its fusion with 2D occupancy grids. This model is based on the projective description of camera sensors. Using a particular sensor configuration, we will be able to extract a 2D occupancy grid model for a monocular camera.

Fig. 6. Camera model: the shape in the image is projected on the ground plane. The dashed pyramid corresponds to the potentially occupied space.

A. 3D camera sensor model

It is difficult to deal with cameras in an occupancy grid framework. Indeed, one pixel of the image can correspond to an infinite set of 3D world points. This set of points is known as a projective line. The set of projective lines corresponding to all the image pixels is a pyramid (of dimension 3). Therefore we need to express the occupancy grids in a 3D space.

Figure 6 shows the camera model we use. The shape in the image can correspond to the dashed pyramid. Saying that a pixel in the image belongs to an obstacle means that the corresponding projective line is potentially occupied.
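A minimal sketch of this sensor model, marking the cells of a 3D grid crossed by the projective line of a pixel as potentially occupied; the intrinsic matrix K, the grid resolution, the 20 m range and the probability value are illustrative assumptions, and the camera axes are assumed aligned with the grid.

```python
import numpy as np

def mark_projective_line(grid3d, K, cam_pos, u, v, cell_size=0.2, p_occ=0.6):
    """Mark as potentially occupied the 3D cells crossed by the projective
    line of pixel (u, v), for a camera at cam_pos with intrinsics K.

    grid3d : (nx, ny, nz) array of occupancy probabilities (prior 0.5).
    Camera axes are assumed aligned with the grid axes for simplicity.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction of the projective line
    ray /= np.linalg.norm(ray)
    for t in np.arange(0.0, 20.0, cell_size / 2):    # sample the ray up to 20 m (assumption)
        x, y, z = cam_pos + t * ray
        i, j, k = int(x / cell_size), int(y / cell_size), int(z / cell_size)
        if 0 <= i < grid3d.shape[0] and 0 <= j < grid3d.shape[1] and 0 <= k < grid3d.shape[2]:
            grid3d[i, j, k] = max(grid3d[i, j, k], p_occ)
```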
B. 2D projected model

The occupancy grid framework is very expensive when it comes to a 3D space. The idea is therefore to project the 3D occupancy grid onto the ground plane. This projection respects the semantics of occupancy grids and translates the incompleteness of the monocular camera into uncertainty.

The projection of a 3D occupancy grid C^{3D}_{i,j,k} into a 2D occupancy grid C_{i,j} is defined in our work as:

C_{i,j} = \max_k \left( p_k \, C^{3D}_{i,j,k} + (1 - p_k) \, \frac{1}{2} \right)    (4)

In equation (4) we use the maximum operator to obtain an occupancy grid that is as safe as possible. Indeed, we impose that the probability for a cell to be occupied is the maximum of the probabilities of all the 3D cells on top of the 2D ground cell, which means that we do not risk marking a 2D cell as free when a 3D cell above it is occupied.
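As a sketch of equation (4) (not the authors' code), with the vertical prior p_k, defined just below, passed in as an array:

```python
import numpy as np

def project_to_ground(grid3d, p_k):
    """Project a 3D occupancy grid onto the ground plane, following eq. (4).

    grid3d : (nx, ny, nz) array of occupancy probabilities C^{3D}_{i,j,k}.
    p_k    : (nz,) vertical prior on the presence of obstacles at each height.
    Returns the (nx, ny) grid C_{i,j}.
    """
    weighted = p_k[None, None, :] * grid3d + (1.0 - p_k[None, None, :]) * 0.5
    return weighted.max(axis=2)   # maximum over the vertical axis: the "safe" choice
```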
The term p_k is a priori knowledge on the vertical distribution of the obstacles. The value 1 means a strong confidence and the value 0 means that no obstacle can be at the given height. We imposed the following function for p_k (the function is represented in figure 7):

p_k = \begin{cases} 1 & \text{if } k \le z_0 \\ 2\left(\frac{k - z_0}{\Delta z}\right)^3 - 3\left(\frac{k - z_0}{\Delta z}\right)^2 + 1 & \text{if } k \in \,]z_0, z_0 + \Delta z] \\ 0 & \text{elsewhere} \end{cases}    (5)
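A direct transcription of equation (5); the values of z_0 and Δz and the cell height in the usage comment are illustrative assumptions.

```python
import numpy as np

def vertical_prior(k, z0=1.0, dz=1.0):
    """Prior p_k on the presence of obstacles at height k (equation (5)).

    Equal to 1 up to z0, smoothly decreasing to 0 over ]z0, z0 + dz], 0 above.
    """
    k = np.asarray(k, dtype=float)
    t = (k - z0) / dz
    smooth = 2.0 * t**3 - 3.0 * t**2 + 1.0
    return np.where(k <= z0, 1.0, np.where(k <= z0 + dz, smooth, 0.0))

# Hypothetical usage with the projection of eq. (4), for 0.2 m cells:
# heights = np.arange(nz) * 0.2
# ground_grid = project_to_ground(grid3d, vertical_prior(heights))
```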
n
P (Occi,j )
P (Occi,j | Z1 · · · Zn ) = P (Zk | Occi,j )
P (Z1 · · · Zn )
k=1
n
P (Occi,j | Z1 = z1 · · · Zn = zn ) ∝ P (Zk = zk | Occi,j )
k=1
P (Zk = zk | Occi,j ) = P (H) P (Zk = zk | Occi,j , H)
Fig. 7. Graph of the function pk defined in equation (5) H
IV. CAMERA MODALITIES FUSION

To fuse all the sensor modalities we use an occupancy grid framework [7], [10], which lets us express the fusion in formal probabilistic terms. The probability for cell (i, j) to be occupied given the sensor observations (Z_k)_{k=1...n} for this cell can be written as:

P(Occ_{i,j} \mid Z_1 \cdots Z_n) = \frac{P(Occ_{i,j})}{P(Z_1 \cdots Z_n)} \prod_{k=1}^{n} P(Z_k \mid Occ_{i,j})

P(Occ_{i,j} \mid Z_1 = z_1 \cdots Z_n = z_n) \propto \prod_{k=1}^{n} P(Z_k = z_k \mid Occ_{i,j})

P(Z_k = z_k \mid Occ_{i,j}) = \sum_{H} P(H) \, P(Z_k = z_k \mid Occ_{i,j}, H)
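A minimal sketch of this per-cell fusion for our two modalities, applying the proportionality above and renormalising over the two cases occupied/free; reading each modality's grid as a likelihood and assuming a uniform prior are simplifications made for illustration.

```python
import numpy as np

def fuse_grids(likelihood_grids, prior=0.5):
    """Fuse per-cell occupancy evidence from several modalities.

    likelihood_grids : list of float 2D arrays, each read as P(Z_k | Occ_{i,j});
                       the free-cell likelihood is taken as 1 - value.
    Returns P(Occ_{i,j} | Z_1 ... Z_n) for every cell.
    """
    occ = np.full_like(likelihood_grids[0], prior)
    free = 1.0 - occ
    for grid in likelihood_grids:
        occ = occ * grid              # accumulate evidence for "occupied"
        free = free * (1.0 - grid)    # accumulate evidence for "free"
    return occ / (occ + free)         # normalise the proportionality

# Hypothetical usage: same confidence for both modalities, as in the experiments.
# fused = fuse_grids([stereo_grid, flow_grid])
```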
V. EXPERIMENTAL RESULTS

Figure 9 shows the result of the whole process. Subfigure (a) is the current frame on which we analyse the results of our algorithm. Subfigure (b) is the result of the optical flow detection algorithm. We can clearly see the pedestrian moving in front of the camera, as well as the blue car in the back. The relative importance of the car and the pedestrian is due to their distance to the camera: the closer an object is to the camera, the bigger its optical flow. In future work we could improve this point by exploring the possibility of normalising the SSD by the optical flow, which would give a better ratio between far and close obstacles.
Fig. 9. (a) is the left image of the stereo camera, (b) shows the obstacles detected from optical flow, (c) is the occupancy grid generated from the 3D point cloud, (d) is the projection of (b) on the ground plane, (e) is the improved model presented in III-B and (f) is the final occupancy grid, which is the fusion of the two occupancy grids (c) and (e).

Subfigure (d) is the naive projection of subfigure (b), and subfigure (e) is the projection we defined in III-B.

Finally, subfigure (f) shows the global result of the fusion. We used the same confidence for both algorithms. We can see that the areas where the stereo does not provide dense information are supplemented by the optical flow algorithm. The area where the pedestrian stands is also reinforced. The false detection on the top right of the grid is minimised: after the fusion step, this false detection is left with a probability corresponding to unknown occupancy. The cells in front of the obstacle are slightly degraded.

VI. CONCLUSION AND FUTURE WORK

In this paper, we proposed a real-time method to detect obstacles using theoretical models of the ground plane, applied to the 3D point cloud given by a stereo camera and to an optical flow field given by one of the stereo pair's cameras.

The performance of the global process is better than the stereo detection or the optical flow detection alone. We could improve the quality of the occupancy grid by adding more sensors (other cameras, laser range finders, ...) and/or more camera modalities (colour segmentation, ...).

The next step will be to perform time integration to remove some ambiguities (especially ambiguities related to monocular camera algorithms).

Finally, we could integrate our algorithms in a whole SLAM process.

VII. ACKNOWLEDGEMENTS

This work was done in the context of a cooperation between the CSIRO ICT Centre in Brisbane (Australia) and INRIA Rhône-Alpes in Grenoble (France).

REFERENCES

[1] J.L. Barron, D.J. Fleet, and S.S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12:43–77, 1994.
[2] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. Proc. 8th European Conference on Computer Vision, 3024:25–36, May 2004.
[3] A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger, and C. Schnörr. Variational optical flow computation in real-time. IEEE Transactions on Image Processing, 14(5):608–615, 2005.
[4] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision, 61(3):211–231, 2005.
[5] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade. Algorithms for cooperative multisensor surveillance. Proceedings of the IEEE, 89(10):1456–1477, October 2001.
[6] E.D. Dickmanns. The development of machine vision for road vehicles in the last decade. IEEE Intelligent Vehicles Symposium, 2002.
[7] A. Elfes. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6):46–57, 1989.
[8] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185–203, 1981.
[9] Q. Ke and T. Kanade. Transforming camera geometry to a virtual downward-looking camera: robust ego-motion estimation and ground layer detection. Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, 2003.
[10] K. Konolige. Improved occupancy grids for map building. Autonomous Robots, 4:351–367, 1997.
[11] H.C. Longuet-Higgins. The visual ambiguity of a moving plane. Proceedings of the Royal Society of London, 1984.
[12] J.A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308–313, 1965.
[13] C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. Proc. of the International Conference on Computer Vision and Pattern Recognition, January 1998.
[14] A. Talukder and L. Matthies. Real-time detection of moving objects from moving vehicles using dense stereo and optical flow. Proc. of the International Conference on Intelligent Robots and Systems, October 2004.
[15] K. Young-Geun and K. Hakil. Layered ground floor detection for vision-based mobile robot navigation. In International Conference on Robotics and Automation, pages 13–18, New Orleans, April 2004.