


Proceedings of the IEEE ITSC 2006, TC7.4
2006 IEEE Intelligent Transportation Systems Conference
Toronto, Canada, September 17-20, 2006

Fusion of stereo and optical flow data using occupancy grids

Christophe Braillon (1), Kane Usher (2), Cédric Pradalier (2), James L. Crowley (1) and Christian Laugier (1)

(1) Laboratoire GRAVIR, INRIA Rhône-Alpes
655 avenue de l'Europe, 38334 Saint Ismier Cedex, France
Email: [email protected]

(2) CSIRO ICT Centre, Autonomous Systems Lab
1 Technology Court, Pullenvale QLD 4069, Australia
Email: [email protected]

Abstract— In this paper, we propose a real-time method to detect obstacles using theoretical models of the ground plane, first in a 3D point cloud given by a stereo camera, and then in an optical flow field given by one of the stereo pair's cameras. The idea of our method is to combine two partial occupancy grids from both sensor modalities within an occupancy grid framework. The two methods do not have the same range, precision and resolution. For example, the stereo method is precise for close objects but cannot see further than 7 m (with our lenses), while the optical flow method can see considerably further but has lower accuracy. Experiments carried out on the CyCab mobile robot and on a tractor demonstrate that we can combine the advantages of both algorithms to build local occupancy grids from incomplete data (optical flow from a monocular camera cannot give depth information without time integration).

I. INTRODUCTION

This work takes place in the general context of mobile robots navigating in open and dynamic environments. Computer vision for ITS (Intelligent Transport Systems) is an active research area [6]. One of the key issues of ITS is the ability to avoid obstacles, which requires a method to perceive them. In this article we address the problem of sensing obstacles through their motion (optical flow) in an image sequence. The perceived motion can be caused either by the obstacle itself or by the motion of the camera (which is the motion of the robot when the camera is fixed on it). We also use a stereo camera to improve the short-range accuracy of the resulting sensor.

Many methods have been developed to find moving objects in an image sequence. Most of them use a fixed camera and background subtraction (for example [13] and [5]). Recently, in [15], a new approach to obstacle avoidance was developed, based on ground detection by finding planes in images. The weak point of this method is that the robot must be in a static environment.

Model-based approaches using ego-motion have been demonstrated in [9], [14]. The first detects the ground plane by virtually rotating the camera and visually estimating the ego-motion. The second uses dense stereo and optical flow to find moving objects and the robot's ego-motion. These two methods have a large computational cost, as several successive calculations (stereo, optical flow, ego-motion, ...) are required.

In this paper, we demonstrate that, without knowing the motion of the camera, we can model the motion of the ground plane and determine the location of the obstacles. Moreover, we show that the stereo data improves the accuracy of short-range obstacle detection.

One key point of this method is that we never explicitly compute the optical flow of the image. Optical flow computation is very expensive in terms of CPU time, and it is inaccurate and sensitive to noise. The survey by Barron et al. [1] shows that the accuracy of optical flow computation is linked to its computational cost. Instead, we model the expected optical flow (which is easy and quick to compute) to avoid both the inherent noise and the time-consuming optical flow step. As a consequence, we are able to demonstrate robust and real-time obstacle detection.

II. MODEL-BASED OBSTACLE DETECTION (STEREO AND OPTICAL FLOW)

The two algorithms used in the next two sections are model-based approaches. In a first step, we model our expected observation in the sensor space (in this article we focus on the ground plane model).

Let n ∈ N be the dimension of the observation space and m ∈ N the number of parameters. The model F_{n,m,p} with m parameters is a function (R^n × R^m) → R^p and is defined by the relation:

F_{n,m,p}(Z, P) = 0,  with P ∈ R^m and Z ∈ R^n

Given a model F_{n,m,p} of our observations, we try to extract the parameter set P ∈ R^m from a set of observed data (Z_i)_{i=1···k} by minimising the error in the system:

F_{n,m,p}(Z_i, P) = 0,  ∀i ∈ {1, ..., k}

This minimisation can be done by performing a Least Mean Squares (LMS) minimisation. However, for better outlier rejection, we use here a Least Median Squares technique, with the minimisation performed by the Nelder-Mead simplex search [12].
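As a concrete illustration of this fitting step, the short Python/NumPy sketch below minimises the median of squared model residuals with SciPy's Nelder-Mead implementation, using the ground-plane model F_{3,4,1} that is introduced in Section II-A. It is only a minimal sketch: the initial guess, the normalisation of the plane normal and the array layout are assumptions of ours, not details given in the paper.

import numpy as np
from scipy.optimize import minimize

def plane_residuals(P, Z):
    """Residuals of the plane model F_{3,4,1}(Z, P) = p1*x + p2*y + p3*z + p4."""
    p1, p2, p3, p4 = P
    x, y, z = Z[:, 0], Z[:, 1], Z[:, 2]
    return p1 * x + p2 * y + p3 * z + p4

def median_of_squares(P, Z):
    """Least Median Squares criterion: median_i F(Z_i, P)^2."""
    n = np.linalg.norm(P[:3])
    if n < 1e-9:
        return np.inf          # avoid the trivial solution P = 0
    return np.median(plane_residuals(P / n, Z) ** 2)

def fit_ground_plane(points, P0=(0.0, 0.0, 1.0, 0.0)):
    """Fit the dominant (ground) plane with LMedS, minimised by Nelder-Mead."""
    res = minimize(median_of_squares, np.asarray(P0, dtype=float),
                   args=(points,), method="Nelder-Mead")
    P = res.x
    return P / np.linalg.norm(P[:3])   # plane parameters with a unit normal

# Usage (hypothetical point cloud from the stereo pair):
# P = fit_ground_plane(cloud_xyz)            # cloud_xyz: (N, 3) array
# distances = plane_residuals(P, cloud_xyz)  # signed point-to-plane distances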



Fig. 1. (a) and (b) are respectively the left and right images from a synthetic image sequence, (c) is the disparity map computed with SVS software, and (d) is the corresponding stereo point cloud, also showing the plane fitted to the data.

Fig. 2. Example of an occupancy grid generated from the stereo data and the plane fitting process. The viewpoint is from the bottom centre of the image.

Once the parameters are retrieved, we can cluster the observations into two sets: observations that match the model and observations that do not.

In the next part, we will use a ground plane model in both the optical flow and 3D world spaces. In that case, the parameter sets give us the position of the camera (with respect to the ground plane) and its ego-motion. The observations are clustered into two sets: the ground plane and the obstacles.

A. Stereo obstacle detection

For obstacle detection using the stereo camera, we take the point cloud data generated from the stereo images, find the dominant plane in the point cloud (which should be the ground plane) using the Least Median Squares method, and then build an occupancy grid based upon the distance of each point from the dominant plane.

The observation space for the point cloud data is R^3. Working in Cartesian space, the ground plane can be described by a set of four parameters (R^4). The model is then expressed as:

F_{3,4,1}(Z, P) = p_1 x + p_2 y + p_3 z + p_4

where Z = (x, y, z) and P = (p_1, p_2, p_3, p_4). Figure 1 shows an example of the stereo point cloud data together with the plane fitted to the data.

Having found the ground plane, we can now estimate the camera height, roll and tilt, and compare these to the 'expected' values (from knowledge of the camera mounting position). We can also populate the occupancy grid using the idea that points not on the ground plane (at least within a tolerance) must belong to an obstacle. That is, we calculate the distance of each point from the ground plane using the following equation:

e_i = p_1 x_i + p_2 y_i + p_3 z_i + p_4

where the subscript i denotes the i-th point in the cloud. If this distance exceeds a threshold, the point contributes to the evidence that the ground plane cell to which it belongs contains an obstacle. Otherwise, if the distance is below the threshold, the point contributes to the evidence that the cell is free. Figure 2 illustrates an example occupancy grid generated using this method. In this figure, an 'empty' cell is represented by white, an 'occupied' cell by black, and 'unknown' cells by grey. The viewpoint is from the bottom of the image. Note the difference to, for example, a scanning-laser generated occupancy grid, which physically cannot 'see' behind objects.
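A minimal sketch of this grid construction is given below (Python/NumPy). The cell size, grid extent, distance threshold and the way per-point evidence is turned into a probability are our own assumptions; the paper only states that points near the plane vote for 'free' and points far from it vote for 'occupied'.

import numpy as np

def stereo_occupancy_grid(points, P, cell=0.2, half_width=4.0, depth=8.0, thresh=0.15):
    """Sketch (not the authors' exact update rule): bin each 3D point into a ground
    cell and count obstacle vs. ground evidence from its plane distance e_i.

    points : (N, 3) stereo point cloud, x lateral / y forward (frame is an assumption)
    P      : ground-plane parameters (p1, p2, p3, p4) with a unit normal
    """
    nx, ny = int(2 * half_width / cell), int(depth / cell)
    occ = np.zeros((ny, nx))
    free = np.zeros((ny, nx))

    e = points @ P[:3] + P[3]                       # e_i = p1 x_i + p2 y_i + p3 z_i + p4
    ix = ((points[:, 0] + half_width) / cell).astype(int)
    iy = (points[:, 1] / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    for i in np.flatnonzero(ok):
        if abs(e[i]) > thresh:
            occ[iy[i], ix[i]] += 1.0                # evidence for an obstacle in this cell
        else:
            free[iy[i], ix[i]] += 1.0               # evidence that the cell is free

    seen = (occ + free) > 0
    grid = np.full((ny, nx), 0.5)                   # 0.5 = unknown (grey in Fig. 2)
    grid[seen] = occ[seen] / (occ[seen] + free[seen])
    return grid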
B. Optical flow obstacle detection

By definition, an optical flow field is a vector field that describes the velocity of pixels in an image sequence. The first step of our method is to model the optical flow field for our camera.

This model is based on the classical pinhole camera model, that is to say, we neglect the distortion due to the lens. We also assume that there is no skew factor. We will see in the experimental results that these two assumptions are valid.

1) Naive first approach: The parametrisation of our model can be found in [11]: only 8 parameters (the parameter space is R^8 and P = (p_1, ..., p_8)) are needed to fully describe the visual motion of a plane. We call Z = (u, v, f_u, f_v) an observation of an optical flow vector f = (f_u, f_v) at pixel (u, v). Therefore the observation space is R^4. We can write the new model as follows:

G_{4,8,2}(Z, P) = [ (p_1 − f_u) + p_2 u + p_3 v + p_4 u^2 + p_5 u v ;
                    (p_6 − f_v) + p_7 u + p_8 v + p_4 u v + p_5 v^2 ]
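For reference, this model can be evaluated directly as a residual function; the short sketch below (Python/NumPy) could be plugged into the same Least Median Squares / Nelder-Mead machinery as the plane fit. It is illustrative only: the observation layout is an assumption and, as the following paragraphs explain, this naive approach is not the one finally retained.

import numpy as np

def planar_flow_residuals(P, Z):
    """Residuals of the 8-parameter planar-motion model G_{4,8,2}(Z, P).

    Z : (N, 4) observations (u, v, f_u, f_v) -- pixel coordinates and observed flow
    P : (p1, ..., p8) planar motion parameters
    Returns an (N, 2) array that is zero when an observation fits the ground plane.
    """
    p1, p2, p3, p4, p5, p6, p7, p8 = P
    u, v, fu, fv = Z[:, 0], Z[:, 1], Z[:, 2], Z[:, 3]
    ru = (p1 - fu) + p2 * u + p3 * v + p4 * u**2 + p5 * u * v
    rv = (p6 - fv) + p7 * u + p8 * v + p4 * u * v + p5 * v**2
    return np.stack([ru, rv], axis=1)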
Using this method for optical flow requires good accuracy on the ground plane in order to evaluate its parameters. Indeed, the part of the flow field we want to model is the ground plane, so we need its optical flow to be accurate. Moreover, we want to respect our real-time constraint. Thus we need a method that performs accurate optical flow computation in real time.

We studied the characteristics of the various optical flow methods described in [1], but no method was really appropriate (either inaccurate or slow). We used recent optical flow computation methods developed by Bruhn, Weickert et al. (in [2], [3], [4], ...). They are the most accurate real-time methods we found. They give good information on uniform surfaces (they use a global constraint, as in the Horn and Schunck technique [8], to compute the flow field on uniform surfaces).

Even with state-of-the-art techniques, the optical flow we compute on the ground plane is inaccurate. Indeed, the ground plane is often poorly textured (asphalt on the road) and occupies a large part of the image. We can see in figure 3 that the optical flow of the ground plane is inaccurate.

Fig. 3. Figure (a) is an image from a video sequence where the camera is translating and the pedestrian is moving in front of the robot. Figure (b) is the corresponding optical flow computed with [3]. Note the incorrect optical flow vectors on the ground plane.

2) Odometry based optical flow model: Having observed that the optical flow computation does not give results good enough for our optimisation, we propose a reverse method which tries to match an optical flow model, given by the odometry data, to the image.

To describe our model, we use the projective geometry formalism. We call (u, v, w) the homogeneous coordinates of a pixel in the image, H the homography matrix that projects a point of the ground plane into the image, and Ḣ its derivative. The projection equation is:

(u, v, w)^T_Image = H (X, Y, 1)^T        (1)

We can then infer the optical flow equation by differentiating equation (1), and we obtain:

(u̇, v̇, ẇ)^T_Image = Ḣ H^{-1} (u, v, 1)^T_Image        (2)

Now we can obtain the optical flow vector f for the pixel at Euclidean coordinates (u, v) by the formula:

f(u, v) = (u̇ − u ẇ, v̇ − v ẇ)^T        (3)

Finally, from equations (2) and (3) we can express the theoretical optical flow vector for each pixel in the image (under the assumption that each pixel lies in the ground plane).

The homography matrix and its derivative are evaluated using the position of the camera (c_x, c_y, c_z), its orientation φ, and the motion of the camera given by the odometry of the robot: v_t (linear velocity) and ω_t (angular velocity).

H = [ h_{1,1} h_{1,2} h_{1,3} ; h_{2,1} h_{2,2} h_{2,3} ; h_{3,1} h_{3,2} h_{3,3} ]

with:

h_{1,1} = u_0 cos φ
h_{1,2} = −α_u
h_{1,3} = α_u + u_0 (−cos φ + c_z sin φ)
h_{2,1} = −α_u sin φ + v_0 cos φ
h_{2,2} = 0
h_{2,3} = α_v (sin φ + c_z cos φ) + v_0 (−cos φ + c_z sin φ)
h_{3,1} = cos φ
h_{3,2} = 0
h_{3,3} = −cos φ + c_z sin φ

We also have:

Ḣ = [ ḣ_{1,1} ḣ_{1,2} ḣ_{1,3} ; ḣ_{2,1} ḣ_{2,2} ḣ_{2,3} ; ḣ_{3,1} ḣ_{3,2} ḣ_{3,3} ]

with:

ḣ_{1,1} = α_u ω_t
ḣ_{1,2} = u_0 ω_t cos φ
ḣ_{1,3} = −u_0 v_t cos φ
ḣ_{2,1} = 0
ḣ_{2,2} = (−α_v sin φ + v_0 cos φ) ω_t
ḣ_{2,3} = (α_v sin φ − v_0 cos φ) v_t
ḣ_{3,1} = 0
ḣ_{3,2} = ω_t cos φ
ḣ_{3,3} = −v_t cos φ

Figure 4 shows the result of our model for a camera at position c_x = 1.74 m, c_y = 0 m, c_z = 0.83 m and φ = 0 rad. The model is valid only below the horizon line, whose equation is y = v_0 − α_v tan φ. Therefore there is no flow vector above the horizon line.
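To make the construction concrete, here is a short Python/NumPy sketch that assembles H and Ḣ from the entries listed above and evaluates equations (2) and (3) for every pixel, masking everything above the horizon line. The matrix entries are transcribed as printed; the image axis conventions (u = column, v = row) and the parameter names are our assumptions.

import numpy as np

def homography_and_derivative(alpha_u, alpha_v, u0, v0, cz, phi, vt, wt):
    """Build H and Hdot from the closed-form entries given above.
    (alpha_u, alpha_v, u0, v0) are the pinhole intrinsics, cz the camera height,
    phi its orientation, vt / wt the linear and angular velocity from odometry."""
    c, s = np.cos(phi), np.sin(phi)
    H = np.array([
        [u0 * c,                -alpha_u,  alpha_u + u0 * (-c + cz * s)],
        [-alpha_u * s + v0 * c,  0.0,      alpha_v * (s + cz * c) + v0 * (-c + cz * s)],
        [c,                      0.0,      -c + cz * s],
    ])
    Hdot = np.array([
        [alpha_u * wt,  u0 * wt * c,                   -u0 * vt * c],
        [0.0,           (-alpha_v * s + v0 * c) * wt,  (alpha_v * s - v0 * c) * vt],
        [0.0,           wt * c,                        -vt * c],
    ])
    return H, Hdot

def model_flow(shape, H, Hdot, alpha_v, v0, phi):
    """Theoretical ground-plane flow: (u', v', w')^T = Hdot H^-1 (u, v, 1)^T,
    then f = (u' - u w', v' - v w'); no flow is produced above the horizon line."""
    h, w = shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    pts = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    M = Hdot @ np.linalg.inv(H)
    dot = pts @ M.T                                    # (h, w, 3): (u', v', w') per pixel
    fu = dot[..., 0] - u * dot[..., 2]
    fv = dot[..., 1] - v * dot[..., 2]
    mask = v > (v0 - alpha_v * np.tan(phi))            # model valid below the horizon only
    return np.where(mask, fu, 0.0), np.where(mask, fv, 0.0)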

Fig. 4. Example of the theoretical optical flow field for a moving robot with a velocity of 2 m.s−1 and a rotation speed of 0.5 rad.s−1.

Fig. 5. Result of the optical flow matching. Images (a) and (b) are two consecutive frames of a video sequence. Subfigure (c) is the result of the SSD matching. The dashed area is the one where the model does not apply. The lighter the area is, the better the match is.

Once the theoretical optical flow field is computed, we can match it to the observed data. We use two consecutive images and try to match each pixel in the previous image to the corresponding theoretical pixel in the current image. The matching is done by computing an SSD (Sum of Squared Differences) measure. An example of the SSD matching can be seen in figure 5.
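The sketch below (Python/NumPy) illustrates one way to implement this matching: each pixel of the previous frame is compared, over a small window, with the pixel it should map to in the current frame according to the model flow. The window size, the rounding of the displaced coordinates and the brute-force loops are our simplifications; the paper does not specify these details.

import numpy as np

def ssd_match(prev_img, curr_img, flow_u, flow_v, half=2):
    """For every pixel, compare a window around (u, v) in the previous greyscale
    image with the window around (u + f_u, v + f_v) in the current image.
    Low SSD = the pixel moved like the ground plane; high SSD = likely obstacle."""
    h, w = prev_img.shape
    ssd = np.full((h, w), np.nan)
    for v in range(half, h - half):
        for u in range(half, w - half):
            u2 = int(round(u + flow_u[v, u]))
            v2 = int(round(v + flow_v[v, u]))
            if half <= u2 < w - half and half <= v2 < h - half:
                a = prev_img[v - half:v + half + 1, u - half:u + half + 1]
                b = curr_img[v2 - half:v2 + half + 1, u2 - half:u2 + half + 1]
                ssd[v, u] = np.sum((a.astype(float) - b.astype(float)) ** 2)
    return ssd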
III. VISION-ORIENTED DATA FUSION

In this section we propose a method to cope with the memory cost of the 3D occupancy grid and with its fusion with 2D occupancy grids. This model is based on the projective description of camera sensors. Using a particular sensor configuration, we are able to extract a 2D occupancy grid model for a monocular camera.

A. 3D camera sensor model

It is difficult to deal with cameras in an occupancy grid framework. Indeed, one pixel of the image can correspond to an infinite set of 3D world points. This set of points is known as a projective line. The set of projective lines corresponding to all the image pixels is a pyramid (of dimension 3). Therefore we need to express the occupancy grids in a 3D space.

Fig. 6. Camera model: the shape in the image is projected on the ground plane. The dashed pyramid corresponds to the potentially occupied space.

Figure 6 shows the camera model we use. The shape in the image corresponds to the dashed pyramid. Saying that a pixel in the image belongs to an obstacle means that its projective line is potentially occupied.

B. 2D projected model

The occupancy grid framework is very expensive when it comes to a 3D space. The idea is therefore to project the 3D occupancy grid onto the ground plane. This projection respects the semantics of occupancy grids and translates the incompleteness of the monocular camera into uncertainty.

The projection of a 3D occupancy grid C^{3D}_{i,j,k} into a 2D occupancy grid C_{i,j} is defined in our work as:

C_{i,j} = max_k [ p_k × C^{3D}_{i,j,k} + (1 − p_k) × 1/2 ]        (4)

In equation (4) we use the maximum operator to obtain an occupancy grid that is as safe as possible. Indeed, we impose that the probability for a cell to be occupied is the maximum of the probabilities of all the 3D cells on top of the 2D ground cell, which means that we do not risk declaring a 2D cell free while a 3D cell above it is occupied.

The term p_k is a priori knowledge: it encodes prior knowledge of the vertical distribution of the obstacles. The value 1 means a strong confidence, and the value 0 means that no obstacle can be at the given height. We imposed the following function for p_k (the function is plotted in figure 7):

p_k = 1                                                   if k ≤ z_0
p_k = 2 ((z − z_0)/Δz)^3 − 3 ((z − z_0)/Δz)^2 + 1         if k ∈ ]z_0, z_0 + Δz]        (5)
p_k = 0                                                   elsewhere

Fig. 7. Graph of the function p_k defined in equation (5).

Fig. 8. (a) is a square in the image, (b) is its basic projection on the ground plane, (c) is the 3D model projected on the ground plane.

Figure 8 shows the different steps of the pyramid projection. In subfigure (c) we can see that the shape fades out towards the bottom; this is because the sensor model we use gives more probability to obstacles close to the ground.
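A compact Python/NumPy sketch of equations (4) and (5) is given below. The layer heights and the values of z_0 and Δz are placeholders of ours; the paper does not give numerical values for them.

import numpy as np

def pk_profile(heights, z0, dz):
    """Vertical prior p_k of equation (5): 1 up to z0, a smooth cubic fall-off over
    ]z0, z0 + dz], and 0 above."""
    t = (heights - z0) / dz
    return np.where(heights <= z0, 1.0,
                    np.where(heights <= z0 + dz, 2 * t**3 - 3 * t**2 + 1, 0.0))

def project_grid(C3d, heights, z0=1.0, dz=1.0):
    """Equation (4): C_ij = max_k [ p_k * C3d_ijk + (1 - p_k) * 0.5 ].

    C3d     : (nk, ni, nj) 3D occupancy grid with values in [0, 1]
    heights : (nk,) height of each vertical layer k
    """
    p = pk_profile(heights, z0, dz)[:, None, None]
    return np.max(p * C3d + (1.0 - p) * 0.5, axis=0)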

IV. CAMERA MODALITIES FUSION

To fuse all the sensor modalities we use an occupancy grid framework [7], [10], which lets us express the fusion in formal probabilistic terms. The probability for cell (i, j) to be occupied, given the sensor observations (Z_k)_{k=1···n} for this cell, can be written as:

P(Occ_{i,j} | Z_1 ··· Z_n) = [ P(Occ_{i,j}) / P(Z_1 ··· Z_n) ] × Π_{k=1}^{n} P(Z_k | Occ_{i,j})

For a given set of observations we can write:

P(Occ_{i,j} | Z_1 = z_1 ··· Z_n = z_n) ∝ Π_{k=1}^{n} P(Z_k = z_k | Occ_{i,j})

To add the concept of sensor confidence, we can improve the sensor model by taking into account the possibility of failure. We introduce a binary random variable H ∈ {Right, Wrong} that indicates whether the sensor failed or not. The new sensor model is then:

P(Z_k = z_k | Occ_{i,j}) = Σ_H P(H) P(Z_k = z_k | Occ_{i,j}, H)

We consider that, in the case of a failure, the sensor model is a uniform law. Therefore we write the following final sensor model:

P(Z_k = z_k | Occ_{i,j}) = α P(Z_k = z_k | Occ_{i,j}, H = Right) + (1 − α) × 1/2
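The following Python/NumPy sketch shows one way this fusion can be evaluated per cell. Reading each modality's grid as P(Occ | z_k) under a uniform prior is our interpretation, and the confidence values are illustrative; the paper only states that the same confidence was used for both modalities.

import numpy as np

def fuse_grids(grids, alphas, prior=0.5):
    """Fuse per-modality occupancy grids with the confidence-weighted sensor model
    P(z_k | Occ) = alpha_k * P(z_k | Occ, Right) + (1 - alpha_k) * 1/2.

    grids  : list of arrays in [0, 1], each read as P(Occ | z_k) under a uniform prior
    alphas : per-sensor confidence values in [0, 1]
    """
    occ = np.full_like(grids[0], prior, dtype=float)
    emp = np.full_like(grids[0], 1.0 - prior, dtype=float)
    for g, a in zip(grids, alphas):
        occ *= a * g + (1.0 - a) * 0.5           # likelihood term if the cell is occupied
        emp *= a * (1.0 - g) + (1.0 - a) * 0.5   # likelihood term if the cell is empty
    return occ / (occ + emp)                     # normalised posterior P(Occ | z_1..z_n)

# Usage with the two modalities of the paper (same confidence for both, value assumed):
# fused = fuse_grids([stereo_grid, flow_grid], alphas=[0.7, 0.7])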

V. EXPERIMENTAL RESULTS

In figure 9 we can see the result of the whole process. Subfigure (a) is the current frame on which we analyse the results of our algorithm. Subfigure (b) is the result of the optical flow detection algorithm. We can clearly see the pedestrian moving in front of the camera. We can also see the blue car in the back. The relative importance of the car and the pedestrian is due to their distance from the camera: the closer an object is to the camera, the larger its optical flow. In future work we could improve this point by exploring the possibility of normalising the SSD by the optical flow, which would give a better balance between far and close obstacles.

The stereo-generated occupancy grid is shown in subfigure (c). It shows the pedestrian in the middle of the grid. The grid is not very dense beyond 3 m but gives information up to 7 m. We can also see a false detection at the top right of the grid.

Fig. 9. (a) is the left image of the stereo camera, (b) is the detected obstacle from optical flow, (c) is the occupancy grid generated from the 3D point cloud, (d) is the projection of (b) on the ground plane, (e) is the improved model presented in III-B, and (f) is the final occupancy grid, which is the fusion of the two occupancy grids (c) and (e).

Subfigure (d) is the naive projection of subfigure (b), and subfigure (e) is the projection we defined in III-B.

Finally, subfigure (f) shows the global result of the fusion. We used the same confidence for both algorithms. We can see that the areas where the stereo does not provide dense information are supplemented by the optical flow algorithm. The area where the pedestrian is located is also reinforced. The false detection at the top right of the grid is minimised: after the fusion step, this false detection has a probability that corresponds to unknown occupancy. The cells in front of the obstacle are slightly degraded.

VI. CONCLUSION AND FUTURE WORK

In this paper, we proposed a real-time method to detect obstacles using theoretical models of the ground plane, applied to the 3D point cloud given by a stereo camera and to an optical flow field given by one of the stereo pair's cameras.

The performance of the global process is better than the stereo detection or the optical flow detection alone. We could improve the quality of the occupancy grid by adding more sensors (other cameras, laser range finders, ...) and/or more camera modalities (colour segmentation, ...).

The next step will be to perform time integration to remove some ambiguities (especially the ambiguities related to monocular camera algorithms). Finally, we could integrate our algorithms into a complete SLAM process.

VII. ACKNOWLEDGEMENTS

This work was done in the context of a cooperation between the CSIRO ICT Centre, Brisbane (Australia) and INRIA Rhône-Alpes, Grenoble (France).

REFERENCES

[1] J.L. Barron, D.J. Fleet, and S.S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77, 1994.
[2] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In Proc. 8th European Conference on Computer Vision, 3024:25–36, May 2004.
[3] A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger, and C. Schnörr. Variational optical flow computation in real-time. IEEE Transactions on Image Processing, 14(5):608–615, 2005.
[4] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision, 61(3):211–231, 2005.
[5] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade. Algorithms for cooperative multisensor surveillance. Proceedings of the IEEE, 89(10):1456–1477, October 2001.
[6] E.D. Dickmanns. The development of machine vision for road vehicles in the last decade. In IEEE Intelligent Vehicles Symposium, 2002.
[7] A. Elfes. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6):46–57, 1989.
[8] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185–203, 1981.
[9] Q. Ke and T. Kanade. Transforming camera geometry to a virtual downward-looking camera: robust ego-motion estimation and ground layer detection. In Proc. of the IEEE International Conference on Computer Vision and Pattern Recognition, 2003.
[10] K. Konolige. Improved occupancy grids for map building. Autonomous Robots, 4:351–367, 1997.
[11] H.C. Longuet-Higgins. The visual ambiguity of a moving plane. Proceedings of the Royal Society of London, 1984.
[12] J.A. Nelder and R. Mead. A simplex method for function minimization. The Computer Journal, 7:308–313, 1965.
[13] C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. of the International Conference on Computer Vision and Pattern Recognition, January 1998.
[14] A. Talukder and L. Matthies. Real-time detection of moving objects from a moving vehicle using dense stereo and optical flow. In Proc. of the International Conference on Intelligent Robots and Systems, October 2004.
[15] K. Young-Geun and K. Hakil. Layered ground floor detection for vision-based mobile robot navigation. In International Conference on Robotics and Automation, pages 13–18, New Orleans, April 2004.


