
Image and Vision Computing 21 (2003) 359–381

www.elsevier.com/locate/imavis

A survey of video processing techniques for traffic applications


V. Kastrinaki, M. Zervakis*, K. Kalaitzakis
Digital Image and Signal Processing Laboratory, Department of Electronics and Computer Engineering, Technical University of Crete, Chania 73100, Greece
Received 29 October 2001; received in revised form 18 December 2002; accepted 15 January 2003
* Corresponding author. Tel.: +30-28210-37206; fax: +30-28210-37542. E-mail address: [email protected] (M. Zervakis).

Abstract
Video sensors become particularly important in traffic applications mainly due to their fast response, easy installation, operation and
maintenance, and their ability to monitor wide areas. Research in several fields of traffic applications has resulted in a wealth of video
processing and analysis methods. Two of the most demanding and widely studied applications relate to traffic monitoring and automatic
vehicle guidance. In general, systems developed for these areas must integrate, amongst their other tasks, the analysis of their static
environment (automatic lane finding) and the detection of static or moving obstacles (object detection) within their space of interest. In this
paper we present an overview of image processing and analysis tools used in these applications and we relate these tools with complete
systems developed for specific traffic applications. More specifically, we categorize processing methods based on the intrinsic organization of
their input data (feature-driven, area-driven, or model-based) and the domain of processing (spatial/frame or temporal/video). Furthermore,
we discriminate between the cases of static and mobile camera. Based on this categorization of processing tools, we present representative
systems that have been deployed for operation. Thus, the purpose of the paper is threefold. First, to classify image-processing methods used
in traffic applications. Second, to provide the advantages and disadvantages of these algorithms. Third, from this integrated consideration, to
attempt an evaluation of shortcomings and general needs in this field of active research.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Traffic monitoring; Automatic vehicle guidance; Automatic lane finding; Object detection; Dynamic scene analysis

1. Introduction

The application of image processing and computer vision techniques to the analysis of video sequences of traffic flow offers considerable improvements over the existing methods of traffic data collection and road traffic monitoring. Other methods, including the inductive loop, the sonar and microwave detectors, suffer from serious drawbacks in that they are expensive to install and maintain and they are unable to detect slow or stationary vehicles. Video sensors offer a relatively low installation cost with little traffic disruption during maintenance. Furthermore, they provide wide area monitoring allowing analysis of traffic flows and turning movements (important to junction design), speed measurement, multiple-point vehicle counts, vehicle classification and highway state assessment (e.g. congestion or incident detection) [1].

Image processing also finds extensive applications in the related field of autonomous vehicle guidance, mainly for determining the vehicle's relative position in the lane and for obstacle detection. The problem of autonomous vehicle guidance involves solving different problems at different abstraction levels. The vision system can aid the accurate localization of the vehicle with respect to its environment, which is composed of the appropriate lane and obstacles or other moving vehicles. Both lane and obstacle detection are based on estimation procedures for recognizing the borders of the lane and determining the path of the vehicle. The estimation is often performed by matching the observations (images) to an assumed road and/or vehicle model.

Video systems for either traffic monitoring or autonomous vehicle guidance normally involve two major tasks of perception: (a) estimation of road geometry and (b) vehicle and obstacle detection. Road traffic monitoring aims at the acquisition and analysis of traffic figures, such as presence and numbers of vehicles, speed distribution data, turning traffic flows at intersections, queue-lengths, space and time occupancy rates, etc. Thus, for traffic monitoring it is essential to detect the lane of the road and then sense and identify presence and/or motion parameters of a vehicle. Similarly, in autonomous vehicle guidance, the knowledge about road geometry allows a vehicle to follow its route, and the detection of road obstacles becomes a necessary and important task for avoiding other vehicles present on the road.
In this paper we focus on video systems considering both areas of road traffic monitoring and automatic vehicle guidance. We attempt a state-of-the-art survey of algorithms and tools for the two major subtasks involved in traffic applications, i.e. automatic lane finding (estimation of the lane and/or central line) and vehicle detection (moving or stationary object/obstacle). With the progress of research in computer vision, it appears that these tasks should be trivial. The reality is not so simple; a vision-based system for such traffic applications must have the features of short processing time, low processing cost and high reliability [2]. Moreover, the techniques employed must be robust enough to tolerate inaccuracies in the 3D reconstruction of the scene, noise caused by vehicle movement and calibration drifts in the acquisition system. The image acquisition process can be regarded as a perspective transform from the 3D world space to the 2D image space. The inverse transform, which represents a 3D reconstruction of the world from a 2D image, is usually indeterminate (an ill-posed problem) because information is lost in the acquisition mapping. Thus, an important task of video systems is to remove the inherent perspective effect from acquired images [3,4]. This task requires additional spatio-temporal information by means of additional sensors (stereo vision or other types of sensors) or the analysis of temporal information from a sequence of images. Stereo vision and optical flow methods aid the regularization of the inversion process and help recover scene depth. Some of the lane or object detection problems have already been solved, as presented in the next sections. Others, such as the handling of uncertainty and the fusion of information from different sensors, are still open problems, as presented in Section 4 that traces the future trends.
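To make the perspective-removal step concrete, the following minimal Python/OpenCV sketch re-projects an acquired frame onto the road plane using a planar homography. This is our own illustration under a flat-road assumption; the four image points, the file name and the output size are hypothetical placeholders for real camera calibration data.

import cv2
import numpy as np

# Hypothetical calibration: four image points outlining a lane segment
# (pixels), paired with a rectangle on the road plane (bird's-eye view).
src_pts = np.float32([[420, 300], [540, 300], [700, 540], [260, 540]])
dst_pts = np.float32([[0, 0], [200, 0], [200, 400], [0, 400]])

# 3x3 homography mapping image coordinates to road-plane coordinates.
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

frame = cv2.imread("frame.png")
# Inverse perspective mapping: the warped image is free of the
# perspective effect, so lane borders become (nearly) parallel lines.
birdseye = cv2.warpPerspective(frame, H, (200, 400))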
In our analysis of video systems we distinguish between two situations. The first one is the case in which a static camera observes a dynamic road scene for the purpose of traffic surveillance. In this case, the static camera generally has a good view of the road objects because of the high position of the camera. Therefore, 2D intensity images may contain enough information for the model-based recognition of road objects. The second situation is the case in which one or more vision sensors are mounted on a mobile vehicle that moves in a dynamic road scene. In this case, the vision sensors may not be in the best position for observing a road scene. Then, it is necessary to correlate video information with sensors that provide the actual state of the vehicle, or to combine multisensory data in order to detect road obstacles efficiently [2].

Both lane and object detection become quite different in the cases of stationary (traffic monitoring) and moving camera (automatic vehicle guidance), conceptually and algorithmically. In traffic monitoring, the lane and the objects (vehicles) have to be detected on the image plane, at the camera coordinates. Alternatively, in vehicle guidance, the lane and the object (obstacle) positions must be located in the actual 3D space. Hence, the two cases, i.e. stationary and moving cameras, require different processing approaches, as illustrated in Sections 2 and 3 of the paper. The techniques used for moving cameras can also be used for stationary cameras. Nevertheless, due to their complexity and computational cost, they are not well suited for the relatively simpler applications of stationary video analysis.

Research in the field started as early as the 70s with the advent of computers and the development of efficient image processing techniques. There is a wealth of methods for either traffic monitoring or terrain monitoring for vehicle guidance. Some of them share common characteristics and some originate from quite diverse approaches. The purpose of this paper is threefold. First, to classify image-processing methods used in traffic applications. Second, to provide the advantages and disadvantages of these algorithms. Third, from this integrated consideration, to attempt an evaluation of shortcomings and general needs in this field of active research. The paper proceeds by considering the problem of automatic lane finding in Section 2 and that of vehicle detection in Section 3, respectively. In Section 4 we provide a critical comparison and relate processing algorithms with complete systems developed for specific traffic applications. The paper concludes by projecting future trends and developments motivated by the demands of the field and the shortcomings of the available tools.

2. Automatic lane finding

2.1. Stationary camera

A critical objective in the development of a road monitoring system based upon image analysis is adaptability. The ability of the system to react to a changing scene while carrying out a variety of goals is a key issue in designing replacements to the existing methods of traffic data collection. This adaptability can only be brought about by a generalized approach to the problem which incorporates little or no a priori knowledge of the analyzed scene. Such a system will be able to adapt to 'changing circumstances', which may include the following: changing light levels, i.e. night-day, or sunny-cloudy; a deliberately altered camera scene, perhaps altered remotely by an operator; an accidentally altered camera position, i.e. buffeting by the wind or knocks due to foreign bodies; changing analysis goals, i.e. traffic flow to counting or occupancy measurement. Moreover, an adaptive system would ease installation of the equipment due to its ability for self-initialization [1]. Automatic lane finding (ALF) is an important task for an adaptive traffic monitoring system.
ALF can assist and simplify the installation of a detection system. It enables the system to adapt to different environmental conditions and camera viewing positions. It also enables applications in active vision systems, where the camera viewing angle and the focal length of the camera lens may be controlled by the system operator to find an optimum view [5].

The aspects that characterize a traffic lane are its visual difference from the environment and the relatively dense motion of vehicles along the lane. Thus, features that can be easily inferred are the lane characteristics themselves (lane markings and/or road edges) and the continuous change of the scene along the lane area. Based on these features, we can distinguish two classes of approaches in lane detection, namely lane-region detection and lane-border detection (lane markings and road edges). The first class relates the detection of the lane with the changing intensity distribution along the region of a lane, whereas the second class considers directly the spatial detection of lane characteristics. It should be emphasized that the first class considers just changes in the gray-scale values within an image sequence, without addressing the problem of motion estimation. The second class can be further separated, based on the method of describing the lane characteristics. Two general subclasses involve model-driven approaches, in which deformable templates are iteratively modified to match the road edges, and feature-driven approaches, in which lane features are extracted, localized and combined into meaningful characteristics. The latter approach limits the computation-intensive processing of images to simply extracting features of interest.

2.2. Moving camera

In the case of automatic vehicle guidance, the lane detection process is designed to (a) provide estimates for the position and orientation of the car within the lane and (b) infer a reference system for locating other vehicles or obstacles in the path of that vehicle. In general, both tasks require two major estimation procedures, one regarding the recognition of the borders of the lane and the second for the prediction of the path of the vehicle. The derivation of the path of the vehicle requires temporal information concerning the vehicle motion, as well as modeling of the state of the car (dynamics and kinematics). Alternatively, the lane recognition task can be based on spatial visual information, at least for the short-range estimation of the lane position. Although some systems have been designed to work on completely unstructured roads and terrain, lane detection has generally been reduced to the localization of specific features, such as lane markings painted on the road surface. Certain assumptions facilitate the lane detection task and/or speed up the processing [6]:

• Instead of processing entire images, a computer vision system can analyze specific regions (the 'focus of attention') to identify and extract the features of interest.
• The system can assume a fixed or smoothly varying lane width and thereby limit its search to almost-parallel lane markings.
• A system can exploit its knowledge of the camera and an assumption of a precise 3D road model (for example, a flat road without bumps) to localize features more easily and simplify the mapping between image pixels and their corresponding world coordinates.

Real-time road segmentation is complicated by the great variability of vehicle and environmental conditions. Changing seasons or weather conditions, time of the day, dirt on the road, shadows, spectral reflection when the sun is at a low angle and manmade changes (tarmac patches used to repair road segments) complicate the segmentation process. Because of these combined effects, robust segmentation is very demanding. Several features of structured roads, such as color and texture, have been used to distinguish between road and non-road regions in each individual frame. Furthermore, road tracking can facilitate road segmentation based on previous information. This process, however, requires knowledge of the vehicle dynamics, vehicle suspension, performance of the navigation and control systems, etc.

Single-frame analysis has been extensively considered not only in monocular but also in stereo vision systems. The approaches used in stereo vision often involve independent processing on the left and right images and projection of the result to the ground plane through the Helmholtz shear equation, making the assumption of a flat road and using piecewise road geometry models (such as clothoids) [7,8]. Furthermore, the inverse perspective mapping can be used to simplify the process of lane detection, similar to the process of object detection considered in Section 3 [4]. The inverse perspective mapping essentially re-projects the two images onto a common plane (the road plane) and provides a single image with common lane structure.
In the case of a moving vehicle, the lane recognition process must be repeated continuously on a sequence of frames. In order to accelerate the lane detection process, there is a need to restrict the computation to a reduced region of interest (ROI). There are two general approaches in this direction. The first restricts the search to the predicted path of the vehicle by defining a search region within a trapezoid on the image plane, which is located through the perspective transform. The second approach defines small search windows located at the expected position of the lane, separated by short spatial distances. A rough prediction of the lane position at subsequent video frames can highly accelerate the lane detection process. In one scheme, the estimated lane borders at the previous frame can be expanded, making the lane virtually wider, so that the actual lane borders at the next frame are searched for within this expanded ROI [9]. In a different scheme, a least squares linear fit is used to extrapolate lane markings and locate the new search windows at the next frames [10].
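A rough sketch of the second scheme is given below, assuming hypothetical near-range detections; a first-order least-squares fit extrapolates the marking toward the far range and places the next search windows.

import numpy as np

# Hypothetical near-range detections: (row, column) centers of a lane
# marking found in the previous search windows.
rows = np.array([470.0, 440.0, 410.0, 380.0])
cols = np.array([312.0, 318.0, 325.0, 331.0])

# Linear least-squares fit of marking column as a function of image row.
slope, intercept = np.polyfit(rows, cols, deg=1)

# Extrapolate the marking to farther rows and place the next search
# windows around the predicted positions.
far_rows = np.array([350.0, 320.0, 290.0])
predicted_cols = slope * far_rows + intercept
windows = [(int(r), int(c)) for r, c in zip(far_rows, predicted_cols)]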
Following the process of lane detection on the image plane, the result must be mapped onto the road (world) coordinate system for navigation purposes. By assuming a flat road model, the distance of a 3D-scene point on the road plane can be readily computed if we know the transformation matrix between the camera and the vehicle coordinate systems. In more general cases, the road geometry has to be estimated in order to derive the transformation matrix between the vehicle and the road coordinate systems. The aspects of relative position estimation are further considered in Section 3, along with the object detection process.

2.3. Automatic lane finding approaches

The fundamental aspects of ALF approaches are considered and reviewed in this section. These approaches are classified into lane-region detection, feature-driven and model-driven approaches.

2.3.1. Lane-region detection
One method of automatic lane finding with a stationary camera can be based upon accumulating a map of significant scene change [5]. The so-called activity map distinguishes between active areas of the scene where motion is occurring (the road) and inactive areas of no significant motion (e.g. verges, central reservation). To prevent saturation and allow adaptation to changes of the scene, the map generation also incorporates a simple decay mechanism through which previously active areas slowly fade from the map. Once formed, the activity map can be used by a lane finding algorithm to extract the lane positions [1].
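The following sketch shows one possible update step for such an activity map; the decay factor, motion threshold and saturation limit are hypothetical tuning values, not those of Refs. [1,5].

import numpy as np

def update_activity_map(activity, prev_gray, curr_gray,
                        decay=0.98, motion_thresh=15):
    # Pixels whose inter-frame change exceeds motion_thresh are
    # reinforced; everything else slowly fades, so formerly active
    # areas gradually disappear from the map.
    motion = np.abs(curr_gray.astype(np.int16) -
                    prev_gray.astype(np.int16)) > motion_thresh
    activity *= decay                 # decay mechanism: old activity fades
    activity[motion] += 1.0           # reinforce currently active pixels
    np.clip(activity, 0.0, 100.0, out=activity)  # prevent saturation
    return activity

# Lane regions can then be extracted by thresholding the accumulated
# map, e.g. lane_mask = activity > 50.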
The lane-region analysis can also be modeled as a classification problem, which labels image pixels into road and non-road classes based on particular features. A typical classification problem involves the steps of feature extraction, feature decorrelation and reduction, clustering and segmentation. For road segmentation applications, two particular features have been used, namely color and texture [11,12]. In the case of color, the features are defined by the spectral response of the illumination at the red, green and blue bands. At each pixel, the (R,G,B) value defines the feature vector and the classification can be performed directly on the (R,G,B) scatter diagram of the image [12]. The green band contributes very little to the separation of classes in natural scenes, and on the (R,B) plane classification can be performed through a linear discriminant function [12], since road pixels cluster nicely, distinct from non-road pixels. The classification process can be based on piece-wise linear discriminant functions, in order to account for varying color conditions on the road (shading, reflections, etc.) [12]. The road segmentation can also be performed using stochastic pattern recognition approaches. One can define many classes representing road and/or non-road segments. Each class is represented by its mean and variance of (R,G,B) values and its a priori likelihood based on the expected number of pixels in each class. Gaussian distributions have been used to model the color classes [11].
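As an illustration of this kind of classifier, the sketch below labels pixels as road or non-road using two Gaussian color classes; the class statistics and the prior are assumed to have been estimated beforehand from training regions.

import numpy as np

def classify_road_pixels(image_rgb, mean_road, cov_road,
                         mean_off, cov_off, prior_road=0.5):
    # mean_*/cov_* are (3,) and (3, 3) statistics estimated from
    # training regions; the prior reflects the expected fraction of
    # road pixels in the scene.
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)

    def log_gaussian(x, mean, cov):
        d = x - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Mahalanobis distance per pixel plus the normalization term
        # (constant factors cancel between the two classes).
        return -0.5 * (np.einsum("ij,jk,ik->i", d, inv, d) + logdet)

    log_road = log_gaussian(pixels, mean_road, cov_road) + np.log(prior_road)
    log_off = log_gaussian(pixels, mean_off, cov_off) + np.log(1 - prior_road)
    return (log_road > log_off).reshape(image_rgb.shape[:2])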
The apparent color of an object is not consistent, due to several factors. It depends on the illuminant color, the reflectivity of the object, the illumination and viewing geometry and the sensor parameters. The color of a scene may vary with time, cloud cover and other atmospheric conditions, as well as with the camera position and orientation. Thus, color as a feature for classification requires special treatment and normalization to ensure consistency of the classification results. Once the road has been localized in an image, the color statistics of the road and off-road models need to be modified in each class, adapting the process to changing conditions [13]. The hue, saturation, gray-value (HSV) space has also been used as more effective for classification [14].

Besides color, the local texture of the image has been used as a feature for classification [11,15]. The texture of the road is normally smoother than that of the environment, allowing for region separation in its feature space. The texture calculation can be based on the amplitude of the gradient operator at each image area. Ref. [11] uses a normalized gradient measure based on a high-resolution and a low-resolution (smoothed) image, in order to handle shadow interiors and boundaries. Texture classification is performed through stochastic pattern recognition techniques and unsupervised clustering. Since the road surface is poorly textured and differs significantly from objects (vehicles) and background, grey-level segmentation is likely to discriminate the road surface area from other areas of interest. Unsupervised clustering on the basis of the C-means algorithm or the Kohonen self-organizing maps can be employed on a 3D input space of features. Two of these features signify the position and the third signifies the grey-level of each pixel under consideration. Thus, the classifier groups together neighboring pixels of similar intensities [16].

The classification step must be succeeded by a region merging procedure, so as to combine similar small regions under a single label. Region merging may utilize other sources of information, such as motion. In essence, a map of static regions obtained by simple frame differencing can provide information about the motion activity of neighboring patches that are candidates for merging [16]. Texture classification can also be effectively combined with color classification, based on the confidence of the two classification schemes [11].
2.3.2. Feature-driven approaches
This class of approaches is based on the detection of edges in the image and the organization of edges into meaningful structures (lanes or lane markings) [17]. This class involves, in general, two levels of processing, i.e. feature detection and feature aggregation. The feature detection part aims at extracting intensity discontinuities. To make the detection more effective, a first step of image enhancement is performed, followed by a gradient operator. The dominant edges are extracted based on thresholding of the gradient magnitude and they are refined through thinning operators. At this stage, the direction of edges at each pixel can be computed based on the phase of the gradient, and the curvature of line segments can be estimated based on neighborhood relations.
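A minimal sketch of this feature-detection chain is shown below, using Gaussian smoothing for enhancement and the Sobel operator for the gradient. The magnitude threshold is a hypothetical value, and OpenCV's Canny detector is included as a stand-in that performs a comparable chain internally, including edge thinning by non-maximum suppression.

import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# Enhancement step: mild smoothing before differentiation.
smooth = cv2.GaussianBlur(frame, (5, 5), 1.5)

# Gradient operator (Sobel) and its magnitude and phase.
gx = cv2.Sobel(smooth, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(smooth, cv2.CV_32F, 0, 1, ksize=3)
magnitude = cv2.magnitude(gx, gy)
direction = cv2.phase(gx, gy, angleInDegrees=True)  # edge direction per pixel

# Dominant edges by thresholding the gradient magnitude (the threshold
# is a hypothetical value; in practice it is tuned or set adaptively).
edges = magnitude > 80.0

# Canny performs a comparable chain internally, including thinning.
thin_edges = cv2.Canny(smooth, 60, 150)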
Feature aggregation organizes edge segments into meaningful structures (lane markings) based on short-range or long-range attributes of the lane. Short-range aggregation considers local lane fitting into the edge structure of the image. A realistic assumption that is often used requires that the lane (or the lane marking) width does not change drastically. Hence, meaningful edges of the video image are located at a certain distance apart, in order to fit the lane-width model. Long-range aggregation is based on a line intersection model, based on the assumption of smooth road curvature. Thus, gross road boundaries and markings must be directed towards a specific point in the image, the focus of expansion (FOE) of the camera system.

Along these directions, Ref. [4] detects lane markings through a horizontal (linear) edge detector and enhances vertical edges via a morphological operator. For each horizontal line, it then forms correspondences of edge points to a two-lane road model (three lane markings) and identifies the most frequent lane width along the image, through a histogram analysis. All pairs of edge pixels (along each horizontal line) that fall within some limits around this width are considered as lane markings and corresponding points on different scan lines are aggregated together as lines of the road. A similar approach is used in Ref. [18] for auto-calibration of the camera module. The Road Markings Analysis (ROMA) system is based on aggregation of the gradient direction at edge pixels in real-time [19]. To detect edges that are possible markings or road boundaries, it employs a contour following algorithm based on the range of acceptable gradient directions. This range is adapted in real-time to the current state variables of the road model. The system can cope with discontinuities of the road borders and can track road intersections.

Ref. [20] detects brightness discontinuities and retains only long straight lines that point toward the FOE. For each edge point, it preserves the edge direction and the neighboring line curvature and performs a first elimination of edges based on thresholding of the direction and curvature. This is done to preserve only straight lines that point towards the specific direction of the FOE. The feature aggregation is performed through correlation with a synthetic image that encodes the road structure for the specific FOE. The edge detection can be efficiently performed through morphological operators [21-23].

The approach in Ref. [10] operates on search windows located along the estimated position of the lane markings. For each search window, the edges of the lane marking are determined as the locations of maximum positive and negative horizontal changes in illumination. Then, these edge points are aggregated as boundaries of the lane marking (paint stripe) based on their spacing, which should approximate the lane-marking width. The detected lanes at near-range are extrapolated to far-range via a linear least-squares fit, to provide an estimated lane-marking location for placing the subsequent search windows. The locations of the road markings, along with the state of the vehicle, are used in two different Kalman filters to estimate the near and far-range road geometry ahead of the vehicle [10]. Prior knowledge of the road geometry imposes strong constraints on the likely location and orientation of the lanes.

Alternatively, other features have been proposed that capture information about the orientation of the edges, but are not affected drastically by extraneous edges. Along these lines, the LANA algorithm [24] uses frequency-domain features rather than features directly related to the detected edges. These feature vectors are used along with a deformable-template model of the lane markers in a Bayesian estimation setting. The deformable template introduces a priori information, whereas the feature vectors are used to compute the likelihood probability. The parameters of the deformable template are estimated by optimizing the resulting maximum a posteriori objective function [24]. Simpler linear models are used in Ref. [14] for road boundaries and lane markings, with their parameters estimated via a recursive least squares (RLS) filter fit on candidate edge points.

In general, feature-driven approaches are highly dependent on the methods used to extract features and they suffer from noise effects and irrelevant feature structures. Often in practice the strongest edges are not the road edges, so that the detected edges do not necessarily fit a straight-line or a smoothly varying model. Shadow edges can appear quite strong, highly affecting the line tracking approach.

2.3.3. Model-driven approaches
In model-driven approaches the aim is to match a deformable template defining some scene characteristic to the observed image, so as to derive the parameters of the model that match the observations. The pavement edges and lane markings are often approximated by circular arcs on a flat-ground plane. More flexible approaches have been considered in Refs. [25,26] using snakes and splines to model road segments. In contrast to other deformable line models, Ref. [26] uses a spline-based model that describes the perspective effect of parallel lines, considering simultaneously both-side borders of the road lane. For small to moderate curvatures, a circular arc is approximated by a second-order parabola, whose parameters must be estimated. The estimation can be performed on the image plane [27] or on the ground plane [24] after the appropriate perspective mapping. Bayesian optimization procedures are often used for the estimation of these parameters.
Model-based approaches for lane finding have been extensively employed in stereo vision systems, where the estimation of the 3D structure is also possible. Such approaches assume a parametric model of the lane geometry, and a tracking algorithm estimates the parameters of this model from feature measurements in the left and right images [28]. In Ref. [28] the lane tracker predicts where the lane markers should appear in the current image based on its previous estimates of the lane position. It then extracts possible lane markers from the left and right images. These feature measurements are passed to a robust estimation procedure, which recovers the parameters of the lane along with the orientation and height of the stereo rig with respect to the ground plane. The Helmholtz shear equation is used to verify that candidate lane markers actually lie on the ground plane [28]. The lane markers are modeled as white bars of a particular width against a darker background. Regions in the image that satisfy this intensity profile can be identified through a template matching procedure. In this form, the width of the lane markers in the image changes linearly as a function of the distance from the camera, or the location of the image row considered. Thus, different templates are used at different image locations along the length of the road, in both the left and right images. Once a set of candidate lane markers has been recovered, the lane tracker applies a robust fitting procedure using the Hough transform, to find the set of model parameters which best match the observed data [28]. A robust fitting strategy is absolutely essential in traffic applications, because on real highway traffic scenes the feature extraction procedure almost always returns a number of extraneous features that are not part of the lane structure. These extra features can come from a variety of sources: other vehicles on the highway, shadows or cracks in the roadway, etc.
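As an indicative sketch of such robust fitting, the snippet below applies OpenCV's probabilistic Hough transform to a binary image of candidate lane-marker pixels. It is a simplified stand-in for the model-parameter search of Ref. [28], and the input image and vote thresholds are hypothetical.

import cv2
import numpy as np

# Binary image of candidate lane-marker pixels, e.g. the output of the
# template matching stage (hypothetical input file).
candidates = cv2.imread("candidates.png", cv2.IMREAD_GRAYSCALE)

# Probabilistic Hough transform: each returned segment is a line
# hypothesis supported by many feature points, so isolated extraneous
# features (shadows, cracks, other vehicles) are voted down.
segments = cv2.HoughLinesP(candidates, rho=1, theta=np.pi / 180,
                           threshold=40, minLineLength=30, maxLineGap=10)

lane_segments = []
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        # Keep segments compatible with the expected lane geometry;
        # a fuller system would check convergence toward the FOE.
        if abs(y2 - y1) > abs(x2 - x1):
            lane_segments.append((x1, y1, x2, y2))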
Another class of model-driven approaches involves the stochastic modeling of lane parameters and the use of Bayesian inference to match a road model to the observed scene. The position and configuration of the road, for instance, can be considered as variables to be inferred from the observation and the a posteriori probability conditioned on this observation [25,29]. This requires the description of the road using small segments and the derivation of probability distributions for the relative positions of these segments on regular road scenes (prior distribution on road geometry). Moreover, it requires the specification of probability distributions for observed segments, obtained using an edge detector on the observed image, conditioned on the possible positions of the road segments (a posteriori distribution of segments). Such distributions can be derived from test data [29].

The 3D model of the road can also be used in modeling the road parameters through differential equations that relate motion with spatial changes. Such approaches using state-variable estimation (Kalman filtering) are developed in Refs. [30,31]. The road model consists of skeletal lines pieced together from clothoids (i.e. arcs with constant curvature change over their run length). The road assumptions define a general highway scene, where the ground plane is flat, the road boundaries are parallel with constant width, the horizontal road curvature changes slowly (almost linearly) and the vertical curvature is insignificant. Assuming slow speed changes, or piecewise constant speed, the temporal change of curvature is linearly related to the speed of the vehicle. Thus, the curvature parameters and their association with the ego-motion of the camera can be formulated into a compact system of differential equations, providing a dynamic model for these parameters. The location of the road boundaries in the image is determined by three state variables, i.e. the vehicle lateral offset from the lane center, the camera heading relative to the road direction, and the horizontal road curvature. The Kalman filtering algorithm is employed in Ref. [32] to estimate the state-variables of the road and reconstruct the 3D location of the road boundaries.
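A toy version of such a state-variable estimator is sketched below for the three state variables named above. The transition matrix is a crude discretization of the clothoid model; all noise levels and the measurement model are hypothetical simplifications, not the actual designs of Refs. [30-32].

import numpy as np

# State: [lateral offset (m), heading error (rad), curvature (1/m)].
x = np.zeros(3)
P = np.eye(3)

def kalman_step(x, P, z, v, dt, meas_var=0.05):
    # Predict: offset drifts with heading, heading drifts with road
    # curvature, both scaled by travelled distance v*dt (small-angle
    # approximation of the clothoid road model).
    F = np.array([[1.0, v * dt, 0.0],
                  [0.0, 1.0,    v * dt],
                  [0.0, 0.0,    1.0]])
    Q = np.diag([1e-3, 1e-4, 1e-6])      # process noise (hypothetical)
    x = F @ x
    P = F @ P @ F.T + Q

    # Update: we measure only the lateral offset from the detected
    # lane markings in the current frame.
    H = np.array([[1.0, 0.0, 0.0]])
    S = H @ P @ H.T + meas_var
    K = P @ H.T / S                       # Kalman gain
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(3) - K @ H) @ P
    return x, P

# One cycle at 25 frames/s for a vehicle moving at 20 m/s.
x, P = kalman_step(x, P, z=0.4, v=20.0, dt=0.04)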
The previous model assumes no vertical curvature and no vertical deviation of the camera with respect to the road. These assumptions imply a flat-road geometry model, which is of limited use in practice. Other rigorous models, such as the hill-and-dale and the zero-bank models, have been considered for road geometry reconstruction [12,33]. The hill-and-dale model uses the flat-road model for the two roadway points closest to the vehicle in the image, and forces the road model to move up or down from the flat-road plane so as to retain a constant road width. The zero-bank assumption models the road as a space ribbon generated by a central line-spine and horizontal line-segments of constant width cutting the spine at their midpoint at a normal to the spine's 3D direction. Even more unstructured road geometry is studied in Ref. [34], where all local road parameters are involved in the state-variable estimation process.

Model-driven approaches provide powerful means for the analysis of road edges and markings. However, the use of a model has certain drawbacks, such as the difficulty in choosing and maintaining an appropriate model for the road structure, the inefficiency in matching complex road structures and the high computational complexity.

3. Object detection

3.1. Stationary camera

In road traffic monitoring, the video acquisition cameras are stationary. They are placed on posts above the ground to obtain an optimal view of the road and the passing vehicles. In automatic vehicle guidance, the cameras are moving with the vehicle. In these applications it is essential to analyze the dynamic change of the environment and its contents, as well as the dynamic change of the camera itself. Thus, object detection from a stationary camera is simpler in that it involves fewer estimation procedures.
Initial approaches in this field involve spatial, temporal and spatio-temporal analysis of video sequences. Using a sequence of images, the detection principle is based essentially on the fact that the objects to be searched for are in motion. These methods prioritize temporal characteristics over spatial characteristics, i.e. the detection deals mainly with the analysis of variations in time of one and the same pixel rather than with the information given by the environment of a pixel in one image [35]. More advanced and effective approaches consider object modeling and tracking using state-space estimation procedures for matching the model to the observations and for estimating the next state of the object.

The most common techniques, i.e. analysis of the optical flow field and processing of stereo images, involve processing two or more images. With optical-flow-field analysis, multiple images are acquired at different times [36]; stereo images, of course, are acquired simultaneously from different points of view [37]. Optical-flow-based techniques detect obstacles indirectly by analyzing the velocity field. Stereo image techniques identify the correspondences between pixels in the different images. Stereovision has advantages in that it can detect obstacles directly and, unlike optical-flow-field analysis, is not constrained by speed. Several approaches considering different aspects of object and motion perception from a stationary camera are considered in Section 3.3.

3.2. Moving camera

Autonomous vehicle guidance requires the solution of different problems at different abstraction levels. The vision system can aid the accurate localization of the vehicle with respect to its environment, by means of matching observations (acquired images) over time, or matching a single observation to a road model, or even matching a sequence of observations to a dynamic model.

We can identify two major problems with the efficient recognition of the road environment, namely the restricted processing time for real-time applications and the limited amount of information from the environment. For efficient processing we need to limit the ROI within each frame and process only relevant features within this ROI instead of the entire image. Since the scene in traffic applications does not change drastically, the prediction of the ROI from previously processed frames becomes of paramount importance. Several efficient methods presented in the following are based on dynamic scene prediction using motion and road models. The problem of the limited amount of information in each frame stems from the fact that each frame represents a non-invertible projection of the dynamically changing 3D world onto the camera plane. Since single frames encode only partial information, which could be easily misinterpreted, the systems for autonomous vehicle guidance require additional information in the form of a knowledge-base that models the 3D environment and its changes (self/ego motion or relative motion of other objects). It is possible from monocular vision to extract certain 3D information from a single 2D-projection image, using visual cues and a priori knowledge about the scene. In such systems, obstacle determination is limited to the localization of vehicles by means of a search for specific patterns, possibly supported by other features such as shape, symmetry, or the use of a bounding box [38-40]. Essentially, forward projection of 3D models and matching with 2D observations is used to derive the structure and location of obstacles. True 3D modeling, however, is not possible with monocular vision and single frame analysis.

The availability of only partial information in 2D images necessitates the use of robust approaches able to infer a complete scene representation from only partial representations. This problem concerns the matching of a low-abstraction image to a high-abstraction, high-complexity object. In other words, one must handle differences between the representation of the acquired data and the projected representation of the models to be recognized. A priori knowledge is necessary in order to bridge the gap between these two representations [41]. A first source of additional information is the temporal evolution of the observed image, which enables the tracking of features over time. Furthermore, the joint consideration of a frame sequence provides meaningful constraints of spatial features over time, or vice versa. For instance, Ref. [42] employs smoothness constraints on the motion vectors, which are imposed by the gray-scale spatial distribution. Such constraints convey the realistic assumption that compact objects should preserve smoothly varying displacement vectors. The initial form of integrated spatio-temporal analysis operates on a so-called 2½D feature space, where 2D features are tracked in time. Additional constraints can be imposed through the consideration of 3D models for the construction of the environment (full 3D space reconstruction) and the matching of 2D data (observations) with the 3D representation of these models, or their projection on the camera coordinates (pose estimation problem). Such model information, by itself, enables the consideration and matching of relative object poses [43].
With the latest advances in computer architecture and hardware, it becomes possible to consider even the dynamic modeling of 3D objects. This possibility paved the way to fully integrated spatio-temporal processing, where two general directions have been proposed. The first one considers the dynamic matching of low-abstraction (2D image-level) features between the data and the model. Although it keeps continuous track of changes in the 3D model using both road and motion modeling (features in a 3½D space), it propagates the current 2D representation of the model in accordance with the current state of the camera with respect to the road [44]. Thus, it matches the observations with the expected projection of the world onto the camera system and propagates the error for correcting the current (model) hypothesis [31]. The second approach uses a full 4D model, where objects are treated as 3D motion processes in space and time. Geometric shape descriptors together with generic models for motion form the basis for this integrated (4D or dynamic vision) analysis [45]. Based on this representation one can search for features in the 4D-space [45], or can match observations (possibly from different sensors or information sources) and models at different abstraction levels (or projections) [41]. This evolution of techniques and their abilities is summarized in Table 1, which is further discussed in the conclusion, after the consideration of established approaches. Some relevant approaches for moving object detection from a moving camera are summarized in Section 3.3.

3.3. Object detection approaches

Some fundamental issues of object detection are considered and reviewed in this section. Approaches have been categorized according to the method used to isolate the object from the background on a single frame or a sequence of frames.

3.3.1. Thresholding
This is one of the simplest, but less effective, techniques, which operates on still images. It is based on the notion that vehicles are compact objects having different intensity from their background. Thus, by thresholding intensities in small regions we can separate the vehicle from the background. This approach depends heavily on the threshold used, which must be selected appropriately for a certain vehicle and its background. Adaptive thresholding can be used to account for lighting changes, but cannot avoid the false detection of shadows or the missed detection of parts of the vehicle with similar intensities as its environment [46]. To aid the thresholding process, binary mathematical morphology can be used to aggregate close pixels into a unified object [47]. Furthermore, gray-scale morphological operators have been proposed for object detection and identification that are insensitive to lighting variation [48].
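The sketch below strings these ideas together: adaptive thresholding against local illumination, binary morphology to aggregate nearby pixels, and connected components to obtain candidate objects; all parameter values are hypothetical.

import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Adaptive threshold: the threshold follows local illumination, which
# helps under lighting changes (block size and offset are hypothetical).
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, blockSize=31, C=10)

# Binary morphology to aggregate close pixels into unified objects.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
objects = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Connected components yield one candidate blob per vehicle-sized region.
n, labels, stats, _ = cv2.connectedComponentsWithStats(objects)
blobs = [stats[i] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 200]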
3.3.2. Multigrid identification of regions of interest
A method of directing attention to regions of interest based on multiresolution images is developed in Ref. [5]. This method first generates a hierarchy of images at different resolutions. Subsequently, a region search begins at the top level (coarse to fine). Compact objects that differ from their background remain distinguishable in the low-resolution image, whereas noise and small intensity variations tend to disappear at this level. Thus, the low-resolution image can immediately direct attention to the pixels that correspond to such objects in the initial image. Each pixel of interest is selected according to some interest function, which may be a function of the intensity values of its adjacent pixels, edge strength, or successive frame differencing for motion analysis [5].

3.3.3. Edge-based detection (spatial differentiation)
Approaches in this class are based on the edge-features of objects. They can be applied to single images to detect the edge structure of even still vehicles [49]. Morphological edge-detection schemes have been extensively applied, since they exhibit superior performance [4,18,50]. In traffic scenes, the results of an edge detector generally highlight vehicles as complex groups of edges, whereas road areas yield relatively low edge content. Thus, the presence of vehicles may be detected by the edge complexity within the road area, which can be quantified through analysis of the histogram [51].

Alternatively, the edges can be grouped together to form the vehicle's boundary. Towards this direction, the algorithm must identify relevant features (often line segments) and define a grouping strategy that allows the identification of feature sets, each of which may correspond to an object of interest (e.g. a potential vehicle or road obstacle). Vertical edges are more likely to form dominant line segments corresponding to the vertical boundaries of the profile of a road obstacle. Moreover, a dominant line segment of a vehicle must have other line segments in its neighborhood that are detected in nearly perpendicular directions. Thus, the detection of vehicles and/or obstacles can simply consist of finding the rectangles that enclose the dominant line segments and their neighbors in the image plane [2,30]. To improve the shape of object regions, Refs. [52,53] employ the Hough transform to extract consistent contour lines and morphological operations to restore small breaks on the detected contours. Symmetry provides an additional useful feature for relating these line segments, since vehicle rears are generally contour- and region-symmetric about a vertical central line [54]. Edge-based vehicle detection is often more effective than other background removal or thresholding approaches, since the edge information remains significant even under variations of ambient lighting [55].
Table 1
Progressive use of information in different levels of system complexity and functionality
3.3.4. Space signature
In this detection method, the objects to be identified (vehicles) are described by their characteristics (forms, dimensions, luminosity), which allow identification in their environment [56,57]. Ref. [57] employs a logistic regression approach using characteristics extracted from the vehicle signature, in order to detect the vehicle from its background. Alternatively, the space signatures are defined in Ref. [58] by means of the vehicle outlines projected from a certain number of positions (poses) on the image plane from a certain geometrical vehicle model. A camera model is employed to project the 3D object model onto the camera coordinates at each expected position. Then, the linear edge segments on each observed image are matched to the model by evaluating the presence of attributes of an outline, for each of the pre-established object positions (poses). In a similar framework, Ref. [59] projects the 3D model at different poses to sparse 2D arrays, essentially encoding information about the projected edges. These arrays are used for matching with the image data.

Space signatures can also be identified in an image through correlation or template matching techniques, using directly the typical gray-scale signature of vehicles [60]. Due to the inflexible nature of template matching, a specific template must be created for each type of vehicle to be recognized. This creates a problem, since there are many geometrical shapes for vehicles contained in the same vehicle-class. Moreover, the template mask assumes that there is little change in the intensity signature of vehicles. In practice, however, changes in ambient lighting, shadows, occlusion, and severe light reflection on the vehicle body panels generate serious variation in the spatial signatures of same-type vehicles. To overcome such problems, the TRIP II system [58,61] employs neural networks for recalling space signatures, and exploits their ability to interpolate among different known shapes [62].
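A minimal correlation-based sketch is given below, using normalized cross-correlation so that the score is less sensitive to global lighting. The template file, acceptance threshold and single-template setup are hypothetical; a real system would hold one template per vehicle type and pose.

import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("car_rear_template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation of the template against every position.
response = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

if max_val > 0.7:  # hypothetical acceptance threshold
    x, y = max_loc
    h, w = template.shape
    bbox = (x, y, w, h)  # bounding box of the matched vehicle signature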
Despite its inefficiencies, vehicle detection based on sign patterns does not require high computational effort. Moreover, it enables the system to deal with the tracking process and keep the vehicle in track by continuously sensing its sign pattern in real time.

3.3.5. Background frame differencing
In the preceding methods, the image of motionless objects (the background image) is insignificant. On the contrary, this method is based on forming a precise background image and using it for separating moving objects from their background. The background image is specified either manually, by taking an image without vehicles, or is detected in real-time by forming a mathematical or exponential average of successive images. The detection is then achieved by means of subtracting the reference image from the current image. Thresholding is performed in order to obtain presence/absence information of an object in motion [5,35,38].

The background can change significantly with shadows cast by buildings and clouds, or simply due to changes in lighting conditions. Under these changing environmental conditions, the background frame is required to be updated regularly. There are several background updating techniques. The most commonly used are averaging and selective updating. In averaging, the background is built gradually by taking the average of the previous background with the current frame. If we form a weighted average between the previous background and the current frame, the background is built through exponential updating [63]. In selective updating, the background is replaced by the current frame only at regions with no motion detected, where the difference between the current and the previous frames is smaller than a threshold [63]. Selective updating can be performed in a more robust averaging form, where the stationary regions of the background are replaced by the average of the current frame and the previous background [50].
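The following sketch combines exponential and selective updating in one step; the learning rate and thresholds are hypothetical tuning values.

import numpy as np

def update_background(background, frame, alpha=0.05, motion_thresh=20):
    # background: float32 array, initialized from a vehicle-free frame.
    # alpha controls how quickly the background follows lighting
    # changes; pixels that currently differ strongly from the
    # background (likely vehicles) are excluded (selective updating).
    frame = frame.astype(np.float32)
    static = np.abs(frame - background) < motion_thresh
    background[static] = ((1 - alpha) * background[static]
                          + alpha * frame[static])
    return background

def detect_objects(background, frame, thresh=25):
    # Background differencing + thresholding gives the presence/absence
    # mask of objects in motion.
    return np.abs(frame.astype(np.float32) - background) > thresh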
3.3.6. Inter-frame differencing
This is the most direct method for making immobile objects disappear and preserving only the traces of objects in motion between two successive frames. The immediate consequence is that stationary or slow-moving objects are not detected. The inter-frame difference succeeds in detecting motion when temporal changes are evident. However, it fails when the moving objects are not sufficiently textured and preserve uniform regions with the background. To overcome this problem, the inter-frame difference is described using a statistical framework, often employing spatial Markov random fields [64-66]. Alternatively, in Ref. [64] the inter-frame difference is modeled through a two-component mixture density. The two components are zero mean, corresponding to the static (background) and changing (moving object) parts of the image. Inter-frame differencing provides a crude but simple tool for estimating moving regions. This process can be complemented with background frame differencing to improve the estimation accuracy [67]. The resulting mask of moving regions can be further refined with color segmentation [68] or accurate motion estimation by means of optical flow estimation and optimization of the displaced frame difference [16,67], in order to refine the segmentation of moving objects.
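A sketch of this complementary use of the two differences is given below; intersecting the inter-frame mask with the background mask is one simple way to realize the refinement, and the thresholds are hypothetical.

import numpy as np

def moving_object_mask(prev_frame, curr_frame, background,
                       t_frame=15, t_back=25):
    # The inter-frame difference responds only where motion occurs
    # between consecutive frames; intersecting it with the background
    # difference suppresses uncovered-background artifacts and refines
    # the mask of moving regions.
    curr = curr_frame.astype(np.int16)
    interframe = np.abs(curr - prev_frame.astype(np.int16)) > t_frame
    backdiff = np.abs(curr - background.astype(np.int16)) > t_back
    return interframe & backdiff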
V. Kastrinaki et al. / Image and Vision Computing 21 (2003) 359–381 369

the object [32]. They are often used in object detection to location x at time t: The optical flow field encodes the
improve the robustness and reliability of detection and temporal displacement of observable gray-scale structures
reduce false detection rates. The aggregation step handles within an image sequence. It comprises information not
features previously detected, in order to find the vehicles only about the relative displacement of pixels, but also
themselves or the vehicle queues (in case of congestion). about the spatial structure of the scene.
The features are aggregated with respect to the vehicle’s Various approaches have been proposed for the efficient
geometrical characteristics. Therefore, this operation can be estimation of optical flow field [42,78 – 80]. In general, they
interpreted as a pattern recognition task. Two general can be characterized as (i) gradient-based (ii) correlation
approaches have been employed for feature aggregation, based (iii) feature-based and (iv) multigrid methods.
namely motion-based and model-based approaches [64]. Gradient-based techniques focus on matching gðx 2 uDt; 
Motion-based approaches group together visual motion t 2 DtÞ with gðx; tÞ on a pixel-by-pixel basis through the
consistencies over time [64,72,73]. Motion estimation is temporal gradient of the image sequence. In most cases, the
only performed at distinguishable points, such as corners intensity variations alone do not provide sufficient infor-
[72,74], or along contours of segmented objects [75], or mation to completely determine both components (magni-
within segmented regions of similar texture [14,67,70]. Line tude and direction) of the optical flow field uðx; tÞ [81].
segments or points can also be tracked in the 3D space by Smoothness constraints facilitate the estimation of optical
3.3.9. Optical flow field
Approaches in this class exploit the fact that the appearance of a rigid object changes little during motion, whereas the drastic changes occur at regions where the object moves in and/or out of the background. The optical flow field u(x, t) is computed by mapping the gray-value g(x − uΔt, t − Δt), recorded at time t − Δt at the image point x − uΔt, onto the gray-value g(x, t) recorded at location x at time t. The optical flow field encodes the temporal displacement of observable gray-scale structures within an image sequence. It comprises information not only about the relative displacement of pixels, but also about the spatial structure of the scene.

Various approaches have been proposed for the efficient estimation of the optical flow field [42,78–80]. In general, they can be characterized as (i) gradient-based, (ii) correlation-based, (iii) feature-based and (iv) multigrid methods. Gradient-based techniques focus on matching g(x − uΔt, t − Δt) with g(x, t) on a pixel-by-pixel basis through the temporal gradient of the image sequence. In most cases, the intensity variations alone do not provide sufficient information to completely determine both components (magnitude and direction) of the optical flow field u(x, t) [81]. Smoothness constraints facilitate the estimation of optical flow fields even for areas with constant or linearly distributed intensities [78–80,82]. Gradient-based techniques yield poor results for poor-texture images and in the presence of shocks and vibrations [83]. Under such conditions, correlation-based techniques usually derive more accurate results. Correlation-based techniques search for the shift around each pixel that maximizes the correlation of gray-level patterns between two consecutive frames. Such procedures are quite expensive in terms of computational complexity. Attempts to speed up the computation at the cost of resolution often imply subsampling of the image and computation of the motion field at fewer image points [83].
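A minimal correlation-type matcher along these lines is sketched below, using a sum-of-squared-differences criterion in place of explicit correlation; block size and search radius are illustrative, and the pixel is assumed to lie far enough from the image border.

```python
import numpy as np

def block_match(prev, curr, y, x, block=8, radius=4):
    """Correlation-type flow estimate at pixel (y, x): exhaustively search
    the displacement within +/-radius whose block in the current frame
    best matches the reference block of the previous frame."""
    h = block // 2
    ref = prev[y - h:y + h, x - h:x + h].astype(np.float32)
    best_cost, best_uv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = curr[y - h + dy:y + h + dy,
                        x - h + dx:x + h + dx].astype(np.float32)
            cost = np.sum((ref - cand) ** 2)   # SSD stands in for correlation
            if cost < best_cost:
                best_cost, best_uv = cost, (dy, dx)
    return best_uv                              # (row, column) displacement
```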
Feature-based approaches consider the organization (clustering) of pixels into crude object structures in each frame and subsequently compute motion vectors by matching these structures in the sequence of frames. A robust feature-based method for the estimation of optical flow vectors has been developed by Kories and Zimmermann [84]. Each frame is first subjected to a bandpass filter. Blobs representing local maxima and minima of the gray-level are identified as features. The centroids of the detected blobs are tracked through subsequent frames, resulting in optical flow vectors. A related technique is considered in Ref. [85], which aims at matching areas of similar intensities in two consecutive frames. To reduce the amount of computation, pixels of interest are segmented prior to matching using background removal, edge detection or inter-frame difference. The accuracy of these techniques is affected by sensor noise (quantization), algorithmic disturbances and, more importantly, perspective distortions and occlusion resulting from typical camera positions. Nevertheless, the methods are suitable for on-line qualitative monitoring, operating at much faster speeds than human operators and without the problem of limited attention spans [85].

Multigrid methods are designed for fast estimation of the relevant motion vectors at low resolution and hierarchical refinement of the motion flow field at higher resolution levels [86]. The multigrid approach in Ref. [5] relies upon
the organization of similar pixel intensities into objects, similar to the feature-based approaches. This approach, however, identifies object structures at low-resolution levels, where it also computes a crude estimate of the motion field from the low-resolution image sequence. The motion vector field is refined hierarchically at higher resolution levels. A related approach is used in the ACTIONS system, where the optical flow vectors are clustered in order to incrementally create candidate moving objects in the picture domain [81].

For a still camera, moving objects are readily identified by thresholding the optical flow field. The detection of moving objects in image sequences taken from a moving camera becomes much more difficult due to the camera motion. If a camera is translating through a stationary environment, then the directions of all optical-flow vectors intersect at one point in the image plane, the focus of expansion or the epipole [81]. When the car bearing the camera is moving in a stationary environment along a flat road and the camera axis is parallel to the ground, the motion field (due to ego-motion) is expected to have an almost quadratic structure [83]. If another moving object becomes visible by the translating camera, the optical flow field resulting from this additional motion will interfere with the optical flow field of the ego-motion. This interference can be detected by testing whether the calculated optical-flow vectors have the same direction as the estimated ego-motion model vectors [81,83]. The detection of obstacles from a moving camera based on the optical flow field is generally divided into two steps. The ego-motion is first computed from the analysis of the optical flow. Then, moving or stationary obstacles are detected by analyzing the difference between the expected and the real velocity fields [36,72,87]. These fields are re-projected to the 3D road coordinate system using a model of the road (usually a flat straight road) [88,89].

The estimation of ego-motion can be based on parametric models of the motion field. For planar motion with no parallax (no significant depth variations), at most eight parameters can characterize the motion field. These parameters can be estimated by optimizing an error measure on two subsequent frames using a gradient-based estimation approach [66,90]. The optimization process is often applied on a multiresolution representation of the frames, to provide robust performance of the algorithm [90]. When the scene is piecewise planar, or is composed of a few distinct portions at different depths, then the ego-motion can be estimated in layers of 2D parametric motion estimation. Each layer estimates motion at a certain depth due to the camera and removes the associated portions of the image. Image regions that cannot be aligned in two frames at any depth are segmented into independently moving objects [90]. For more general motion of the camera, the ego-motion effect can be decomposed into the planar and the parallax parts. After compensating for the planar 2D motion, the residual parallax displacements in two subsequent frames are primarily due to translational motion of the camera. These displacements due to camera motion form a radial field centered at the epipole. Independently moving objects can be recovered by verifying that the displacement at any given point is directed away from the epipole [91].
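Assuming the epipole (focus of expansion) has already been estimated, this consistency test can be sketched as follows; the cosine threshold is an illustrative tolerance.

```python
import numpy as np

def inconsistent_with_egomotion(points, flows, epipole, cos_thresh=0.9):
    """Flag optical-flow vectors that are not directed away from the
    focus of expansion, i.e. that cannot be explained by pure camera
    translation through a stationary scene. points, flows: (N, 2) arrays;
    epipole: (2,) image location."""
    radial = (points - epipole).astype(np.float64)      # expected directions
    radial /= np.linalg.norm(radial, axis=1, keepdims=True) + 1e-9
    unit_flow = flows / (np.linalg.norm(flows, axis=1, keepdims=True) + 1e-9)
    alignment = np.sum(radial * unit_flow, axis=1)      # cosine of the angle
    return alignment < cos_thresh                       # True: candidate mover
```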
The problem of recovering the optical flow from time-varying image sequences is ill-posed, and additional constraints must often be imposed to derive satisfactory solutions. Smoothness constraints stem from the fact that uniformly moving objects possess slightly changing motion fields. Such constraints have been used in a joint spatio-temporal domain of analysis [92]. Ref. [93] first calculates the optical flow and, after smoothing the displacement vectors in both the temporal and the spatial domains, merges regions of relatively uniform optical flow. Finally, it employs a voting process over time in each spatial location regarding the direction of the displacement vectors, to derive consistent trends in the evolution of the optical flow field and, thus, define consistently moving objects. In a different form, Ref. [94] starts from similarity in the spatial domain. For each frame, it defines characteristic features (such as corners and edges) and matches these features on the present and the previous frame to derive a list of flow vectors. Similar flow vectors are grouped together and compared to the spatial features, in order to verify not only temporal but also spatial consistency of detected moving objects. In a similar form, Ref. [94] defines patches of similar spatial characteristics in each frame and uses local voting over the output of a correlation-type motion detector to detect moving objects. It also uses the inverse perspective mapping to eliminate motion effects on the ground plane due to the ego-motion of the camera [94].

3.3.10. Motion parallax
When the camera is moving forward towards an object, the object's projection on the 2D image plane also moves relative to the image coordinate system. If an object extends vertically from the ground plane, its image moves differently from the immediate background. Moreover, the motion of points on the same object appears different relative to the background, depending on the distance from the ground plane. This difference is called motion parallax [87]. If the environment is constrained, e.g. motion on a planar road, then differences observed in the motion vectors can be used to derive information regarding the objects moving within the scene. If we use the displacement field of the road to displace the object, a clear difference between the predicted and the actual position of the object is experienced. In other words, all points in the image that are not on the ground plane will be erroneously predicted. Thus, the prediction error (above an acceptable threshold) indicates locations of vertically extended objects in the scene [87]. If we compensate the ego-motion of the camera, then independently moving (or stationary) obstacles can be readily detected.
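A rough sketch of this prediction-error test is given below, under the assumption that the ground-plane motion between the two frames is available as a homography H_ground that maps a pixel of the current frame to the previous-frame position of the same road point; nearest-neighbour sampling and the fixed threshold are simplifications.

```python
import numpy as np

def parallax_residual(prev, curr, H_ground, thresh=20.0):
    """Predict the current frame by displacing the previous one with the
    ground-plane homography; points that are not on the ground plane are
    erroneously predicted, so a large residual flags raised objects."""
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = H_ground @ pts                                  # pre-images on the road
    xs_src = np.clip(np.round(src[0] / src[2]).astype(int), 0, w - 1)
    ys_src = np.clip(np.round(src[1] / src[2]).astype(int), 0, h - 1)
    predicted = prev[ys_src, xs_src].reshape(h, w)
    return np.abs(curr.astype(np.float32) - predicted) > thresh
```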
The parallax effect is used in the Intelligent Vehicle (IV) in a different form for obstacle detection [95]. A stereo rig is positioned vertically, so that one camera is located above
the other. Obstacles located above the ground plane appear identical in the camera images, except for their different location. On the other hand, figures on the road appear different in the two cameras. In this configuration, an obstacle generates the same time signature, whereas road figures generate different time signatures on the two cameras. Thus, progressive scanning and delaying one of the camera signals makes the detection of obstacles possible. Nevertheless, the system relies on and is highly affected by brightness changes, shadows and shades on the road structure [95].

3.3.11. Stereo vision
The detection of stationary or moving objects in traffic applications has also been considered through stereo vision systems. The disparity between points in the two stereo images relates directly to the distance of the actual 3D location from the cameras. For all points lying on a plane, the disparity on the two stereo images is a linear function of the image coordinates (Helmholtz shear equation). This Helmholtz shear relation highly simplifies the computation of stereo disparity. It may be used to re-map the right image onto the left, or both images onto the road coordinate system, based on the given model of the road in front of the vehicle (e.g. a flat straight road) [6,7,28,38,96]. All points on the ground plane appear with zero disparities, whereas residual disparities indicate objects lying above the ground plane, which can become potential obstacles. A simple threshold can be used to identify these objects in the difference of the re-mapped images.
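The following sketch illustrates this ground-plane test for a rectified stereo pair, assuming a purely horizontal disparity model d(x, y) = a*x + b*y + c whose plane parameters a, b and c are known from calibration; this is a simplified instance of the Helmholtz shear relation above.

```python
import numpy as np

def ground_plane_residual(left, right, a, b, c, thresh=15.0):
    """Re-map the right image onto the left using the planar disparity
    model d(x, y) = a*x + b*y + c. Pixels on the road then cancel in the
    difference image; residuals above the threshold mark potential
    obstacles rising above the ground plane."""
    h, w = left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disparity = a * xs + b * ys + c                 # predicted ground disparity
    xs_src = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    remapped = right[ys, xs_src]
    return np.abs(left.astype(np.float32) - remapped) > thresh
```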
Besides the projection of images onto the ground plane, stereo vision can be effectively used for the reconstruction of the 3D space ahead of the vehicle. This reconstruction is based on correspondences between points in the left and right images. Once this has been accomplished, the 3D coordinates of the matched point can be computed via a re-projection transform. The approach in the PATH project [28] considers such a matching of structural characteristics (vertical edges). Candidate matches in the left and right images are evaluated by computing the correlation between windows of pixels centered on each edge [28]. The matching can also be based on stochastic modeling, which can take into consideration the spatial intra- and inter-correlation of the stereo images [97]. The re-projection transform maps the matched points onto the road coordinate system. For this purpose it is necessary to know the exact relationship among the camera, vehicle and road coordinate systems. Under the assumption of a flat road, this re-projection process is quite straightforward (triangulation transform). In the case of general road conditions, however, the road geometry has to be estimated first in order to derive the re-projection transform from the camera to road coordinate systems. This estimation requires exact knowledge of the state of the car (yaw rate, vehicle speed, steering angle, etc.), which can be provided by appropriate sensors of the vehicle. Using this information, the road geometry can be estimated from visual data [10,45].

3.3.12. Inverse perspective mapping
A promising approach in real-time object detection from video images is to remove the inherent perspective effect from acquired single or stereo images. The perspective effect relates 3D points on the road (world) coordinate system to 2D pixels on the image plane differently, depending on their distance from the camera. This effect associates different information content to different image pixels. Thus, road markings or objects of the same size appear smaller in the image as they move away from the camera coordinate system. The inverse perspective mapping aims at inverting the perspective effect, forcing a homogeneous distribution of information within the image plane. To remove the perspective effect it is essential to know the image acquisition structure with respect to the road coordinates (camera position, orientation, etc.) and the road geometry (the flat-road assumption highly simplifies the problem). The inverse perspective mapping can be applied to stereo vision [4], by re-mapping both right and left images into a common (road) domain. Using this approach, the localization of the lane and the detection of generic obstacles on the road can be performed without any 3D-world reconstruction [4]. The difference of the re-mapped views transforms relatively square obstacles into two neighboring triangles corresponding to the vertical boundaries of the object, which can be easily detected on a polar histogram of the difference image.
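A minimal flat-road version of the mapping is sketched below for a pinhole camera at height cam_h, pitched down by a known angle; axes are x right, y down, z forward, and nearest-neighbour sampling is used for brevity. All parameter names are illustrative.

```python
import numpy as np

def inverse_perspective_map(image, f, cx, cy, cam_h, pitch, x_range, z_range, out_shape):
    """Flat-road inverse perspective mapping: each bird's-eye cell at
    lateral offset x and distance z on the road plane is projected through
    a pinhole model (focal length f, principal point (cx, cy)) and sampled
    from the image, yielding a view with roughly uniform ground resolution."""
    H, W = image.shape[:2]
    n_rows, n_cols = out_shape
    xs = np.linspace(x_range[0], x_range[1], n_cols)
    zs = np.linspace(z_range[1], z_range[0], n_rows)   # far road on the top row
    X, Z = np.meshgrid(xs, zs)
    Yc = cam_h * np.cos(pitch) - Z * np.sin(pitch)     # road point in camera frame
    Zc = cam_h * np.sin(pitch) + Z * np.cos(pitch)
    u = np.clip(np.round(cx + f * X / Zc).astype(int), 0, W - 1)
    v = np.clip(np.round(cy + f * Yc / Zc).astype(int), 0, H - 1)
    return image[v, u]                                  # re-sampled bird's-eye view
```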
3.3.13. 3D modeling and forward mapping
The previous approaches reflect attempts to invert the 3D projection for a sequence of images and reconstruct the actual (world) spatial arrangement and motion of objects. The class of model-based techniques takes a different approach. It tries to solve the analysis task by carrying out an iterative synthesis with prediction-error feedback, using spatio-temporal world models.

Model-based approaches employ a parameterized 3D vehicle model for both its structural (shape) characteristics and its motion [73,76]. Considering first a stationary camera, two major problems must be solved, namely the model matching and the motion estimation. The model matching process aims at finding the best match between the observed image and the 3D model projected onto the camera plane. This step is essentially a pose identification process, which derives the 3D position of the vehicle relative to the camera coordinates, based on 2D projections. The vehicle model often assumes straight line segments represented by their length and mid-point location [44]. The line segments extracted from the image are matched to the model segments projected on the 2D camera plane. The matching can be based on the optimization of distance measures between the observation and the model; the Mahalanobis distance is used in Ref. [44]. The motion estimation process
is based on models that describe the vehicle motion. The motion parameters of this model are estimated using a time-recursive estimation process. For instance, the maximum a posteriori (MAP) estimator is employed in Ref. [44], whereas the extended Kalman filter is used in Ref. [98]. The estimation of motion and shape parameters can be combined in a more general (overall) state estimation process [98].
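The segment matching step can be sketched as follows; each segment is summarized here by an illustrative feature vector (mid-point, length, orientation), and the measurement covariance is assumed known. The exact parameterization is a choice of the implementation, not a detail of Ref. [44].

```python
import numpy as np

def mahalanobis_match(model_segs, image_segs, cov):
    """For every projected model segment, pick the extracted image segment
    at minimum Mahalanobis distance. Segments are rows of (mid_x, mid_y,
    length, angle); cov is the assumed measurement covariance."""
    cov_inv = np.linalg.inv(cov)
    matches = []
    for m in model_segs:
        d = image_segs - m                               # feature differences
        cost = np.einsum('ij,jk,ik->i', d, cov_inv, d)   # squared distances
        matches.append(int(np.argmin(cost)))
    return matches
```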
In the case of a moving camera, the changing views of objects during self- or ego-motion reveal different aspects of the 3D geometry of objects and their surrounding environment. It becomes obvious that knowledge about the structure of the environment and the dynamics of motion are relevant components in real-time vision. In a computerized system, generic models of objects from the real world can be stored as three-dimensional structures carrying visible features at different spatial positions relative to their center of gravity. From the ego-motion dynamics, the relative position of the moving vehicle and its camera can be inferred. From this knowledge, and applying the laws of forward projection (which is done much more easily than the inverse), the position and orientation of visual features in the image can be matched to those of the projected model [31,34,66]. In a different form, Ref. [66] models the remaining difference image from two consecutive frames after ego-motion compensation as a Markov random field (MRF) that incorporates the stochastic model of the hypothesis that a pixel is either static (background) or mobile (vehicle). The MRF also induces spatial and temporal smoothness constraints. The optimization of the energy function of the resulting Gibbs posterior distribution provides the motion-detection map at every pixel [66].

The dynamic consideration of a world model allows not only the computation of present vehicle positions, but also the computation of the effects of each component of the relative state vector on the vehicle position. This information can be maintained and used for estimating future vehicle positions. The partial derivatives for the parameters of each object at its current spatial position are collected in the Jacobian matrix as detailed information for interpreting the observed image. The ego-motion dynamics can be computed from the actuators of the moving vehicle. The dynamics of other moving obstacles can be modeled by stochastic disturbance variables. For simplicity, the motion of obstacles can be decomposed into translation and rotation about their center of gravity. Having this information, we can proceed with a prediction of the vehicle and obstacles' states for the next time instant, when new measurements are taken. If the cycle time of the measurement and control process is small and the state of the object is well known, the discrepancy between prediction and measurement should be small. Therefore, a linear approximation to the non-linear equations of the model should be sufficient for capturing the essential inter-relationships of the estimation process [31]. Moreover, for linear models the recursive state estimation is efficiently performed through least-squares processes. Thus, the spatial state estimation through vision can be performed through recursive least-squares estimation and Kalman filtering schemes, where the Jacobian matrix reflects the observed image variation.

By applying this scheme to each object in the environment in parallel, an internal representation of the actual environment can be maintained in the interpretation process, by prediction-error feedback [31]. A Kalman filter can be used to predict the vector of the state estimates based on the vectors of measurements and control variables. The measurement equation has to be computed only in the forward direction, from state variables (3D world) to measurement space (image plane). This approach avoids the ill-posed approximation of the non-unique inverse projection transform, through the fusion of dynamic models (that describe spatial motion) with 3D shape models (that describe the spatial distribution of visual features). The forward projection mapping is easily evaluated under the flat-road model [31]. Other road models, including the Hill-and-Dale, Zero-Bank and Modified Zero-Bank models, have been considered along with the inverse and/or forward mapping [12,33,34].
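The prediction-error feedback loop admits a compact extended-Kalman-filter sketch; the dynamic model F, the forward projection h and its Jacobian are placeholders for the models discussed above, not the specific formulations of the cited systems.

```python
import numpy as np

def ekf_step(x, P, F, Q, z, h, H_jac, R):
    """One measurement/control cycle: predict the state with the linear
    dynamic model F, project the prediction into image space with the
    forward mapping h, and correct by the (small) prediction error.
    H_jac(x) returns the Jacobian of h, i.e. the sensitivity of the
    observed image features to each state component."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    Hj = H_jac(x_pred)                    # linearization around the prediction
    innovation = z - h(x_pred)            # measured minus predicted features
    S = Hj @ P_pred @ Hj.T + R
    K = P_pred @ Hj.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_new)) - K @ Hj) @ P_pred
    return x_new, P_new
```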
4. Representative systems and future trends

Based on the previous categorization of video analysis methods, we attempt a brief review of existing systems for traffic monitoring and automatic vehicle guidance. It should be mentioned that this review does not by any means cover all existing systems, but rather considers representative systems that highlight the major trends in the area. We also attempt a categorization of these systems in terms of their domain of operation, the basic processing techniques used and their major applications. This categorization is summarized in Table 2. More specifically, the fundamental processing techniques from Sections 2 and 3 are summarized for each system. Furthermore, their operating domain is classified in terms of the nature of the features utilized. Thus, we consider operation simply in the spatial domain, spatial features with temporal projection of feature locations, temporal features (optical flow field) constrained on spatial characteristics (mainly 2½D), and joint spatio-temporal operation. Whenever important, we also emphasize the estimation of the vehicle's state variables. In terms of their major applications, we first indicate the status of the camera (static or moving) and categorize applications as traffic monitoring, automatic lane finding (ALF), lane following, vehicle following or autonomous vehicle guidance.
Table 2
Representative systems and their functionality

ACTIONS [81]. Operating domain: spatio-temporal with spatial smoothness constraints. Processing techniques: optical flow field. Major applications: traffic monitoring; static camera.

AUTOSCOPE [99-101]. Operating domain: spatial and temporal domain independently. Processing techniques: background frame differencing and interframe differencing; edge detection with spatial and temporal gradients for object detection. Major applications: traffic monitoring; static camera.

CCATS [69]. Operating domain: temporal domain with spatial constraints. Processing techniques: background removal and model of time signature for object detection. Major applications: traffic monitoring; static camera.

CRESTA [102]. Operating domain: temporal-domain differences with spatial constraints. Processing techniques: interframe differencing for object detection. Major applications: traffic monitoring; static camera.

IDSC [103]. Operating domain: spatial domain with temporal background updating. Processing techniques: background frame differencing. Major applications: traffic monitoring; static camera.

MORIO [89]. Operating domain: spatial domain with temporal tracking of features. Processing techniques: optical flow field; 3D object modeling. Major applications: traffic monitoring; static camera.

TITAN [47]. Operating domain: spatial operation with temporal tracking of features. Processing techniques: background frame differencing; morphological processing for vehicle segmentation. Major applications: traffic monitoring; static camera.

TRIP II [61]. Operating domain: spatial operation with neural nets. Processing techniques: spatial signature with neural nets for object detection. Major applications: traffic monitoring; static camera.

TULIP [104]. Operating domain: spatial operation. Processing techniques: thresholding for object detection. Major applications: traffic monitoring; static camera.

TRANSVISION [5]. Operating domain: spatio-temporal domain. Processing techniques: lane region detection (activity map); background frame differencing for object detection. Major applications: traffic monitoring; static camera.

VISATRAM [105]. Operating domain: spatio-temporal domain. Processing techniques: background frame differencing; inverse perspective mapping; spatial differentiation; tracking on the epipolar plane of the spatio-temporal cube. Major applications: traffic monitoring; static camera.

ARCADE [106]. Operating domain: spatial processing. Processing techniques: model-driven approach with deformable templates for edge matching. Major applications: automatic lane finding; static camera.

LANELOK [107]. Operating domain: spatial processing. Processing techniques: model-driven approach with deformable templates for edge matching. Major applications: automatic lane finding; moving camera.

LANA [24]. Operating domain: spatial processing; DCT features. Processing techniques: model-driven approach exploiting features to compute likelihood; deformable template models for priors. Major applications: automatic lane finding; moving camera.

LOIS [9]. Operating domain: spatial processing. Processing techniques: model-driven approach with deformable templates for edge matching. Major applications: automatic lane finding; moving camera.

CLARK [108]. Operating domain: spatial processing of images; temporal estimation of range observations. Processing techniques: LOIS for lane detection; color and deformable templates for object detection. Major applications: automatic lane finding and obstacle detection; moving camera.

PVS and AHVS [95]. Operating domain: spatial processing. Processing techniques: feature-driven approach using edge detection. Major applications: automatic lane finding; moving camera.

RALPH [109]. Operating domain: spatial processing; stereo images. Processing techniques: feature-driven approach using edge orientation; mapping of left to right image features. Major applications: automatic lane finding; moving camera.

ROMA [19]. Operating domain: spatial operation with temporal projection of lane location. Processing techniques: feature-driven approach using edge orientation. Major applications: automatic lane finding; static camera.

SIDEWALK [110]. Operating domain: spatial processing. Processing techniques: lane-region detection via thresholding for area segmentation. Major applications: automatic lane finding; moving camera.

SCARF [111]. Operating domain: spatial processing. Processing techniques: model-driven approach using stochastic modeling for image segmentation. Major applications: automatic lane following; moving camera.

CAPC [10]. Operating domain: spatial-domain lane finding with temporal projection of lane location; temporal estimation of vehicle's state variables. Processing techniques: feature-driven approach using edge detection and constraints on model for lane width and lane spacing. Major applications: automatic lane following; moving camera.

ALV [12]. Operating domain: spatial-domain lane and object detection; temporal estimation of vehicle's state variables. Processing techniques: lane-region detection for ALF using color classification; spatial signature for object detection via color segmentation. Major applications: automatic lane following; moving camera.

NAVLAB [11]. Operating domain: spatial-domain lane finding; temporal estimation of vehicle's state variables for 3D road-geometry estimation and projection of frame-to-world coordinates. Processing techniques: lane-region detection for ALF via color and texture classification. Major applications: automatic lane following; moving camera.

ALVINN and MANIAC [112]. Operating domain: spatial processing with neural nets; form of temporal matching. Processing techniques: recognition of space signature of road through neural nets. Major applications: automatic lane following; moving camera.

Ref. [3]. Operating domain: spatial processing. Processing techniques: model-driven approach using multiresolution estimation of lane position and orientation. Major applications: automatic lane following; moving camera.

Ref. [113]. Operating domain: spatial processing with temporal projection of lane location; temporal estimation of vehicle's state variables. Processing techniques: model-driven approach; 3D modeling of lane markings and borders. Major applications: automatic lane following; moving camera.

Ref. [41]. Operating domain: spatial detection of 2D line segments; temporal projection of segment locations. Processing techniques: interframe differencing and edge detection for 2D line segments (pose estimation); inverse projection of 2D line segments and grouping in 3D space via coplanar transforms. Major applications: vehicle recognition and tracking; moving camera.

Ref. [114]. Operating domain: spatial detection of 2D line segments; temporal projection of segment locations; temporal estimation of vehicle's state variables for ego-motion estimation. Processing techniques: spatial signatures of object discontinuities; inverse projection of 2D features in the 3D space and tracking of 3D features. Major applications: vehicle recognition and tracking; moving camera.

Ref. [115]. Operating domain: spatio-temporal processing; temporal prediction of optical flow; temporal estimation of vehicle's state variables. Processing techniques: optical flow field estimation, constrained on spatial edges, for object detection. Major applications: obstacle avoidance; moving camera.

BART [116]. Operating domain: spatio-temporal processing; stereo images; temporal estimation of vehicle's state. Processing techniques: feature tracking for object detection; projection of 3D coordinates on 2D stereo images. Major applications: vehicle following; moving camera.

IV (Intelligent Vehicle) [95]. Operating domain: spatio-temporal processing. Processing techniques: motion parallax for object detection. Major applications: autonomous vehicle guidance; moving camera.

PATH project [28,117]. Operating domain: spatial domain for ALF with temporal projection of lane locations; spatial correspondence of structure in stereo images for object detection; stereo images. Processing techniques: model-driven approach; temporal matching for lane detection; Hough transform to estimate line model for object edges; stereo matching of object lines. Major applications: autonomous vehicle guidance; moving camera.

GOLD system [4] for the ARGO and MOB-LAB vehicles (Prometheus project). Operating domain: spatial-domain processing for ALF and object detection; temporal projection of lane locations; temporal estimation of vehicle's state variables. Processing techniques: feature-driven approach; edge detection constrained on lane width in each stereo image for ALF; edge detection individually on each stereo image for object detection; detection of lane and objects through inverse perspective mapping. Major applications: autonomous vehicle guidance; moving camera.

VAMORS [31] (Prometheus project). Operating domain: spatio-temporal processing for ALF and vehicle guidance; temporal estimation of vehicle's state variables. Processing techniques: model-driven approach; 3D object modeling and forward perspective mapping; state-variable estimation of road skeletal lines for ALF; state-variable estimation of 3D model structure for object detection. Major applications: autonomous vehicle guidance; moving camera.

UTA [14]. Operating domain: spatio-temporal processing for ALF and object detection based on neural networks. Processing techniques: feature-driven approach; use of spatio-temporal signature. Major applications: autonomous vehicle guidance; moving camera.

Ref. [14]. Operating domain: spatial processing for ALF and object detection; feature tracking in temporal domain. Processing techniques: feature-driven approach; color road detection and lane detection via RLS fitting; interframe differencing and edge detection for locating potential object templates; feature tracking via RLS. Major applications: autonomous vehicle guidance; moving camera.
In summary, this paper provides a review of video analysis tools and their operation in traffic applications. The review focuses on two areas, namely automatic lane finding and obstacle detection. It attempts to compile the differences in the requirements and the constraints of these two areas, which lead to different processing techniques on various levels of information abstraction. Video sensors have demonstrated the ability to obtain traffic measurements more efficiently than other conventional sensors. In cases emulating conventional sensors, video sensors have been shown to offer the following advantages: competitive cost, non-intrusive sensing, lower maintenance and operation costs, lower installation cost and installation/operation during construction. However, because video sensors have the potential of wide-area viewing, they are capable of more than merely emulating conventional sensors. Some additional measurements needed for adaptive traffic management are: approach queue length, approach flow profile, ramp queue length, vehicle deceleration, and automatic measurement of turning movements. Vision also provides powerful means for collecting information regarding the environment and its actual state during autonomous locomotion. A vision-based guidance system applied to outdoor navigation usually involves two main tasks of perception, namely finding the road geometry and detecting the road obstacles. First of all, the knowledge about road geometry allows a vehicle to follow its route. Subsequently, the detection of road obstacles is a necessary and important task to avoid other vehicles present on a road. The complexity of the navigation problem is quite high, since the actual task is to reconstruct an inherent 3D representation of the spatial environment from the observed 2D images.

Image processing and analysis tools become essential components of automated systems in traffic applications, in order to extract useful information from video sensors. From the methods presented in this survey, the big thrust is based on traditional image processing techniques that employ either similarity or edge information to detect roads and vehicles and separate them from their environment. This is evident from the comprehensive overview of systems and their properties in Table 2. Most of these algorithms are rather simple, emphasizing high processing speeds rather than accuracy of the results. With the rapid progress in the electronics industry, computational complexity becomes less restrictive for real-time applications. Thus, modern hardware systems allow for more sophisticated and accurate algorithms to be employed that capitalize on the real advantages of machine vision. Throughout this work we emphasize the trend towards utilizing more and more information, in order to accurately match the dynamically changing 3D world to the observed image sequences. This evolution is graphically depicted in Table 1, which relates applications to required pieces of information depending on the complexity of each application. The levels of abstraction
and information analysis range from 2D frame processing to full 4D spatio-temporal analysis, allowing the development of applications from simple lane detection to the extremely complex task of autonomous vehicle guidance.

A question that arises at this point concerns the future developments in the field. Towards the improvement of the image-processing stage itself, we can expect morphological operators to be used more widely for both the segmentation of smooth structures and the detection of edges. Such non-linear operators provide algorithmic robustness and increased discrimination ability in complex scenes, such as in traffic applications. Furthermore, we can expect increased use of multiresolution techniques that provide not only detailed localized features in the scale-space or wavelet domains but also abstract overall information, simulating more closely the human perception.

Most sophisticated image processing approaches adopt the underlying assumption that there exists hard evidence in the measurements (image) to provide characteristic features for classification and further mapping to certain world models. Recursive estimation schemes (employing Kalman filters) proceed in a probabilistic mode that first derives the most likely location of road lanes and vehicles and then matches the measured data to these estimated states. Nevertheless, they also treat the measurements as hard evidence that provides indisputable information. A few approaches deal with difficult visual conditions (shades, shadowing, changing lighting), but they also rely heavily on the measurements and the degree of discrimination that can be inferred from them. Since measurements in real life are amenable to dispute, it is only natural to expect the development of systems for traffic applications that are based on so-called soft computing techniques.

Real-world traffic applications must account for several aspects that accept diverse interpretations, such as the weather, light, static or moving objects on the road, noise, manmade interventions, etc. It is difficult for a vision algorithm to account for all kinds of variations. In fact, it is impossible to define thresholds and other needed parameters for feature extraction and parameter estimation under all different situations. Moreover, under diverse conditions, different algorithms or even different modality systems provide quite different results and/or information regarding the same scene. Thus, two major requirements for advanced systems are expected to emerge, namely (i) adaptation to environmental changes and (ii) the ability to combine (fuse) information from different sources.

The first issue of adaptability can be dealt with either by systems that can be trained in diverse conditions or by systems that tolerate uncertainty. The first class comprises systems that can be trained in some conditions and have the ability to interpolate among learnt situations when an unknown condition is presented. Neural networks form powerful structures for capturing knowledge and interpretation skills by training. In traffic applications there are enough data for extensive training, which can be gathered on-line from a fixed sensor (camera) position or while a human operator drives the vehicle. The second class of adaptable systems can deal with more general scenarios, like rainy or extremely hot conditions, where the measurements may not allow for indisputable inference. Under such circumstances, 'illusion' patterns on the scene may easily be misinterpreted as vehicles or roadway structures. In these cases, the measurements convey a large degree of uncertainty. Approaches that deal with uncertainty, possibly based on fuzzy-set theory, have not been studied in transportation applications. Since they provide powerful means of incorporating possibility and linguistic interpretations expressed by human experts, they are expected to dominate in systems that can intelligently adapt to the environment.

The second requirement, for information fusion, is still in primitive stages. Due to the complexity of a dynamically changing road scene, vision sensors may not provide enough information to analyze the scene. Then, it is necessary to combine multisensory data in order to detect road characteristics and obstacles efficiently. Representative systems are developed in the OMNI project for traffic management, which combines video sensors with information from vehicles equipped with GPS/GSM [18], and in the autonomous vehicle of Ref. [118], which combines vision and laser radar systems with DGPS localization data and maps. The fusion of different sources of information, mainly video and range data, has been considered as a means of providing more evidence in reconstructing the 3D world in robotic applications. Nevertheless, most of these approaches use data sources in sequential operation, where the video image guides the focus of attention of the range finder or vice versa [33,108,119,120].

We can distinguish two kinds of multisensory cooperation, active and intelligent sensing (sequential sensor operation) and fusion-oriented sensing (simultaneous sensor operation). The former uses the results of 2D image analysis to guide range sensing. Pioneering work with this idea is presented in Refs. [121,122]. For instance, Ref. [121] shows how to recognize 3D objects by first detecting characteristic points in images and then using them to guide the acquisition of 3D information around these points. The second scheme adopts the strategy of combining both intensity and range information to facilitate the perception task [33,123]. Along these lines, Ref. [123] develops and uses a (combined) range-intensity histogram for the purpose of identifying military vehicles. Ref. [124] uses a trainable neural network structure for fusing information from a video sensor and a range finder, in order to determine the appropriate turn curvature so as to keep the vehicle in the middle of the road. Similar strategies can be used for fusing the results of different image processing techniques on the same set of data. Individual processing algorithms provide specific partial solutions under given constraints. The results of such algorithms are not independent, revealing a redundancy of information that can provide robust and reliable solutions if the results of individual algorithms are suitably combined or fused. An approach for the fusion of algorithms is developed in Ref. [125] using neural networks. Another approach, for optimal fusion through Kalman filtering, is developed in Ref. [85]. In all these approaches, strict assumptions regarding the coordination of the data sources must be used for the formulation of the fusion algorithm. The fully cooperative fusion of information from different sources that observe the same dynamic scene simultaneously, where evidence from one source can support (increase) or reject (reduce) evidence from the other source, is still in primitive stages and is expected to receive increasing attention. Even less established is the fusion of (probabilistic) hard and (fuzzy) soft evidence [126,127]. The Theory of Evidence provides tools for such a synergetic consideration of different data analysis techniques or data acquisition sensors, where evidence regarding the same spatio-temporal scene from two or more sources is jointly combined to provide increased assurance about the classification results. Towards this direction, the theory of evidence has been presented as a valuable tool for information and/or sensor fusion [128].
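As a small illustration of such evidence combination, the sketch below applies Dempster's rule of combination to two basic belief assignments over the frame {vehicle, background}; the mass values are invented for the example.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic belief assignments
    given as dicts from frozensets of hypotheses to masses; conflicting
    mass (empty intersections) is renormalized away."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Invented example: video and range evidence about the same image region
video = {frozenset({'vehicle'}): 0.6, frozenset({'vehicle', 'background'}): 0.4}
radar = {frozenset({'vehicle'}): 0.5, frozenset({'background'}): 0.2,
         frozenset({'vehicle', 'background'}): 0.3}
print(dempster_combine(video, radar))
```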
References

[1] B.D. Stewart, I. Reading, M.S. Thomson, T.D. Binnie, K.W. Dickinson, C.L. Wan, Adaptive lane finding in road traffic image analysis, Proceedings of Seventh International Conference on Road Traffic Monitoring and Control, IEE, London (1994).
[2] M. Xie, L. Trassoudaine, J. Alizon, J. Gallice, Road obstacle detection and tracking by an active and intelligent sensing strategy, Machine Vision and Applications 7 (1994) 165–177.
[3] A. Broggi, S. Berte, Vision-based road detection in automotive systems: a real-time expectation-driven approach, Journal of Artificial Intelligence Research 3 (1995) 325–348.
[4] M. Bertozzi, A. Broggi, GOLD: a parallel real-time stereo vision system for generic obstacle and lane detection, IEEE Transactions on Image Processing 7 (1) (1998).
[5] C.L. Wan, K.W. Dickinson, T.D. Binnie, A cost-effective image sensor system for transport applications utilising a miniature CMOS single chip camera, IFAC Transportation Systems, Tianjin, Proceedings (1994).
[6] M. Bertozzi, A. Broggi, Vision-based vehicle guidance, Computer Vision 30 (7) (1997).
[7] J. Weber, D. Koller, Q.-T. Luong, J. Malik, New results in stereo-based automatic vehicle guidance, Proceedings of IEEE Intelligent Vehicles 95, Detroit (1995) 530–535.
[8] Q.-T. Luong, J. Weber, D. Koller, J. Malik, An integrated stereo-based approach to automatic vehicle guidance, Proceedings of the Fifth ICCV, Boston (1995) 12–20.
[9] A. Soumelidis, G. Kovacs, J. Bokor, P. Gaspar, L. Palkovics, L. Gianone, Automatic detection of the lane departure of vehicles, IFAC Transportation Systems, Chania, Greece (1997).
[10] D.J. LeBlanc, G.E. Johnson, P.J.T. Venhovens, G. Gerber, R. DeSonia, R. Ervin, C.F. Lin, A.G. Ulsoy, T.E. Pilutti, CAPC: a road-departure prevention system, IEEE Control Systems December (1996).
[11] C. Thorpe, M.H. Hebert, T. Kanade, S.A. Shafer, Vision and navigation for the Carnegie–Mellon Navlab, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (3) (1988).
[12] M.A. Turk, D.G. Morgenthaler, K.D. Gremban, M. Marra, VITS—a vision system for autonomous land vehicle navigation, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (3) (1988).
[13] S.D. Buluswar, B.A. Draper, Color machine vision for autonomous vehicles, Engineering Applications of Artificial Intelligence 11 (1998) 245–256.
[14] M. Betke, E. Haritaoglu, L.S. Davis, Real-time multiple vehicle detection and tracking from a moving vehicle, Machine Vision and Applications 12 (2000) 69–83.
[15] J. Zhang, H. Nagel, Texture-based segmentation of road images, Proceedings of IEEE Symposium on Intelligent Vehicles 94, IEEE Press, Piscataway, NJ, 1994.
[16] J. Badenas, M. Bober, F. Pla, Segmenting traffic scenes from grey level and motion information, Pattern Analysis and Applications 4 (2001) 28–38.
[17] K. Kluge, G. Johnson, Statistical characterization of the visual characteristics of painted lane markings, Proceedings of IEEE Intelligent Vehicles 95, Detroit (1995) 488–493.
[18] W. Kasprzak, An iconic classification scheme for video-based traffic sensor tasks, in: W. Skarbek (Ed.), CAIP 2001, Springer, Berlin, 2001, pp. 725–732.
[19] W. Enkelmann, G. Struck, J. Geisler, ROMA—a system for model-based analysis of road markings, Proceedings of IEEE Intelligent Vehicles 95, Detroit (1995) 356–360.
[20] A.Y. Nooralahiyan, H.R. Kirby, Vehicle classification by acoustic signature, Mathematical and Computer Modeling 27 (9–11) (1998).
[21] A. Broggi, Parallel and local feature extraction: a real-time approach to road boundary detection, IEEE Transactions on Image Processing 4 (2) (1995) 217–223.
[22] S. Beucher, M. Bilodeau, Road segmentation and obstacle detection by a fast watershed transform, Proceedings of IEEE Intelligent Vehicles 94, Paris, France, October (1994) 296–301.
[23] X. Yu, S. Beucher, M. Bilodeau, Road tracking, lane segmentation and obstacle recognition by mathematical morphology, Proceedings of IEEE Intelligent Vehicles 92 (1992) 166–170.
[24] C. Kreucher, S. Lakshmanan, LANA: a lane extraction algorithm that uses frequency domain features, IEEE Transactions on Robotics and Automation 15 (2) (1999).
[25] A.L. Yuille, J.M. Coughlan, Fundamental limits of Bayesian inference: order parameters and phase transitions for road tracking, IEEE Pattern Analysis and Machine Intelligence 22 (2) (2000) 160–173.
[26] Y. Wang, D. Shen, E.K. Teoh, Lane detection using spline model, Pattern Recognition Letters 21 (2000) 677–689.
[27] K. Kluge, S. Lakshmanan, A deformable-template approach to lane detection, IEEE Proceedings of Intelligent Vehicles 95 (1995) 54–59.
[28] C.J. Taylor, J. Malik, J. Weber, A real time approach to stereopsis and lane-finding, IFAC Transportation Systems, Chania, Greece (1997).
[29] D. Geman, B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Pattern Analysis and Machine Intelligence 18 (1) (1996) 1–14.
[30] E.D. Dickmanns, Vehicle guidance by computer vision, in: M. Papageorgiou (Ed.), Concise Encyclopedia of Traffic and Transportation Systems.
[31] E.D. Dickmanns, B. Mysliwetz, T. Christians, An integrated spatio-temporal approach to automatic visual guidance of autonomous vehicles, IEEE Transactions on Systems, Man, and Cybernetics 20 (6) (1990).
[32] P.G. Michalopoulos, D.P. Panda, Derivation of advanced traffic parameters through video imaging, IFAC Transportation Systems, Chania, Greece (1997).
[33] D.G. Morgenthaler, S.J. Hennessy, D. DeMenthon, Range-video fusion and comparison of inverse perspective algorithms in static images, IEEE Transactions on Systems, Man and Cybernetics 20 (1990) 1301–1312.
[34] E.D. Dickmanns, B.D. Mysliwetz, Recursive 3D road and relative ego-state recognition, IEEE Pattern Analysis and Machine Intelligence 14 (2) (1992) 199–213.
[35] M. Papageorgiou, Video sensors, in: M. Papageorgiou (Ed.), Concise Encyclopedia of Traffic and Transportation Systems, pp. 610–615.
[36] W. Enkelmann, Obstacle detection by evaluation of optical flow field from image sequences, Proceedings of European Conference on Computer Vision, Antibes, France 427 (1990) 134–138.
[37] B. Ross, A practical stereo vision system, Proceedings of International Conference on Computer Vision and Pattern Recognition, Seattle, WA (1993) 148–153.
[38] M. Bertozzi, A. Broggi, S. Castelluccio, A real-time oriented system for vehicle detection, Journal of System Architecture 43 (1997) 317–325.
[39] F. Thomanek, E.D. Dickmanns, D. Dickmanns, Multiple object recognition and scene interpretation for autonomous road vehicle guidance, Proceedings of IEEE Intelligent Vehicles 94, Paris, France (1994) 231–236.
[40] G.L. Foresti, V. Murino, C. Regazzoni, Vehicle recognition and tracking from road image sequences, IEEE Transactions on Vehicular Technology 48 (1) (1999) 301–317.
[41] G.L. Foresti, V. Murino, C.S. Regazzoni, G. Vernazza, A distributed approach to 3D road scene recognition, IEEE Transactions on Vehicular Technology 43 (2) (1994).
[42] H.-H. Nagel, W. Enkelmann, An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence (1986) 565–593.
[43] T.N. Tan, G.D. Sullivan, K.D. Baker, Model-based location and recognition of road vehicles, International Journal of Computer Vision 27 (1) (1998) 5–25.
[44] D. Koller, K. Daniilidis, H. Nagel, Model-based object tracking in monocular image sequences of road traffic scenes, International Journal of Computer Vision 10 (1993) 257–281.
[45] E.D. Dickmanns, V. Graefe, Dynamic monocular machine vision, Machine Vision and Applications 1 (1988) 223–240.
[46] Y. Park, Shape-resolving local thresholding for object detection, Pattern Recognition Letters 22 (2001) 883–890.
[47] J.M. Blosseville, C. Krafft, F. Lenoir, V. Motyka, S. Beucher, TITAN: new traffic measurements by image processing, IFAC Transportation Systems, Tianjin, Proceedings (1994).
[48] Y. Won, J. Nam, B.-H. Lee, Image pattern recognition in natural environment using morphological feature extraction, in: F.J. Ferri (Ed.), SSPR&SPR 2000, Springer, Berlin, 2001, pp. 806–815.
[49] K. Shimizu, N. Shigehara, Image processing system used cameras for vehicle surveillance, IEE Second International Conference on Road Traffic Monitoring, Conference Publication Number 299, February (1989) 61–65.
[50] M. Fathy, M.Y. Siyal, An image detection technique based on morphological edge detection and background differencing for real-time traffic analysis, Pattern Recognition Letters 16 (1995) 1321–1330.
[51] N. Hoose, Computer vision as a traffic surveillance tool, IFAC Transportation Systems, Tianjin, Proceedings (1994).
[52] X. Li, Z.-Q. Liu, K.-M. Leung, Detection of vehicles from traffic scenes using fuzzy integrals, Pattern Recognition 35 (2002) 967–980.
[53] H. Moon, R. Chellapa, A. Rosenfeld, Performance analysis of a simple vehicle detection algorithm, Image and Vision Computing 20 (2002) 1–13.
[54] A. Kuehnel, Symmetry based recognition of the vehicle rears, Pattern Recognition Letters 12 (1991) 249–258, North Holland, Amsterdam.
[55] M. Fathy, M.Y. Siyal, A window-based image processing technique for quantitative and qualitative analysis of road traffic parameters, IEEE Transactions on Vehicular Technology 47 (4) (1998).
[56] D.C. Hogg, G.D. Sullivan, K.D. Baker, D.H. Mott, Recognition of vehicles in traffic scenes using geometric models, IEE Proceedings of the International Conference on Road Traffic Data Collection, London (1984) 115–119.
[57] P. Klausmann, K. Kroschel, D. Willersinn, Performance prediction of vehicle detection algorithms, Pattern Recognition 32 (1999) 2063–2065.
[58] K.W. Dickinson, C.L. Wan, Road traffic monitoring using the TRIP II system, IEE Second International Conference on Road Traffic Monitoring, Conference Publication Number 299, February (1989) 56–60.
[59] G.D. Sullivan, K.D. Baker, A.D. Worrall, C.I. Attwood, P.M. Remagnino, Model-based vehicle detection and classification using orthographic approximations, Image and Vision Computing 15 (1997) 649–654.
[60] A.D. Houghton, G.S. Hobson, N.L. Seed, R.C. Tozer, Automatic vehicle recognition, IEE Second International Conference on Road Traffic Monitoring, Conference Publication Number 299, February (1989) 71–78.
[61] C.L. Wan, K.W. Dickinson, Road traffic monitoring using image processing—a survey of systems, techniques and applications, IFAC Control Computers, Communications in Transportation, Paris, France (1989).
[62] S. Mantri, D. Bullock, Analysis of feedforward-backpropagation neural networks used in vehicle detection, Transportation Research Part C 3 (3) (1995) 161–174.
[63] N. Hoose, IMPACT: an image analysis tool for motorway analysis and surveillance, Traffic Engineering Control Journal (1992) 140–147.
[64] N. Paragios, R. Deriche, Geodesic active contours and level sets for the detection and tracking of moving objects, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (3) (2000) 266–280.
[65] T. Aach, A. Kaup, Bayesian algorithms for adaptive change detection in image sequences using Markov random fields, Signal Processing: Image Communication 7 (1995) 147–160.
[66] N. Paragios, G. Tziritas, Adaptive detection and localization of moving objects in image sequences, Signal Processing: Image Communication 14 (1999) 277–296.
[67] J.B. Kim, H.S. Park, M.H. Park, H.J. Kim, A real-time region-based motion segmentation using adaptive thresholding and K-means clustering, in: M. Brooks, D. Corbett, M. Stumptner (Eds.), AI 2001, Springer, Berlin, 2001, pp. 213–224.
[68] M. Dubuisson, A. Jain, Contour extraction of moving objects in complex outdoor scenes, International Journal of Computer Vision 14 (1995) 83–105.
[69] N. Hoose, Computer Image Processing in Traffic Engineering, Taunton Research Studies Press, UK, 1991.
[70] J. Badenas, J.M. Sanchiz, F. Pla, Motion-based segmentation and region tracking in image sequences, Pattern Recognition 34 (2001) 661–670.
[71] C. Wohler, J.K. Anlauf, Real-time object recognition on image sequences with the adaptable time delay neural network algorithm—applications for autonomous vehicles, Image and Vision Computing 19 (2001) 593–618.
[72] K.W. Lee, S.W. Ryu, S.J. Lee, K.T. Park, Motion based object tracking with mobile camera, Electronics Letters 34 (3) (1998) 256–258.
[73] B. Coifman, D. Beymer, P. McLauchlan, J. Malik, A real-time computer vision system for vehicle tracking and traffic surveillance, Transportation Research Part C 6 (1998) 271–288.
[74] Y.-K. Jung, Y.-S. Ho, A feature-based vehicle tracking system in congested traffic video sequences, in: H.-Y. Shum, M. Liao, S.F. Chang (Eds.), PCM 2001, Springer, Berlin, 2001, pp. 190–197.