Detection and Classification of Vehicles
Abstract—This paper presents algorithms for vision-based detection and classification of vehicles in monocular image sequences of traffic scenes recorded by a stationary camera. Processing is done at three levels: raw images, region level, and vehicle level. Vehicles are modeled as rectangular patches with certain dynamic behavior. The proposed method is based on the establishment of correspondences between regions and vehicles as the vehicles move through the image sequence. Experimental results from highway scenes are provided which demonstrate the effectiveness of the method. We also briefly describe an interactive camera calibration tool that we have developed for recovering the camera parameters using features in the image selected by the user.

Index Terms—Camera calibration, vehicle classification, vehicle detection, vehicle tracking.

I. INTRODUCTION

The paper starts with an overview of related work; a description of our approach is then presented, a camera calibration tool developed by our group is described, experimental results are presented, and finally conclusions are drawn.

II. RELATED WORK

Tracking moving vehicles in video streams has been an active area of research in computer vision. In [2], a real-time system for measuring traffic parameters is described. It uses a feature-based method along with occlusion reasoning for tracking vehicles in congested traffic scenes. In order to handle occlusions, instead of tracking entire vehicles, vehicle subfeatures are tracked. This approach, however, is computationally very expensive. In [6], a moving object recognition method is described that uses an adaptive background subtraction technique.
Another approach uses a 3-D polyhedral model to classify vehicles in a traffic sequence. The system uses a generic vehicle model based on the shape of a typical sedan, the underlying assumption being that in typical traffic scenes cars are more common than trucks or other types of vehicles. The University of Reading has done extensive work in three-dimensional tracking of vehicles and classification of the tracked vehicles using 3-D model matching methods. Baker and Sullivan [1] and Sullivan [13] utilized knowledge of the camera calibration, and of the fact that vehicles move on a plane, in their 3-D model-based tracking. Three-dimensional wireframe models of various types of vehicles (e.g., sedans, hatchbacks, wagons) were developed, and projections of these models were then compared to features in the image. In [15], this approach was extended so that the image features act as forces on the model. This reduced the number of iterations and improved performance. They also parameterized models as deformable templates and used principal component analysis to reduce the number of parameters. Sullivan et al. [14] developed a simplified version of the model-based tracking approach using orthographic approximations to attain real-time performance.
III. OVERVIEW

The system we propose consists of six stages.
1) Segmentation: in this stage, the vehicles are separated from the background in the scene.
2) Region Tracking: the result of the segmentation step is a collection of connected regions. This stage tracks regions over a sequence of images using a spatial matching method.
3) Recovery of Vehicle Parameters: to enable accurate classification of the vehicles, the vehicle parameters such as length, width, and height need to be recovered from the 2-D projections of the vehicles. This stage uses information about the camera's location and makes use of the fact that in a traffic scene, all motion is along the ground plane.
4) Vehicle Identification: our system assumes that a vehicle may be made up of multiple regions. This stage groups the tracked regions from the previous stage into vehicles.
5) Vehicle Tracking: due to occlusions, noise, etc., there is not necessarily a one-to-one correspondence between regions and vehicles, i.e., a vehicle may consist of multiple regions and a single region might correspond to multiple vehicles. To enable tracking of vehicles despite these difficulties, our system does tracking at two levels: the region level and the vehicle level.
6) Vehicle Classification: after vehicles have been detected and tracked, they are classified.
The following sections describe each of these stages in more detail.

IV. SEGMENTATION

The first step in detecting vehicles is segmenting the image to separate the vehicles from the background. There are various approaches to this, with varying degrees of effectiveness. To be useful, the segmentation method needs to accurately separate vehicles from the background, be fast enough to operate in real time, be insensitive to lighting and weather conditions, and require a minimal amount of initialization. In [6], a segmentation approach using adaptive background subtraction is described. Kalman filtering is used to predict the background image during the next update interval. The error between the prediction and the actual background image is used to update the Kalman filter state variables. This method has the advantage that it automatically adapts to changes in lighting and weather conditions. However, it needs to be initialized with an image of the background without any vehicles present. In [5], a probabilistic approach to segmentation is described. They use the expectation maximization (EM) method to classify each pixel as moving object, shadow, or background. Another approach to segmentation is time differencing (used in [9]), which consists of subtracting successive frames (or frames a fixed interval apart). This method is also insensitive to lighting conditions and has the further advantage of not requiring initialization with a background image. However, it produces many small regions that can be difficult to separate from noise.

We use a self-adaptive background subtraction method for segmentation. This is similar in principle to the method described in [5]. However, we use a much simpler and more robust method for updating the background. In addition, our method automatically extracts the background from a video sequence, so no manual initialization is required. Our segmentation technique consists of three tasks:
• segmentation;
• background update;
• background extraction.

A. Segmentation

For each frame of the video sequence (referred to as the current image), we take the difference between the current image and the current background, giving the difference image. The difference image is thresholded to give a binary object mask. The object mask is a binary image such that all pixels that correspond to foreground objects have the value 1, and all other pixels are set to 0.
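A minimal sketch of this differencing-and-thresholding step for grayscale images (function and variable names here are illustrative; the threshold is the adaptive one of Section IV-C):

import numpy as np

def object_mask(current_image, current_background, threshold):
    """Binary object mask: 1 where the absolute difference between the
    current image and the current background exceeds the threshold."""
    diff = np.abs(current_image.astype(np.int16) -
                  current_background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)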
B. Adaptive Background Update

The basic principle of our method is to modify the background image that is subtracted from the current image (called the current background) so that it looks similar to the background in the current video frame. We update the background by taking a weighted average of the current background and the current frame of the video sequence. However, the current image also contains foreground objects. Therefore, before we do the update we need to classify the pixels as foreground and background and then use only the background pixels from the current image to modify the current background. Otherwise, the background image would be polluted with the foreground objects.

The binary object mask is used to distinguish the foreground pixels from the background pixels. The object mask is used as a gating function that decides which image to sample for updating the background. At those locations where the mask is 0 (corresponding to background pixels), the current image is sampled. At those locations where the mask is 1 (corresponding to foreground pixels), the current background is sampled. The result of this is what we call the instantaneous background. This operation is shown in diagrammatic form in Fig. 1.

The current background is set to be the weighted average of the instantaneous and the current background:

$CB_{i+1} = \alpha \, IB_i + (1 - \alpha) \, CB_i$  (1)

The weights assigned to the current and instantaneous background affect the update speed. We want the update speed to be fast enough so that changes in illumination are captured quickly, but slow enough so that momentary changes (due to, say, the AGC of the camera being activated) do not persist for an unduly long amount of time. The weight $\alpha$ has been empirically determined to be 0.1. We have found that this gives the best tradeoff in terms of update speed and insensitivity to momentary changes.
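A sketch of this gated update, under our reading of (1) with the weight $\alpha = 0.1$ applied to the instantaneous background (names illustrative):

import numpy as np

ALPHA = 0.1  # empirically determined update weight

def update_background(current_background, current_image, mask, alpha=ALPHA):
    """Gated weighted-average background update per (1)."""
    # Instantaneous background: sample the current image where the mask is 0
    # (background) and the current background where the mask is 1 (foreground).
    instantaneous = np.where(mask == 1, current_background, current_image)
    # CB <- alpha * IB + (1 - alpha) * CB
    blended = alpha * instantaneous + (1.0 - alpha) * current_background
    return blended.astype(current_background.dtype)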
C. Dynamic Threshold Update

After subtracting the current image from the current background, the resultant difference image has to be thresholded to get the binary object mask. Since the background changes dynamically, a static threshold cannot be used to compute the object mask. Moreover, since the object mask itself is used in updating the current background, a poorly set threshold would result in poor segmentation. Therefore, we need a way to update the threshold as the current background changes. The difference image is used to update the threshold. In our images, a major portion of the image consists of the background. Therefore the difference image consists of a large number of pixels having low values and a small number of pixels having high values. We use this observation in deciding the threshold. The histogram of the difference image will have high values for low pixel intensities and low values for the higher pixel intensities. To set the threshold, we look for a dip in the histogram that occurs to the right of the peak. Starting from the pixel value corresponding to the peak of the histogram, we search toward increasing pixel intensities for a location on the histogram that has a value significantly lower than the peak value (we use 10% of the peak value). The corresponding pixel value is used as the new threshold.
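A sketch of this peak-and-dip search, assuming an 8-bit difference image (the 10% factor is the one quoted above; names illustrative):

import numpy as np

def update_threshold(diff_image, dip_fraction=0.10):
    """Pick the first intensity to the right of the histogram peak whose
    count drops below dip_fraction of the peak count."""
    hist = np.bincount(diff_image.ravel(), minlength=256)
    peak = int(np.argmax(hist))              # low difference values dominate
    for value in range(peak + 1, 256):
        if hist[value] < dip_fraction * hist[peak]:
            return value                     # the dip: use as the new threshold
    return 255                               # fallback if no dip is found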
Fig. 2. Background adaptation to changes in lighting conditions. (a) Initial background provided to the algorithm. (b) Image of the scene at dusk. (c) Current background after 4 s. (d) Current background after 6 s. (e) Current background after 8 s.
D. Automatic Background Extraction

In video sequences of highway traffic it might be impossible to acquire an image of the background. A method that can automatically extract the background from a sequence of video images is therefore very useful. Here we assume that the background is stationary and that any object with significant motion is part of the foreground. The method we propose works with video images and gradually builds up the background image over time.

The background and threshold updating described above is done at periodic update intervals. To extract the background, we compute a binary motion mask by subtracting images from two successive update intervals. All pixels that have moved between these update intervals are considered part of the foreground. To compute the motion mask for frame $i$, the binary object masks from update interval $i$ and update interval $i-1$ are used. The motion mask is computed as

$MM_i = OM_i \lor OM_{i-1}$  (2)

This motion mask is now used as the gating function to compute the instantaneous background as described above. Over a sequence of frames, the current background comes to look similar to the background in the current image.
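Interpreting (2) as a pixelwise OR of the two object masks, one extraction step can be sketched as follows (names illustrative):

import numpy as np

def extract_background_step(background, image, prev_object_mask, object_mask,
                            alpha=0.1):
    """One background-extraction update: pixels that are foreground in either
    of two successive update intervals are treated as moving."""
    motion_mask = np.logical_or(object_mask, prev_object_mask)
    instantaneous = np.where(motion_mask, background, image)
    blended = alpha * instantaneous + (1.0 - alpha) * background
    return blended.astype(background.dtype)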
E. Self-Adaptive Background Subtraction Results

Fig. 2(a)–(e) shows some images that demonstrate the effectiveness of our self-adaptive background subtraction method. Image (a) was taken during the day and was given as the initial background to the algorithm. Image (b) shows the same scene at dusk. Images (c), (d), and (e) show how the background adaptation algorithm updates the background so that it closely matches the background of image (b). Fig. 3(a)–(e) demonstrates how the algorithm copes with changes in camera orientation.

V. REGION TRACKING

A vision-based traffic monitoring system needs to be able to track vehicles through the video sequence. Tracking eliminates multiple counts in vehicle counting applications. Moreover, the tracking information can be used to derive other useful information such as vehicle velocities. In applications like vehicle classification, the tracking information can also be used to refine the vehicle type and correct for errors caused by occlusions.

The output of the segmentation step is a binary object mask, on which we perform region extraction. In the region tracking step, we want to associate the regions in frame $i$ with the regions in frame $i-1$. This allows us to compute the velocity of each region as it moves across the image and also helps in the vehicle tracking stage. Certain problems need to be handled for reliable and robust region tracking.
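The region extraction itself can be done with standard connected-component labeling; a sketch using OpenCV (any labeling routine would do; names illustrative):

import cv2

def extract_regions(mask):
    """Label 8-connected foreground components of a binary mask and return
    their bounding boxes, areas, and centroids (label 0 is the background)."""
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(
        mask, connectivity=8)
    regions = []
    for i in range(1, num):
        x, y, w, h, area = stats[i]
        regions.append({"bbox": (int(x), int(y), int(w), int(h)),
                        "area": int(area),
                        "centroid": (float(centroids[i][0]),
                                     float(centroids[i][1]))})
    return regions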
Fig. 3. Background adaptation to changes in camera orientation. (a) Initial background provided to the algorithm. (b) Image of the scene after the camera orientation has changed. (c) Current background after 4 s. (d) Current background after 6 s. (e) Current background after 8 s.
When considering the regions in frame $i-1$ and frame $i$, the following problems might occur.
• A region might disappear. Some of the reasons why this may happen are as follows.
  • The vehicle that corresponded to this region is no longer visible in the image and, hence, its region disappears.
  • Vehicles are shiny metallic objects. The pattern of reflection seen by the camera changes as the vehicles move across the scene. The segmentation process uses thresholding, which is prone to noise. At some point in the scene, the pattern of reflection from a vehicle might fall below the threshold and, hence, those pixels will not be considered as foreground. Therefore the region might disappear even though the vehicle is still visible.
  • A vehicle might become occluded by some part of the background or by another vehicle.
• A new region might appear. Some possible reasons for this include the following.
  • A new vehicle enters the field of view of the camera, so a new region corresponding to this vehicle appears.
  • For the same reason as that mentioned above, as the pattern of reflections from a vehicle changes, its intensity might rise above the threshold used for segmentation, and the region corresponding to this vehicle is detected again.
  • A previously occluded vehicle might no longer be occluded.
• A single region in frame $i-1$ might split into multiple regions in frame $i$ for the following reasons.
  • Two or more vehicles might have been passing close enough to each other that they occlude (or are occluded) and, hence, are detected as one connected region. As these vehicles move apart and are no longer occluded, the region corresponding to them might split up into multiple regions.
  • Due to noise and errors during the thresholding process, a single vehicle that was detected as a single region might be detected as multiple regions as it moves across the image.
• Multiple regions may merge. Some reasons why this may occur include the following.
  • Multiple vehicles (each of which was detected as one or more regions) might occlude each other and during segmentation get detected as a single region.
  • Again, due to errors in thresholding, a vehicle that was detected as multiple regions might later be detected as a single region.
The region tracking method needs to handle these situations robustly and work reliably even in their presence. We form an association graph between the regions from the previous frame and the regions in the current frame, and model the region tracking problem as one of finding the maximal weight graph. The association graph is a bipartite graph where each vertex corresponds to a region. All the vertices in one partition of this graph correspond to regions from the previous frame, and all the vertices in the other partition correspond to regions in the current frame. An edge between vertices $p$ and $c$ indicates that the previous region $P_p$ is associated with the current region $C_c$. A weight is assigned to each edge. The weight of edge $(p, c)$ is calculated as

$w(p, c) = A(P_p \cap C_c)$  (3)

i.e., the weight of edge $(p, c)$ is the area of overlap between region $P_p$ and region $C_c$. The weight of the graph is defined as

$W = \sum_{(p, c) \in E} w(p, c)$  (4)
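Given label images for the two frames, the edge weights of (3) and the graph weight of (4) can be computed directly (a sketch; names illustrative):

import numpy as np

def edge_weight(prev_labels, cur_labels, p, c):
    """w(p, c) = area of overlap between previous region p and current
    region c, as in (3)."""
    return int(np.count_nonzero((prev_labels == p) & (cur_labels == c)))

def graph_weight(edges, prev_labels, cur_labels):
    """Weight of the association graph: the sum of its edge weights, as in (4)."""
    return sum(edge_weight(prev_labels, cur_labels, p, c) for p, c in edges)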
A. Building the Association Graph

The region extraction step is done for each frame, resulting in new regions being detected. These become the current regions, $C$. The current regions from frame $i-1$ become the previous regions in frame $i$. To add the edges in this graph, a score is computed between each previous region $P_p$ and each current region $C_c$. The score is a pair of values $(s_P, s_C)$. It is a measure of how closely a previous region matches a current region:

$s_P = \frac{A(P_p \cap C_c)}{A(P_p)}$  (5)

$s_C = \frac{A(P_p \cap C_c)}{A(C_c)}$  (6)

This makes the score independent of the actual areas of regions $P_p$ and $C_c$.

B. Adding Edges

Each previous region $P_p$ is compared with each current region $C_c$ and the area of intersection between $P_p$ and $C_c$ is computed. The current region $C_c$ that has the maximum value of $s_P$ with $P_p$ is determined, and an edge is added between $P_p$ and $C_c$. Similarly, for each current region $C_c$, the previous region $P_p$ that has the maximum value of $s_C$ with $C_c$ is determined, and an edge is added between $C_c$ and $P_p$.

The rationale for having a two-part score is that it allows us to handle region splits and merges correctly. Moreover, by always selecting the region that has the maximum value of the score, we do not need to set any arbitrary thresholds to determine whether an edge should be added between two regions. This also ensures that the resultant association graph is a maximal weight graph. An example is shown in Fig. 4.

Fig. 4. Detected regions and their resultant association graph. (a) Frame $i-1$. (b) Frame $i$. (c) Previous regions $P$. (d) Current regions $C$. (e) The association graph shows that a previous region has split into two current regions, $C_1$ and $C_2$.
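A sketch of the score pair of (5) and (6) and of the two-way edge-adding rule (our reading of the elided symbols; names illustrative):

import numpy as np

def score(prev_labels, cur_labels, p, c):
    """Score pair of (5)-(6): overlap area normalized by each region's own area."""
    overlap = np.count_nonzero((prev_labels == p) & (cur_labels == c))
    return (overlap / np.count_nonzero(prev_labels == p),
            overlap / np.count_nonzero(cur_labels == c))

def add_edges(prev_ids, cur_ids, overlap):
    """Link each previous region to its maximum-overlap current region and
    vice versa; overlap is a dict {(p, c): overlap area}."""
    edges = set()
    if not prev_ids or not cur_ids:
        return edges
    for p in prev_ids:
        c = max(cur_ids, key=lambda c: overlap.get((p, c), 0))
        if overlap.get((p, c), 0) > 0:
            edges.add((p, c))
    for c in cur_ids:
        p = max(prev_ids, key=lambda p: overlap.get((p, c), 0))
        if overlap.get((p, c), 0) > 0:
            edges.add((p, c))
    return edges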
C. Resolving Conflicts

When the edges are added to the association graph as described above, we might get a graph of the form shown in Fig. 5. In this case, a region can be associated with either of two other regions, or with both of them (and similarly for the second region). To be able to use this graph for tracking we need to choose one assignment from among these. We enforce the following constraint on the association graph: in every connected component of the graph, only one vertex may have degree greater than 1. A graph that meets this constraint is considered a conflict-free graph. A connected component that does not meet this constraint is considered a conflict component. We need to remove edges from a conflict component so that it becomes conflict free. Many different conflict-free components can be generated from a conflict component having $n$ vertices. One possibility is to generate each of the possible conflict-free components and select the one with the maximum weight. This is the method used in [10]. However, it can be computationally quite expensive. We use a different method for resolving conflicts: for each conflict component we add edges in decreasing order of weight, if and only if adding the edge does not violate the constraint mentioned above. If adding an edge would violate the constraint, we simply skip it and select the next one. The resulting graph may be suboptimal (in terms of weight); however, this does not have an unduly large effect on the tracking and is good enough in most cases.
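A sketch of this greedy procedure; the helper encodes the one-hub constraint by checking that every connected component remains a "star" with at most one vertex of degree greater than 1 (names illustrative):

def resolve_conflicts(weighted_edges):
    """Keep edges in decreasing order of weight whenever the constraint is
    preserved; weighted_edges is a list of (u, v, weight) tuples."""
    adj = {}  # vertex -> set of neighbors in the kept graph

    def degree(v):
        return len(adj.get(v, ()))

    def can_add(u, v):
        if degree(u) > 0 and degree(v) > 0:
            return False              # would merge two components: two hubs
        end = u if degree(u) > 0 else v
        if degree(end) == 1:
            (w,) = adj[end]           # 'end' must not be the leaf of a hub
            return degree(w) <= 1
        return True

    kept = []
    for u, v, weight in sorted(weighted_edges, key=lambda e: -e[2]):
        if can_add(u, v):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            kept.append((u, v, weight))
    return kept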
VI. RECOVERY OF VEHICLE PARAMETERS

To be able to detect and classify vehicles, the location, length, width, and velocity of the regions (which are vehicle fragments) need to be recovered from the image. Knowledge of the camera calibration parameters is necessary in estimating these attributes. Accurate calibration can therefore significantly impact the computation of vehicle velocities and classification. Calibration parameters are usually difficult to obtain from the scene as they are rarely measured when the camera is installed. Moreover, since the cameras are installed approximately 20–30 feet above the ground, it is usually difficult to measure certain quantities, such as pan and tilt, that can help in computing the calibration parameters. Therefore, it becomes difficult to calibrate after the camera has been installed. One way to compute the camera parameters is to use known facts about the scene. For example, we know that the road, for the most part, is restricted to a plane. We also know that the lane markings are parallel and that the lengths of markings, as well as the distances between those markings, are precisely specified. Once the camera parameters are computed, any point on the image can be back-projected onto the road. Therefore, we have a way of finding the distance between any two points on the road by knowing their image locations.

Fig. 6. Camera calibration tool.

We have developed a camera calibration tool specifically for the kind of traffic scenes that we are frequently asked to analyze. This interactive tool has a user interface that allows the user to point to some locations on the image (e.g., the endpoints of a lane divider line whose length is known). The system can then compute the calibration parameters automatically. The proposed system is easy to use and intuitive to operate, using obvious landmarks, such as lane markings, and familiar tools, such as a line-drawing tool. The graphical user interface (GUI) allows the user to first open an image of the scene. The user is then able to draw different lines and optionally assign lengths to those lines. The user may first draw lines that represent lane separation (solid line, Fig. 6). They may then draw lines to designate the width of the lanes (dashed line, Fig. 6).
Fig. 7. (a) Previous regions 5 and 10 merge to form current region 14. (b) Final outcome where all these regions merge to create vehicle 4.
The user may also designate known lengths in conjunction with the lane separation marks. An additional feature of the interface is that it allows the user to define traffic lanes in the video, and also the direction of traffic in these lanes. Also, special hot spots can be indicated on the image, such as the location where we want to compute vehicles' speeds. The actual computation of the camera calibration from the information given by the GUI is fully outlined in [12]. This GUI proved to be much more intuitive than the methods we previously used. The only real difficulty arose with respect to accuracy in determining distances in the direction of the road. Some of these inaccuracies arise because the markings on the road themselves are not precise. Another part of the inaccuracy depends on the user's ability to mark endpoints in the image. In general, however, in spite of the inaccuracies discovered, this method of calibration proved to be much quicker than those previously used, more accurate, and more adaptable to generic scenes.
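The back-projection idea can be illustrated with a plane-to-plane homography fixed by four marked road points; the coordinates below are invented for illustration, not taken from our scenes:

import cv2
import numpy as np

# Four image points (pixels) and their known road-plane coordinates (meters),
# e.g., corners of lane markings. The values are hypothetical.
image_pts = np.float32([[412, 600], [868, 604], [502, 420], [778, 422]])
road_pts = np.float32([[0.0, 0.0], [3.6, 0.0], [0.0, 30.0], [3.6, 30.0]])

H = cv2.getPerspectiveTransform(image_pts, road_pts)  # image -> road plane

def road_distance(p_img, q_img):
    """Back-project two image points onto the road plane and return the
    distance between them in road-plane units."""
    pts = np.float32([p_img, q_img]).reshape(-1, 1, 2)
    p_road, q_road = cv2.perspectiveTransform(pts, H).reshape(2, 2)
    return float(np.linalg.norm(p_road - q_road))

Note that the tool itself recovers the full calibration parameters [12]; this homography is only a ground-plane shortcut sufficient for measuring road distances.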
VII. VEHICLE IDENTIFICATION

A vehicle is made up of (possibly multiple) regions. The vehicle identification stage groups regions together to form vehicles. New regions that do not belong to any vehicle are considered orphan regions. A vehicle is modeled as a rectangular patch whose dimensions depend on the dimensions of its constituent regions. Thresholds are set for the minimum and maximum sizes of vehicles based on typical vehicle dimensions. A new vehicle is created when an orphan region of sufficient size is tracked over a sequence of frames (three in our case).
VIII. VEHICLE TRACKING

Our vehicle model is based on the assumption that the scene has a flat ground. A vehicle is modeled as a rectangular patch whose dimensions depend on its location in the image; the dimensions are equal to the projection of the vehicle at the corresponding location in the scene.

A vehicle consists of one or more regions, and a region might be owned by zero or more vehicles. The region tracking stage produces a conflict-free association graph that describes the relations between regions from the previous frame and regions from the current frame. The vehicle tracking stage updates the location, velocity, and dimensions of each vehicle based on this association graph. The location and dimensions of a vehicle are computed as the bounding box of all its constituent blobs. The velocity is computed as the weighted average of the velocities of its constituent blobs, where the weight for a region $r$ owned by vehicle $v$ is calculated as

$w_r = \frac{A(v \cap r)}{\sum_{q \in v} A(v \cap q)}$  (7)

where $A(v \cap r)$ is the area of overlap between vehicle $v$ and region $r$. The vehicle's velocity is used to predict its location in the next frame.

A region can be in one of five possible states. The vehicle tracker performs different actions depending on the state of each region that is owned by a vehicle. The states and corresponding actions performed by the tracker are as follows (a sketch of the velocity update in (7) is given after this list).
1) Update: a previous region $P$ matches exactly one current region $C$. The tracker simply updates the ownership relation so that the vehicle that owned $P$ now owns $C$.
2) Merge: regions $P_1, \ldots, P_n$ merge into a single region $C$. The area of overlap between each vehicle assigned to the $P_k$ and $C$ is computed; if the overlap is above a minimum threshold, $C$ is assigned to that vehicle.
3) Split: a region $P$ splits into regions $C_1, \ldots, C_n$. Again, the area of overlap between each vehicle owning $P$ and each $C_k$ is computed. If it is greater than a minimum value, the region $C_k$ is assigned to that vehicle.
4) Disappear: a region $P$ is not matched by any current region. The region is simply removed from all the vehicles that owned it. If a vehicle loses all its regions, it becomes a phantom vehicle. Sometimes a vehicle may become temporarily occluded and then later reappear; phantoms prevent such a vehicle from being considered a new vehicle. A phantom is kept around for a few frames (three), and if it cannot be resurrected within this time, it is removed.
5) Appear: a region $C$ does not match any previous region. We check it against the phantom vehicles. If a phantom vehicle overlaps the new region(s) with sufficient area, it is resurrected. If the region does not belong to a phantom vehicle and is of sufficient size, a new vehicle is created.
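Under our normalized reading of (7), the vehicle velocity update can be sketched as follows (names illustrative):

import numpy as np

def vehicle_velocity(region_velocities, overlap_areas):
    """Weighted average of constituent-region velocities, with weights
    w_r = A(v ∩ r) / sum_q A(v ∩ q) so that the weights sum to 1."""
    v = np.asarray(region_velocities, dtype=float)  # shape (n_regions, 2)
    a = np.asarray(overlap_areas, dtype=float)      # A(v ∩ r) for each region
    w = a / a.sum()
    return tuple(w @ v)

# Example: two regions of one vehicle moving right at slightly different speeds.
print(vehicle_velocity([(4.0, 0.5), (3.6, 0.3)], [120, 80]))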
Fig. 8. (a) Previous region 1 splits into current regions 3 and 4. (b) The tracker correctly associates both the new regions with the same vehicle.
Fig. 9. Snapshots of different car/truck classification results. Cars are shown surrounded by their bounding box; noncars are marked by drawing the diagonals of the bounding box. Notice how in (g), the two vehicles were grouped and considered a truck because of their large combined size. In (h), the car was misclassified as a truck due to its large size.
We are also currently exploring various techniques to deal with the problems mentioned above.

XI. CONCLUSIONS AND FUTURE WORK

We have presented a model-based vehicle tracking and classification system capable of working robustly under most circumstances. The system is general enough to detect, track, and classify vehicles while requiring only minimal scene-specific knowledge. In addition to the vehicle category, the system provides location and velocity information for each vehicle as long as it is visible. Initial experimental results from highway scenes were presented.

To enable classification into a larger number of categories, we intend to use a nonrigid model-based approach to classify vehicles. Parameterized 3-D models of exemplars of each category will be used. Given the camera calibration, a 2-D projection of the model will be formed from this viewpoint. This projection will be compared with the vehicles in the image to determine the class of the vehicle.

REFERENCES

[1] K. D. Baker and G. D. Sullivan, "Performance assessment of model-based tracking," in Proc. IEEE Workshop Applications of Computer Vision, Palm Springs, CA, 1992, pp. 28–35.
[2] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, "A real-time computer vision system for measuring traffic parameters," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Puerto Rico, June 1997, pp. 496–501.
[3] M. Burden and M. Bell, "Vehicle classification using stereo vision," in Proc. 6th Int. Conf. Image Processing and Its Applications, vol. 2, 1997, pp. 881–887.
[4] A. De La Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol, "Road traffic sign detection and classification," IEEE Trans. Ind. Electron., vol. 44, pp. 848–859, Dec. 1997.
[5] N. Friedman and S. Russell, "Image segmentation in video sequences," in Proc. 13th Conf. Uncertainty in Artificial Intelligence, Providence, RI, 1997.
[6] K. P. Karmann and A. von Brandt, "Moving object recognition using an adaptive background memory," in Proc. Time-Varying Image Processing and Moving Object Recognition, vol. 2, V. Capellini, Ed., 1990.
[7] D. Koller, "Moving object recognition and classification based on recursive shape parameter estimation," in Proc. 12th Israel Conf. Artificial Intelligence, Computer Vision, Dec. 27–28, 1993.
[8] D. Koller, J. Weber, T. Huang, G. Osawara, B. Rao, and S. Russel, "Toward robust automatic traffic scene analysis in real-time," in Proc. 12th Int. Conf. Pattern Recognition, vol. 1, 1994, pp. 126–131.
[9] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," in Proc. IEEE Workshop Applications of Computer Vision, 1998, pp. 8–14.
[10] O. Masoud and N. P. Papanikolopoulos, "Robust pedestrian tracking using a model-based approach," in Proc. IEEE Conf. Intelligent Transportation Systems, Nov. 1997, pp. 338–343.
[11] O. Masoud, N. P. Papanikolopoulos, and E. Kwon, "Vision-based monitoring of weaving sections," in Proc. IEEE Conf. Intelligent Transportation Systems, Oct. 1999, pp. 770–775.
[12] O. Masoud, S. Rogers, and N. P. Papanikolopoulos, "Monitoring weaving sections," ITS Institute, Univ. Minnesota, Minneapolis, Tech. Rep. CTS 01-06, Oct. 2001.
[13] G. D. Sullivan, "Model-based vision for traffic scenes using the ground-plane constraint," Phil. Trans. Roy. Soc. (B), vol. 337, pp. 361–370, 1992.
[14] G. D. Sullivan, K. D. Baker, A. D. Worrall, C. I. Attwood, and P. M. Remagnino, "Model-based vehicle detection and classification using orthographic approximations," Image Vis. Comput., vol. 15, no. 8, pp. 649–654, Aug. 1997.
[15] G. D. Sullivan, A. D. Worrall, and J. M. Ferryman, "Visual object recognition using deformable models of vehicles," in Proc. Workshop on Context-Based Vision, Cambridge, MA, June 1995, pp. 75–86.

Surendra Gupte was born in Bombay, India, in 1974. He received the Bachelor of Engineering degree from Poona University, Poona, India, in 1996 and the M.S. degree in computer science from the University of Minnesota, Minneapolis, in 2001. He is currently working at Sun Microsystems, Palo Alto, CA. His research interests include computer vision, computer graphics, and artificial intelligence.

Osama Masoud was born in Riyadh, Saudi Arabia, in 1971. He received the B.S. and M.S. degrees in computer science from King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, in 1992 and 1994, respectively, and the Ph.D. degree in computer science from the University of Minnesota, Minneapolis, in 2000. He is currently a Postdoctoral Associate in the Department of Computer Science and Engineering at the University of Minnesota. His research interests include computer vision, robotics, transportation applications, and computer graphics. Dr. Masoud is the recipient of a Research Contribution Award from the University of Minnesota, the Rosemount Instrumentation Award from Rosemount Inc., Chanhassen, MN, and the Matt Huber Award for Excellence in Transportation Research.

Robert F. K. Martin was born in Appleton, WI, in 1972. He received the B.E.E. degree from the University of Minnesota-Twin Cities, Minneapolis, in 1995. He is currently working toward the Ph.D. degree at the same university. From 1995 to 2001, he was a Principal Engineer with Lockheed Martin, Bethesda, MD. His research interests include computer vision, machine learning, and cognitive psychology.

Nikolaos P. Papanikolopoulos (S'88–M'93) was born in Piraeus, Greece, in 1964. He received the Diploma degree in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 1987, and the M.S.E.E. degree in electrical engineering and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University (CMU), Pittsburgh, PA, in 1988 and 1992, respectively. Currently, he is a Professor in the Department of Computer Science and Engineering at the University of Minnesota, Minneapolis, MN. He has authored or coauthored more than 150 journal and conference papers, including 35 refereed journal papers. His research interests include robotics, sensors for transportation applications, control, and computer vision. Dr. Papanikolopoulos was the recipient of a Kritski Fellowship in 1986 and 1987. He was a finalist for the Anton Philips Award for Best Student Paper at the 1991 IEEE Robotics and Automation Conference. He was a McKnight Land-Grant Professor at the University of Minnesota from 1995 to 1997 and has received the National Science Foundation's (NSF's) Research Initiation and Early Career Development Awards. He has also received grants from the Defense Advanced Research Projects Agency (DARPA), Sandia National Laboratories, NSF, the United States Department of Transportation (USDOT), the Minnesota Department of Transportation (Mn/DOT), Honeywell, and 3M.