Automatic Detection of Cars in Real Roads using Haar-like Features

M. Oliveira, V. Santos

Department of Mechanical Engineering, University of Aveiro, 3810 Aveiro, Portugal.


{mriem, vitor}@ua.pt

Abstract: This paper describes a computer vision based system designed for the detection of cars in real world environments. The system uses the Haar-like features method first introduced by Viola and Jones, which is known for its fast processing and good detection rates. The process requires representative data sets for training and validation, including positive (presence of objects to detect) and negative (absence of objects to detect) image samples. Therefore, several example images of cars were hand labeled for training and performance calculation purposes. Preliminary results show that the method can be very effective at detecting cars at fast rates and that it shows generalization capabilities. Despite some occasional false detections, because this method is quite fast, it can act as a preliminary filter of promising regions of the image, where more effective yet more time demanding tests can later be employed.

Keywords: Haar Features, Computer Vision, Automatic Detection of Cars, Transportation systems

1 INTRODUCTION

Automatic navigation on real roads is an old aspiration of drivers, and of the automotive industry in general, because of the importance it may one day have for road safety. Every year, all over the world, many casualties occur in road accidents, which are often caused by driver distraction or lack of responsiveness in demanding driving conditions (traffic, weather, the driver's focus on the driving task, etc.).

The future of automatic navigation on roads will necessarily require advanced and robust perception of the road and its traffic surroundings, which can be very complex due to the huge variety of subjects and conditions (roads, vehicles, illumination, weather, etc.). Roads and vehicles are among the most relevant subjects in that framework; therefore, detecting them is a must when starting to develop systems for automatic navigation on the road.

While the authors have been pursuing autonomous navigation on road-like tracks in a parallel research activity [1][2], this paper focuses specifically on one method for automatic car detection. The technique uses Haar-like features, and cascade classifiers are "trained" to match cars on real roads. A paper on the detection of a single object using Haar features has already been published by the authors [3]. This paper briefly introduces the Haar-like features concept, then describes the cascades trained specifically for cars and, before the conclusion, presents extensive results on untrained road images under varied circumstances.

2 HAAR-LIKE FEATURES

Haar-like features were proposed by Viola and Jones [4] as an alternative method for face detection. The general idea was to describe an object as a cascade of simple feature classifiers organized into several stages. This is a very fast method, which performs face detection as effectively as other methods. As stated in [4], on the CMU+MIT reference test set, the method performed 15 times faster than the Baluja-Kanade detector and about 600 times faster than the Schneiderman-Kanade detector.

The classification of images is based on the value of simple features. Features are used instead of raw pixel values because they can encode ad-hoc domain knowledge and also, in this particular case, because they are much faster to process.

Figure 1. Basic set of Haar features used by [4] (left), and extended set applied by [5] (right). Taken from [5].

Later on, Lienhart and Maydt proposed to extend the pool of basic features by also using 45º rotated features, thus “significantly enhancing the expressional power of the
learning system and consequently improving the performance of the object detection system” [5]. The features proposed by Viola and Jones (the basic set) and later by Lienhart and Maydt (the extended set) are shown in Figure 1. It is important to emphasize that the features of Figure 1 are mere prototypes. They are scaled independently in the horizontal and vertical directions in order to get an overcomplete set of features (for the 24x24 window proposed in [4], the number of possible features is around 180000). The result of applying a feature to a particular image region is given by the sum of the pixels that lie within the black rectangles of the feature minus the sum of those that lie within the white rectangles. The rectangles are defined by their top left coordinates x, y, their width w and height h. The sum of the pixels that lie within the rectangle r_i is represented by RecSum(r_i).

$$\mathrm{feature}_1 = \sum_{i=1}^{N} W_i \cdot \mathrm{RecSum}(r_i) = \sum_{i=1}^{N} W_i \cdot \mathrm{RecSum}(x, y, w, h) \tag{1}$$

The values of N, W_i and r_i are arbitrarily chosen. In the case of [5], N = 2, black rectangles (r_0) have a negative weight W_0 and white rectangles (r_1) have a positive weight W_1. Furthermore, the relationship between the weights is set by the areas occupied by the black and white rectangles:

$$-W_0 \cdot \mathrm{Area}(r_0) = W_1 \cdot \mathrm{Area}(r_1) \tag{2}$$

Assuming W_0 = −1, one can obtain:

$$W_1 = \frac{\mathrm{Area}(r_0)}{\mathrm{Area}(r_1)} \tag{3}$$

Consequently, for feature (2a) of Figure 1, for example, with a height h = 2 and a width w = 6, the outcome of applying the feature to a rectangular region positioned at x, y would be:

$$\mathrm{feature}_{2a} = -1 \cdot \mathrm{RecSum}(x, y, 6, 2) + \frac{6 \times 2}{2 \times 2} \cdot \mathrm{RecSum}(x + 2, y, 2, 2) \tag{4}$$

In order to compute the value of each feature very rapidly, an intermediate image representation is calculated. This representation is called the integral image or Summed Area Table (SAT). The value of the integral image at coordinates (x, y) is given by the sum of all the pixels in the image that are above and to the left of (x, y):

$$\mathrm{SAT}(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y') \tag{5}$$

where I(x', y') is the value of the image pixel at coordinates (x', y'). The value of any RecSum(x, y, w, h) can be obtained with just four lookups in the SAT:

$$\mathrm{RecSum}(x, y, w, h) = \mathrm{SAT}(x + w, y + h) - \mathrm{SAT}(x + w, y) - \mathrm{SAT}(x, y + h) + \mathrm{SAT}(x, y) \tag{6}$$

This procedure is shown in Figure 2.

Figure 2. Fast RecSum(r_i) calculation.

Viola and Jones [4] set up a framework to combine several features into a cascade, i.e. a sequence of tests on the image or on particular regions of interest, organized into several stages, each based on the results of one or more different Haar features. For an object to be recognized, it must pass through all of the stages of the cascade. The cascade is built by supplying a set of positive and negative examples to the training algorithm. The algorithm used is called AdaBoost, known for its high performance in what concerns generalization speed [4]. At each stage of the cascade, the machine learning algorithm selects the feature or combination of features that best separates negative from positive examples, by tuning the threshold of the classification function. There is a trade-off between the number of stages in a cascade and of features in each stage, and the amount of time it takes to process the cascade. Viola and Jones define, for each stage, a target for the minimum reduction in false positives and a maximum decrease in detection. These rates are obtained by using a validation set made up of positive and negative examples. In order to improve the time performance of the algorithm, the same authors also introduced the notion of an attentional cascade. The idea consists of using the first stages of the cascade to effectively discard most of the regions of the image that contain no objects. This is done by adjusting the classifier's threshold so that the false negative rate is close to zero. By discarding many candidate regions early in the cascade, Viola and Jones significantly improve the method's performance. In fact, it makes a lot of sense for the detection system to quickly discard obvious negative regions of an image, using the time saved to better test much more promising regions by submitting them to higher level stages of the cascade that yield more complex features.
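
The early-rejection behaviour of the attentional cascade can be illustrated with a minimal Python sketch. It is not the implementation of Viola and Jones [4] or of Opencv [7]: the `Stage` class, the `run_cascade` function, the two toy features and all thresholds are hypothetical, and each stage is reduced to a weighted vote of thresholded feature responses compared against a stage threshold. A window is represented simply as a list of pixel values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a feature maps a window (list of pixel values) to a response.
Feature = Callable[[Sequence[float]], float]


@dataclass
class Stage:
    """One cascade stage: weighted weak classifiers plus a stage threshold."""
    weak: List[Tuple[Feature, float, float]]  # (feature, feature threshold, vote weight)
    threshold: float

    def passes(self, window: Sequence[float]) -> bool:
        # Each weak classifier votes with its weight when its response
        # reaches its own threshold; the stage passes if the votes add up.
        score = sum(alpha for f, theta, alpha in self.weak if f(window) >= theta)
        return score >= self.threshold


def run_cascade(stages: List[Stage], window: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, stages evaluated); reject at the first failing stage."""
    for k, stage in enumerate(stages, start=1):
        if not stage.passes(window):
            return False, k
    return True, len(stages)


def mean(window: Sequence[float]) -> float:
    return sum(window) / len(window)


def contrast(window: Sequence[float]) -> float:
    return max(window) - min(window)


# Toy cascade (assumed values): a cheap first stage, then a stricter one.
stages = [
    Stage(weak=[(contrast, 10.0, 1.0)], threshold=1.0),
    Stage(weak=[(contrast, 50.0, 0.6), (mean, 100.0, 0.6)], threshold=1.0),
]

flat = [100.0] * 16            # no contrast: discarded by the first stage
textured = [60.0, 180.0] * 8   # high contrast and mean: passes both stages

print(run_cascade(stages, flat))      # (False, 1)
print(run_cascade(stages, textured))  # (True, 2)
```

In this sketch the flat window costs a single cheap stage, while only the promising window reaches the second, more demanding stage, which is the time saving the attentional cascade aims for.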
It has already been said that Haar-like features were
especially applied to perform face detection. However, the
framework is all-purpose. Some other approaches have
successfully used it for pedestrian detection [6].
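
As a recap of equations (1) to (6), the following Python sketch builds a summed area table with NumPy and evaluates the example of equation (4). It is an illustration rather than the authors' or Opencv's code: the function names `summed_area_table`, `rec_sum` and `feature_2a` are hypothetical. The table is also padded with a leading row and column of zeros, so that `sat[y, x]` holds the sum of the pixels strictly above and to the left of (x, y); this is an assumption of the sketch that makes the four lookups of equation (6) exact at the image border.

```python
import numpy as np


def summed_area_table(image):
    """Zero-padded SAT (assumption of this sketch): sat[y, x] = sum of image[:y, :x]."""
    img = np.asarray(image, dtype=np.int64)
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return sat


def rec_sum(sat, x, y, w, h):
    """Sum of the w x h rectangle with top left pixel (x, y), using four lookups."""
    return int(sat[y + h, x + w] - sat[y, x + w] - sat[y + h, x] + sat[y, x])


def feature_2a(sat, x, y):
    """Equation (4): 6x2 rectangle weighted -1, central 2x2 weighted (6*2)/(2*2)."""
    w1 = (6 * 2) / (2 * 2)
    return -1 * rec_sum(sat, x, y, 6, 2) + w1 * rec_sum(sat, x + 2, y, 2, 2)


# A uniform region gives a zero response, because the weights of
# equations (2) and (3) compensate for the different rectangle areas.
uniform = np.full((4, 8), 7)
print(feature_2a(summed_area_table(uniform), 1, 1))  # 0.0

# A bright vertical bar in the two central columns gives a strong response.
bar = np.zeros((2, 6), dtype=np.int64)
bar[:, 2:4] = 10
print(feature_2a(summed_area_table(bar), 0, 0))  # 80.0
```

Once the table is built, every rectangle sum costs the same four lookups regardless of its size, which is what makes scaling the feature prototypes inexpensive.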

3 A GENERALIZED CASCADE FOR CAR DETECTION


The next attempt was to train a generalized cascade for the detection of car rears, i.e. to train a Haar cascade that would detect not a particular car model, but the rear of any car. Haar features are first trained to obtain a representation to be used later for real time object detection. For this purpose several image collections were acquired. They are described in the following sections.

3.1 Training Datasets Description

For training purposes, two image datasets were obtained from the internet and a third was made by the authors. This section describes each set in detail, indicating the number of images per set, their properties and the locations where they were taken. Table 1 sums up the training sets information. Training datasets will, henceforth, be named TDS followed by their respective number.

Table 1. Training datasets description.

Name    Nº Img   Resolution   Location     Authors
TDS 1   1556     variable     California   unknown
TDS 2   126      896x592      California   Weber
TDS 3   1004     752x512      Portugal     Oliveira, Santos

The California Institute of Technology dataset is composed of 1156 images in png format, though many are very similar (Figure 3). Image resolution is variable. This dataset is used for training and will henceforth be named training dataset 1 (TDS1).

Figure 3. Samples from the California Institute of Technology dataset.

Markus Weber's dataset is not as broad, with only 126 images. The resolution is 896x592 pixels, the format is jpg and the images were taken in the California Institute of Technology's parking lots. Some examples are shown in Figure 4. This will be named training dataset 2 (TDS2).

Figure 4. Car dataset taken by Markus Weber, California Institute of Technology.

A third dataset was made by the authors during a car trip in Portugal (from Algarve to Aveiro), in which nearly 2 hours of footage were captured. Resolution was 752 x 512 pixels. Over 1000 images were extracted from the film. Positive examples were separated and cars were also hand labeled (Figure 5). No rescaling was performed. This will be referred to as training dataset 3 (TDS3).

Figure 5. Authors' own car dataset from Portuguese roads.

Some of the images were taken during adverse weather conditions, such as rain. Some examples are shown in Figure 6. These images are also included in TDS3.

Figure 6. Authors' own car dataset with poor weather conditions.

3.2 Performance Datasets Description

For the purpose of testing, three separate datasets are used. The first dataset was built by Brad Philip and Paul Updike (Figure 7).

Figure 7. Car dataset taken by Brad Philip and Paul Updike, California Institute of Technology.

It was taken on the freeways of southern California. It is composed of 530 images in jpeg format. Resolution is constant at 320x240 pixels. Images are quite similar to TDS1 but are not included in it. This test dataset will be employed to measure the performance of the cascades and will be
mentioned as performance dataset 1 (PDS1).

Performance dataset 2 (PDS2) is taken from the footage that provided images for TDS3. The images are not the same, although they are similar. PDS2 consists of 105 images, with a resolution of 752x512 pixels, saved in png format. No images with demanding weather conditions, city environments or gas stations were included. The idea was to use a simplified version of the footage. Finally, performance dataset 3 (PDS3) is an extension of PDS2 obtained by including all kinds of images: poor weather, city, gas stations, bridges, etc. (Figure 8).

Figure 8. Complex images in PDS3.

PDS3 is a much harder set. It consists of 232 images with the same resolution and format as the ones of PDS2.

Table 2. Performance datasets description.

Name    Nº Img   Resolution   Location     Authors
PDS 1   530      320x240      California   Philip, Updike
PDS 2   105      752x512      Portugal     Oliveira, Santos
PDS 3   232      752x512      Portugal     Oliveira, Santos

3.3 Cascades Description

Having ensured a wide variety of examples, hand labeling was performed over all images in both the training and the performance sets. A semi-automatic hand labeling application was developed to ease the process by enabling fast mouse selection. It also generates a text file where the ROI or ROIs (i.e. the regions where the car or cars can be found) are defined for every image. The Intel Open Source Computer Vision Library (Opencv) [7] provides a tool that creates samples by clipping the defined ROIs from TDS images, converting them to grayscale, rescaling them to the window size, and inserting them into a random background image. The background image pool, or negative set, has not yet been described. A negative set consists of a set of images where no objects (cars) exist. Negative sets have not been mentioned so far because they are paired with their respective TDS, i.e., every TDS also has a set of negative examples, usually road images where no cars are present.

Table 3. Cascades Description.

Name   Win Size   Train. Set(s)   T. Samples (pos/neg)   Nº Stages   Features Set
C1     30x20      1+2             unknown                20          BASIC
C2     60x40      1+2             1282 / 754             20          BASIC
C3     30x20      3               unknown                20          BASIC
C4     30x20      3               unknown                20          ALL
C5     60x40      3               unknown                20          BASIC
C6     30x20      1+2+3           1556 / 915             20          BASIC
C7     30x20      1+2+3           1556 / 915             20          ALL
C8     20x12      1+2+3           1556 / 915             30          ALL

Table 3 summarizes the setup used for several cascades trained using different combinations of TDS, number of stages and features pool (BASIC for the Viola and Jones features collection and ALL for the Lienhart and Maydt extended set), as well as the number of positive and negative samples generated from the dataset (T. Samples). Several window sizes were also attempted. Cascades will henceforth be named C followed by their respective number.

4 PERFORMANCE TESTS AND RESULTS

Opencv [7] provides a tool for cascade performance testing. The tool applies the cascade to all test images and compares the algorithm's outcome to the report generated by hand labeling. Hit (HR) and false detection (or false alarm) (FDR) rates are generated based on this comparison. In order to match a given detection to the one described in the report, some tolerances are assumed. The tolerances relate to the disparities in position and size between the current detection and the one manually generated for comparison. For every detection made, a search in the corresponding PDS is executed to see if the detection is true or false. In order to allow for easy performance comparison, the tolerances employed are the default values of the mentioned tool. PDS1 was tested with several cascades and several values of the scaling factor, sf, which is a Haar detection parameter that indicates how much the reference window should be scaled up. HRs are quite good (some above 95%) though FDRs are quite high (Table 4).

Table 4. Performance results for PDS1.

Name   sf     Hits   Missed   False Detect.   HR      FDR
C1     1.05   508    18       400             0.966   0.760
C1     1.9    370    156      263             0.703   0.500
C2     1.05   501    25       444             0.952   0.844
C2     1.5    501    25       193             0.952   0.367
C2     2.9    311    215      500             0.591   0.951
C3     1.05   440    86       4840            0.837   9.202
C3     1.9    436    90       1529            0.829   2.907
C5     1.05   449    77       2676            0.854   5.087
C5     1.9    495    31       953             0.941   1.812
C5     2.9    397    129      874             0.755   1.662
C6     1.05   442    84       5799            0.840   11.025
C7     1.05   429    97       3385            0.816   6.435
C7     1.9    396    130      925             0.753   1.759
C7     2.9    193    333      868             0.367   1.650

The values in Table 4 are consistent with HR = Hits / (Hits + Missed) and FDR = False Detect. / (Hits + Missed), which is why the FDR can exceed 1. For C1 with sf = 1.05, for example, 508 / (508 + 18) ≈ 0.966 and 400 / (508 + 18) ≈ 0.760.

There is a trade-off between HR and FDR. Detecting a given object with an HR of 100% would obviously raise the FDR. The results shown here present the HR and FDRs of the complete cascades, i.e. the cascades with all the stages included. The tables show the results for the maximum achieved HR, and the FDR associated with them. It is important to bear in mind that, though the FDRs
presented in Table 4, Table 5 and Table 6 may seem quite high, the charts in Figure 9, Figure 11 and Figure 13 provide a much better view of the relationship between HR and FDR. A close analysis of those figures clearly shows that a small loss in the HR would imply a large reduction of the FDR. Also, some techniques that may considerably reduce the FDRs are discussed ahead.

The best performing cascades are C1 (sf = 1.05) and C2 (sf = 1.5). The ROC curves for both are presented in Figure 9.

[Chart: Performance comparison between C1 1.05 and C2 1.5 tested on PDS1 (Hit Rate versus False Alarm Rate).]

Figure 9. ROC of the best performing cascades on PDS1.

Figure 9 shows that cascade C2 (sf = 1.5) performs better than C1 (sf = 1.05). Cascade C2 (sf = 1.5) can achieve the same HR as C1 (sf = 1.05) at a lower cost, i.e., a lower FDR. Analyzing Figure 9, one could assume the optimum point to be HR ≈ 0.92 and FDR ≈ 0.18. This would imply that 92% of all cars were detected, yielding only 18 false detections for every 100 truthful ones. Some examples of C2 (sf = 1.5) detections can be seen in Figure 10.

Figure 10. Examples of detections made by C2 (sf = 1.5) on PDS1.

Regarding PDS2, fewer tests were executed. Table 5 clearly shows a much higher average FDR.

Table 5. Performance results for PDS2.

Name   sf     Hits   Missed   False Detect.   HR      FDR
C4     1.05   116    28       2150            0.806   14.931
C5     1.05   79     65       1213            0.549   8.424
C6     1.05   84     60       1081            0.583   7.507

While C4 (sf = 1.05) yields the best HR, it also has an FDR of 14, i.e. for every detection that should be made, 14 false detections occur. This number may appear high if the cascade is used for actual detection but may lose relevance if the cascade is to be used as a simple attention mechanism or if further validation tests are to be implemented.

[Chart: Performance comparison between C4 1.05, C5 1.05 and C6 1.05 tested on PDS2 (Hit Rate versus False Alarm Rate).]

Figure 11. ROC of the cascades that best performed on PDS2.

In Figure 11, the optimum point of cascade C4 (sf = 1.05) presents values of HR ≈ 0.78 and FDR ≈ 0.75. Tests with PDS2 were not entirely satisfactory in what concerns FDRs. However, this is a very difficult set and some additional procedures could have been implemented to lower the FDR and also, in some cases, improve the HR.

Figure 12. PDS2's apparent problems.

First of all, the images from PDS3 could be clipped without loss of reliable extrapolation of the algorithm's performance (Figure 12). The upper and the lower parts of these images contain no information on the road (sky/rear mirror and car interior panel). This clipping operation would considerably lower the FDR, since many of the false detections are in these areas of the images. Also, many of the detections are very close to each other. An algorithm for merging overlapping detections (or a fine tuning of the performance calculation tolerances mentioned at the beginning of this section) could be easily implemented, thus decreasing the FDR even more. In the case of Figure 12, one would go from a situation with 15 false detections to none if these procedures were implemented, which would dramatically lower the FDR. Taking the previous considerations into account, it seemed interesting to test
some cascades on PDS3, even knowing that it is even more demanding than PDS2. The results are outlined in Table 6.

Table 6. Performance results for PDS3.

Name   sf     Hits   Missed   False Detect.   HR      FDR
C1     1.5    78     249      156             0.239   0.477
C2     1.5    89     238      612             0.272   1.872
C3     1.5    269    58       1510            0.823   4.618
C4     1.05   256    71       4837            0.783   14.792
C4     1.5    204    123      2028            0.624   6.202
C5     1.5    158    169      1252            0.483   3.829
C5     1.05   79     65       1213            0.549   8.424
C6     1.5    158    169      1457            0.483   4.456
C7     1.5    115    212      1649            0.352   5.043
C8     1.05   295    32       4119            0.902   12.596
C8     1.9    30     297      43              0.092   0.131
C8     2.9    189    138      837             0.578   2.560

In this particularly difficult dataset, most of the cascades present a low HR. However, cascades C3 (sf = 1.5), C4 (sf = 1.05) and particularly C8 (sf = 1.05) have acceptable HRs. Of course, the FDRs are considerable, but the authors are convinced that these rates can be substantially reduced by means of the already mentioned clipping and merging techniques.

[Chart: Performance of C4 1.05 tested on PDS3 (Hit Rate versus False Alarm Rate).]

Figure 13. ROC curve of C4 (sf = 1.05) tested on PDS3.

Performance data was extracted neither from C3 (sf = 1.5) nor from C8 (sf = 1.05), and so Figure 13 presents only the results of C4 (sf = 1.05). The optimum point of detection performance for cascade C4 (sf = 1.05) would, nonetheless, yield an acceptable HR ≈ 0.72 and FDR ≈ 1. Bearing in mind that FDRs could be overstated and that PDS3 is a set of high complexity, including images in the rain, city and other tricky obstacles, the HR of C8 (sf = 1.05) is quite acceptable (Figure 14).

Figure 14. Some detections of PDS3.

5 CONCLUSIONS AND FUTURE WORK

This paper presented a method based on Haar-like features designed for the detection of cars on real roads. Three TDS were used for the training of the cascades. Also, three different PDS were employed for performance testing. The best achieved results show [HR ≈ 0.92; FDR ≈ 0.18], [HR ≈ 0.78; FDR ≈ 0.75] and [HR ≈ 0.72; FDR ≈ 1] respectively for PDS 1, 2 and 3.

Some methods for decreasing the FDRs were suggested and will be further explored in the future. Most of the cascades were tested against all PDSs, which may provide relevant information regarding the influence on cascade performance of variables such as window size, training sample size and variability, usage of rotated Haar features and others.

In the case of PDS1, C2 (sf = 1.5) performed better than C1 (sf = 1.05), which seems to corroborate the idea that a larger detection window may best describe an object (review Table 3). Regarding PDS2, C4 (sf = 1.05) is more efficient than both C5 (sf = 1.05) and C6 (sf = 1.05). The increase in performance may be due to the usage of both simple and rotated Haar features. The processing of the cascades is quite fast, which enables the future implementation of this method in a real time system.

6 REFERENCES

[1] R. Cancela, M. Neta, M. Oliveira, V. Santos, 2006. ATLAS III: Um Robô com Visão Orientado para Provas em Condução Autónoma, Robótica, nº 62, 2006, p. 8 (ISSN: 0874-9019).
[2] M. Oliveira, V. Santos. A Vision-based Solution for the Navigation of a Mobile Robot in a Road-like Environment, Robótica, nº 69, 2007, p. 8 (ISSN: 0874-9019).
[3] M. Oliveira, V. Santos. Combining View-based Object Recognition with Template Matching for the Identification and Tracking of Fully Dynamic Targets, 7th Conference on Mobile Robots and Competitions, Festival Nacional de Robótica 2007. Paderne, Algarve. 27/04/2007.
[4] P. Viola, M. Jones, 2001. Rapid Object Detection using a Boosted Cascade of Simple Features, Conference on Computer Vision and Pattern Recognition CVPR, Hawaii, December 9-14, 2001.
[5] R. Lienhart and J. Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002, Vol. 1, pp. 900-903, Sep. 2002.
[6] G. Monteiro, P. Peixoto, U. Nunes, 2006. Vision-based Pedestrian Detection using Haar-like Features. Encontro Científico, Festival Nacional de Robótica 2006.
[7] Opencv, Intel Open Source Computer Vision, found at http://www.intel.com/technology/computing/Opencv/ on January 2007.
