Data Face Detection
Face detection techniques have been researched for years, and much progress has been
reported in the literature. Most face detection methods focus on detecting frontal faces
under good lighting conditions. These methods can be categorized into four types: knowledge-
based, feature invariant, template matching, and appearance-based.
Knowledge-based methods use human-coded rules to model facial features, such as
two symmetric eyes, a nose in the middle and a mouth underneath the nose.
Feature invariant methods try to find facial features which are invariant to pose,
lighting condition or rotation. Skin colors, edges and shapes fall into this category.
Template matching methods calculate the correlation between a test image and pre-
selected facial templates.
Several algorithms are used for face recognition; some of the popular methods are
discussed here. Face recognition by feature matching is one such method. It locates
points in the face image with high information content. The face contour and the hair
need not be considered; instead, the method concentrates on the center of the face area,
since the most stable and informative features are found there, around the eyes, nose,
and mouth. To enforce this, a Gaussian weighting centered on the face is applied.
Paul Viola and Michael Jones presented a fast and robust method for face detection
which, at the time of its release, was about 15 times faster than any existing technique,
achieving 95% accuracy at around 17 fps. This work has three key contributions: the
Haar-like rectangle features computed via an integral image, an AdaBoost-based feature
selection procedure, and the detectors cascade, described in the following subsections.
For their face detection framework, Viola and Jones decided to use simple features
based on pixel intensities rather than the pixel values directly. They motivated this choice
by two main factors:
Features can encode ad-hoc domain knowledge, which would otherwise
be difficult to learn from limited training data.
A feature-based system operates much faster than a pixel-based system.
They defined three kinds of Haar-like rectangle features:
A two-rectangle feature was defined as the difference between the sums of
the pixels within two adjacent regions (vertical or horizontal);
a three-rectangle feature was defined as the difference between the sums within two
outside rectangles and the sum within an inner rectangle between them;
a four-rectangle feature was defined as the difference between diagonal pairs of
rectangles.
Figure 4.6.1: Rectangle features example: (A) and (B) show two-rectangle features,
(C) shows three-rectangle feature, and (D) shows four-rectangle feature.
4.6.2. Integral Image:
By maintaining a cumulative row sum at each location (x, y), the integral image can be
computed in a single pass over the original image. Once it is computed, any rectangle feature
can be calculated using only a few accesses to it (see Figure 4.6.2.2):
i. Two-rectangle features require 6 array references,
ii. Three-rectangle features require 8 array references, and
iii. Four-rectangle features require 9 array references.
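The single-pass construction and the constant-time rectangle sum described above can be sketched in plain Python as follows (the function names `integral_image` and `rect_sum` are illustrative, not taken from the paper):

```python
def integral_image(img):
    """Compute the integral image ii, where ii[y][x] holds the sum of all
    pixels above and to the left of (x, y), inclusive.  Built in a single
    pass over the image by maintaining a cumulative row sum."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels inside the rectangle with top-left corner (x, y) and
    size w x h, using at most four accesses to the integral image."""
    a = ii[y - 1][x - 1] if x > 0 and y > 0 else 0   # above-left corner
    b = ii[y - 1][x + w - 1] if y > 0 else 0         # above
    c = ii[y + h - 1][x - 1] if x > 0 else 0         # left
    d = ii[y + h - 1][x + w - 1]                     # bottom-right corner
    return d - b - c + a
```

A two-rectangle feature is then simply `rect_sum` of one region minus `rect_sum` of the adjacent region; since the two regions share two corners, the six array references quoted above suffice.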
Figure 4.6.2.2: Calculation example. The sum of the pixels within rectangle D can be
computed as 4 + 1 - (2 + 3), where 1-4 are values of the integral image.
The authors defined the base resolution of the detector to be 24x24 pixels. In other words,
every image frame is scanned with 24x24 sub-windows, and features are extracted at
all possible locations and scales within each such sub-window. This results in an exhaustive
set of more than 160,000 rectangle features for a single sub-window.
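The size of this exhaustive set can be verified by enumerating all placements and scales of each base rectangle shape inside a 24x24 window. The sketch below assumes the five base shapes of the extended Haar set (horizontal and vertical two-rectangle, horizontal and vertical three-rectangle, and the four-rectangle feature); the function name is illustrative:

```python
def count_features(W=24, H=24):
    """Count every position and scale of each base rectangle shape (w, h)
    that fits inside a W x H sub-window.  A shape scaled to (sw, sh) can be
    placed at (W - sw + 1) * (H - sh + 1) distinct positions."""
    shapes = [(2, 1), (1, 2),   # two-rectangle, horizontal / vertical
              (3, 1), (1, 3),   # three-rectangle, horizontal / vertical
              (2, 2)]           # four-rectangle
    total = 0
    for w, h in shapes:
        for sw in range(w, W + 1, w):        # scaled widths
            for sh in range(h, H + 1, h):    # scaled heights
                total += (W - sw + 1) * (H - sh + 1)
    return total

print(count_features())  # 162336, i.e. "more than 160,000"
```

The count of 162,336 matches the "more than 160,000" figure quoted above; the exact number depends on which feature shapes are included.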
4.6.3. AdaBoost learning algorithm:
The AdaBoost algorithm was introduced in 1995 by Freund and Schapire. The
complete set of features is quite large: more than 160,000 features for a single 24x24
sub-window. Although a single feature can be computed with only a few simple operations,
evaluating the entire set of features is still extremely expensive and cannot be performed
by a real-time application.
In its original form, AdaBoost is used to improve the classification performance of a
learning algorithm by combining a collection of weak classifiers to form a strong classifier.
The algorithm starts with equal weights for all examples. In each round, the weights are
updated so that the misclassified examples receive more weight.
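One boosting round can be sketched as follows. This is a simplified illustration of the re-weighting scheme, not the authors' implementation; `weak_learner` is a hypothetical callable that trains a 0/1 classifier on the current weights:

```python
import math

def adaboost_round(examples, labels, weights, weak_learner):
    """One AdaBoost round: train a weak classifier on the current example
    weights, then shift weight toward the misclassified examples.
    labels are 0/1; weak_learner(examples, labels, weights) returns a
    function h mapping an example to 0 or 1."""
    h = weak_learner(examples, labels, weights)
    err = sum(w for x, l, w in zip(examples, labels, weights) if h(x) != l)
    beta = err / (1.0 - err)            # < 1 whenever err < 0.5
    # Down-weight correctly classified examples (equivalently, the
    # misclassified ones gain relative weight after renormalization).
    new_w = [w * (beta if h(x) == l else 1.0)
             for x, l, w in zip(examples, labels, weights)]
    z = sum(new_w)
    new_w = [w / z for w in new_w]      # renormalize to a distribution
    alpha = math.log(1.0 / beta)        # voting weight of this classifier
    return h, alpha, new_w
```

After the round, a misclassified example carries a larger share of the total weight, so the next weak classifier is forced to concentrate on it.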
By drawing an analogy between weak classifiers and features, Viola and Jones
decided to use the AdaBoost algorithm for aggressive selection of a small number of good
features which nevertheless have significant variety.
Practically, the weak learning algorithm was restricted to the set of classification
functions, each of which depends on a single feature. A weak classifier h(x, f, p, θ)
was then defined for a sample x (i.e., a 24x24 sub-window) by a feature f, a threshold θ,
and a polarity p indicating the direction of the inequality:
h(x, f, p, θ) = 1 if p·f(x) < p·θ,
              = 0 otherwise. ..................................(2)
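Equation (2) translates directly into code. In this sketch, the feature `f` is represented as a callable that maps a sub-window to its scalar feature value:

```python
def weak_classifier(x, f, p, theta):
    """Weak classifier h(x, f, p, theta) of Eq. (2): returns 1 (face)
    when p * f(x) < p * theta, and 0 otherwise.  f maps a sub-window x
    to a single Haar-like feature value; the polarity p is +1 or -1 and
    flips the direction of the inequality."""
    return 1 if p * f(x) < p * theta else 0
```

With p = +1 the classifier fires when the feature value falls below the threshold; with p = -1, when it rises above.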
The key advantage of AdaBoost over its competitors is the speed of learning. For
each feature, the examples are sorted by their feature values. The optimal threshold for that
feature can then be computed in a single pass over this sorted list.
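The single-pass threshold search can be sketched as follows. Running sums of positive and negative example weight are maintained along the sorted list, and each gap between consecutive values is tried as a candidate cut; the midpoint thresholds and the function name are illustrative choices, not details from the paper:

```python
def best_threshold(values, labels, weights):
    """For one feature, find the (threshold, polarity) minimizing the
    weighted classification error in a single pass over the examples
    sorted by feature value.  labels are 1 (face) / 0 (non-face);
    weights are the current AdaBoost example weights."""
    examples = sorted(zip(values, labels, weights))
    t_pos = sum(w for _, l, w in examples if l == 1)  # total positive weight
    t_neg = sum(w for _, l, w in examples if l == 0)  # total negative weight
    s_pos = s_neg = 0.0            # weight seen so far, below the cut
    best_theta, best_p, best_err = None, 1, float("inf")
    prev = examples[0][0] - 1.0
    for v, l, w in examples:
        theta = (prev + v) / 2.0   # candidate cut just below this value
        err_p1 = s_neg + (t_pos - s_pos)  # p = +1: f < theta => face
        err_m1 = s_pos + (t_neg - s_neg)  # p = -1: f > theta => face
        if min(err_p1, err_m1) < best_err:
            best_err = min(err_p1, err_m1)
            best_theta = theta
            best_p = 1 if err_p1 <= err_m1 else -1
        if l == 1:
            s_pos += w
        else:
            s_neg += w
        prev = v
    return best_theta, best_p, best_err
```

Because each candidate error is computed from the running sums in constant time, the whole search costs one sort plus one linear scan per feature.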
In their paper [2], Viola and Jones show that a strong classifier constructed from 200
features yields reasonable results: at a detection rate of 95%, a false positive rate of 1 in
14,084 was achieved on a testing dataset. These results are promising. However, the authors
realized that for a face detector to be practical in real applications, the false positive rate
must be closer to 1 in 1,000,000. A straightforward technique to improve detection
performance would be to add features to the classifier. This, unfortunately, would increase
computation time and thus render the classifier unsuitable for real-time applications.
4.6.4. Detectors Cascade:
There is a natural trade-off between classifier performance in terms of detection rates
and its complexity, i.e., the amount of time required to compute the classification result.
Viola and Jones [2], however, were looking for a method to speed up processing
without compromising quality. As a result, they came up with the idea of a detectors cascade
(see Figure 4.6.4). Each sub-window is processed by a series of detectors, called a cascade,
in the following way. Classifiers are combined sequentially in the order of their complexity,
from the simplest to the most complex. The processing of a sub-window thus starts with a
simple classifier, which was trained to reject most of the negative (non-face) frames while
keeping almost all positive (face) frames. A sub-window proceeds to the following, more
complex, classifier only if it was classified as positive at the preceding stage. If any one of the
classifiers in the cascade rejects a frame, it is thrown away, and the system proceeds to the next
sub-window. If a sub-window is classified as positive by all the classifiers in the cascade, it is
declared as containing a face.
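The early-reject control flow of the cascade can be sketched in a few lines. Each stage is modeled as a hypothetical callable returning True (possible face) or False (reject); the function name is illustrative:

```python
def cascade_classify(window, stages):
    """Run one sub-window through a cascade of stage classifiers, ordered
    from simplest to most complex.  The first stage that rejects the
    window stops the evaluation; a window is declared a face only if
    every stage accepts it."""
    for stage in stages:
        if not stage(window):
            return False   # rejected: the system moves to the next sub-window
    return True            # passed all stages: declared as containing a face
```

Since the vast majority of sub-windows are rejected by the first, cheapest stages, the expensive later classifiers run on only a tiny fraction of the image, which is what makes the cascade fast.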
References:
[1] S. A. Sirohey, "Human face segmentation and identification", Technical Report CAR-TR-
695, Center for Automation Research, University of Maryland, College Park, MD, 1993.
[2] P. Viola and M. J. Jones, "Robust Real-Time Face Detection", International Journal of
Computer Vision, 57(2), pp. 137–154, 2004.
[3] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object
detection", in IEEE ICIP 2002, Vol. 1, pp. 900–903, 2002.
[4] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces:
A survey", Proc. of the IEEE, Vol. 83, pp. 705–740, 1995.