Face Detection and Recognition
A
MINOR PROJECT REPORT
Submitted by
DHRUV GOEL-44114802717
AKHILESH CHAUHAN-44514802717
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Extension: This face detection project has a vast number of possible applications. It can be extended so that the various parts of the face, appearing in different orientations and shapes, can also be detected.
Table of Contents
1 INTRODUCTION
1.1 Objectives
1.2 Parameters Used
1.3 Face Recognition
1.4 Face Detection
2 LITERATURE SURVEY
3 MATLAB
3.1 Introduction
3.2 MATLAB's Power of Computational Mathematics
3.3 Features of MATLAB
3.4 Uses of MATLAB
3.5 Understanding the MATLAB Environment
3.6 Commands
3.7 M-Files
4 FACE DETECTION
5 FACE RECOGNITION
6 RESULTS
6.1 Face Detection in Images
6.2 Real-Time Face Detection
6.3 Face Recognition
7 CONCLUSION AND FUTURE SCOPE
8 REFERENCES
9 APPENDIX
CHAPTER-1
INTRODUCTION
Face recognition is the task of identifying an already detected object as a known or unknown face. The problem of face recognition is often confused with the problem of face detection. Face recognition, on the other hand, is to decide whether a detected "face" is someone known or unknown, using a database of faces to validate the input face.
1.1 OBJECTIVE
This project serves the following objectives:
3. Storing unknown images in an image database using real-time face detection.
There are two predominant approaches to the face recognition problem: geometric (feature-based) and photometric (view-based). As researcher interest in face recognition continued, many different algorithms were developed, three of which have been well studied in the face recognition literature.
1. Pre-processing: To reduce the variability in the faces, the images are processed before they are fed into the network. All positive examples, that is, the face images, are obtained by cropping images with frontal faces to include only the front view. All the cropped images are then corrected for lighting through standard algorithms.
2. Classification: Neural networks are implemented to classify the images as faces or non-faces by training on these examples. We use both our own implementation of a neural network and the MATLAB neural network toolbox for this task. Different network configurations are experimented with to optimize the results (a minimal sketch is given after this list).
3. Localization: The trained neural network is then used to search for faces in an image and, if present, localize them in a bounding box.
The work considers the following facial features: position, scale, orientation, and illumination.
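As a rough illustration of the classification step, the following is a minimal sketch of training a face/non-face classifier with the MATLAB neural network toolbox. The variables faceVecs and nonfaceVecs (matrices of vectorized, lighting-corrected training patches) are illustrative assumptions standing in for the pre-processed examples described above.

% Minimal sketch: face/non-face classification with patternnet.
% Assumes faceVecs (d-by-Nf) and nonfaceVecs (d-by-Nn) hold vectorized,
% pre-processed training patches, one column per example.
X = [faceVecs, nonfaceVecs];
T = [ones(1, size(faceVecs, 2)), zeros(1, size(nonfaceVecs, 2))];

net = patternnet(20);              % one hidden layer with 20 neurons
net = train(net, X, T);            % backpropagation training

scores = net(X);                   % network outputs in [0, 1]
isFace = scores > 0.5;             % threshold into face / non-face labels

Different hidden-layer sizes can be tried simply by changing the argument of patternnet, which is what "experimenting with different network configurations" amounts to in this sketch.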
CHAPTER-2
LITERATURE SURVEY
Face detection is a computer technology that determines the location and size of human faces in an arbitrary (digital) image. The facial features are detected, and any other objects such as trees, buildings, and bodies are ignored in the digital image. It can be regarded as a 'specific' case of object-class detection, where the task is to find the locations and sizes of all objects in an image that belong to a given class. Face detection can also be regarded as a more 'general' case of face localization, in which the task is to find the locations and sizes of a known number of faces (usually one). Basically, there are two types of approaches to detecting the facial part in a given image: the feature-based and the image-based approach. The feature-based approach tries to extract features of the image and match them against knowledge of face features, while the image-based approach tries to get the best match between training and testing images.
2.1.1 Snakes:
The first type uses a generic active contour called a snake, first introduced by Kass et al. in 1987. Snakes are used to identify head boundaries [8,9,10,11,12]. In order to achieve the task, a snake is first initialized in the proximity of a head boundary. It then locks onto nearby edges and subsequently assumes the shape of the head. The evolution of a snake is achieved by minimizing an energy function E_snake (an analogy with physical systems), denoted as

E_snake = E_internal + E_external

where E_internal and E_external are the internal and external energy functions. The internal energy is the part that depends on the intrinsic properties of the snake and defines its natural evolution. The typical natural evolution of snakes is shrinking or expanding. The external energy counteracts the internal energy and enables the contour to deviate from its natural evolution and eventually assume the shape of nearby features, i.e. the head boundary, at a state of equilibrium.
Deformable templates were then introduced by Yuille et al. to take into account the a priori knowledge of facial features and to improve on the performance of snakes. Locating a facial feature boundary is not an easy task, because the local evidence of facial edges is difficult to organize into a sensible global entity using generic contours. The low brightness contrast around some of these features also makes the edge detection process difficult. Yuille et al. took the concept of snakes a step further by incorporating global information about the eye to improve the reliability of the extraction process.
Independently of computerized image analysis, and before ASMs were developed, researchers developed statistical models of shape. The idea is that once shapes are represented as vectors, standard statistical methods can be applied to them just like to any other multivariate object. These models learn allowable constellations of shape points from training examples and use principal components to build what is called a Point Distribution Model. These have been used in diverse ways, for example for categorizing Iron Age brooches. Ideal Point Distribution Models can only deform in ways that are characteristic of the object.
Low-level analysis is based on visual features such as color, intensity, edges, and motion.
Skin Color: Color is a vital feature of human faces. Using skin color as a feature for tracking a face has several advantages. Color processing is much faster than processing other facial features, and under certain lighting conditions color is orientation invariant. This property makes motion estimation much easier, because only a translation model is needed. Tracking human faces using color as a feature nevertheless has several problems: the color representation of a face obtained by a camera is influenced by many factors, such as ambient light and object movement.
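As an illustration of this idea, the following is a minimal sketch of skin-color segmentation in MATLAB using fixed chrominance thresholds in the YCbCr color space; the input file name and the threshold values are illustrative assumptions, not tuned constants.

% Minimal sketch: skin-color segmentation in YCbCr space.
rgbImg = imread('input.jpg');          % any RGB image (assumed filename)
ycbcr  = rgb2ycbcr(rgbImg);            % separate luminance from chrominance
Cb = ycbcr(:,:,2);
Cr = ycbcr(:,:,3);

% Illustrative chrominance ranges for skin; real systems tune these.
skinMask = (Cb >= 77) & (Cb <= 127) & (Cr >= 133) & (Cr <= 173);

skinMask = bwareaopen(skinMask, 500);  % drop small noisy regions
imshow(skinMask); title('Candidate skin regions');

Because only the Cb and Cr channels are thresholded, the mask is largely insensitive to overall brightness, which is why color-based tracking tolerates some lighting variation.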
When a video sequence is available, motion information can be used to locate moving objects. Moving silhouettes, such as the face and body parts, can be extracted by simply thresholding accumulated frame differences. Besides face regions, facial features can also be located by frame differences.
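A minimal sketch of this frame-differencing idea follows; the video file name and the threshold are illustrative assumptions.

% Minimal sketch: locating moving regions by accumulated frame differences.
v    = VideoReader('sequence.avi');     % assumed input video
prev = rgb2gray(readFrame(v));
acc  = zeros(size(prev));               % accumulated differences

while hasFrame(v)
    curr = rgb2gray(readFrame(v));
    acc  = acc + abs(double(curr) - double(prev));
    prev = curr;
end

movingMask = acc > 0.2 * max(acc(:));   % illustrative threshold
imshow(movingMask); title('Moving silhouettes');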
Gray information within a face can also be treated as an important feature. Facial features such as eyebrows, pupils, and lips generally appear darker than their surrounding facial regions. Various recent feature extraction algorithms search for local gray minima within segmented facial regions. In these algorithms, the input images are first enhanced by contrast stretching and gray-scale morphological routines to improve the quality of local dark patches and thereby make detection easier. The extraction of dark patches is achieved by low-level gray-scale thresholding. Yang and Huang presented a new approach based on the gray-scale behavior of faces in pyramid (mosaic) images. This system uses a hierarchical face-location method consisting of three levels. The higher two levels are based on mosaic images at different resolutions, and at the lowest level an edge detection method is applied. Moreover, this algorithm gives a good response in complex backgrounds where the size of the face is unknown.
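The dark-patch extraction described above can be sketched in a few lines of MATLAB; the input file name, the structuring-element size, and the threshold are illustrative assumptions.

% Minimal sketch: enhance and extract dark facial patches (eyes,
% eyebrows, lips) by contrast stretching, gray-scale morphology,
% and low-level thresholding.
gray = rgb2gray(imread('face.jpg'));    % assumed input image
gray = imadjust(gray);                  % contrast stretching

% Bottom-hat filtering emphasizes small dark regions relative to
% their brighter surroundings.
se        = strel('disk', 7);           % illustrative size
darkPatch = imbothat(gray, se);

mask = darkPatch > 40;                  % illustrative threshold
imshow(mask); title('Candidate dark facial patches');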
These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are designed mainly for face localization.
Paul Viola and Michael Jones presented an approach for object detection that minimizes computation time while achieving high detection accuracy. Viola and Jones [39] proposed a fast and robust method for face detection that was 15 times quicker than any technique at the time of release, with 95% accuracy at around 17 fps. The technique relies on the use of simple Haar-like features that are evaluated quickly through the use of a new image representation. Based on the concept of an 'integral image', it generates a large set of features and uses the boosting algorithm AdaBoost to reduce the over-complete set, and the introduction of a degenerate tree of the boosted classifiers provides robust and fast inference. The detector is applied in a scanning fashion on gray-scale images, and both the scanned window and the evaluated features can be scaled.
The feature distances are computed using the distance formula and, at last, compared with the database; if a match occurs, the faces in the image are detected. The equation of the Gabor filter [40], in its standard real-valued form, is

g(x, y) = exp(-(x'^2 + γ^2 y'^2) / (2σ^2)) cos(2π x'/λ + ψ)

where x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ.
All the methods discussed so far are able to track faces, but some issues, such as locating faces of various poses in a complex background, remain truly difficult. To reduce this difficulty, investigators group facial features into face-like constellations using more robust modelling approaches such as statistical analysis. Various types of face constellations have been proposed by Burl et al. They make use of statistical shape theory on the features detected from a multiscale Gaussian derivative filter. Huang et al. also apply a Gaussian filter for pre-processing in a framework based on image feature analysis.
Image-Based Approach:
SVMs were first introduced by Osuna et al. for face detection. SVMs work as a new paradigm for training polynomial-function, neural-network, or radial basis function (RBF) classifiers. SVMs work on an induction principle called structural risk minimization, which aims to minimize an upper bound on the expected generalization error. An SVM classifier is a linear classifier in which the separating hyperplane is chosen to minimize the expected classification error on unseen test patterns. Osuna et al. developed an efficient method to train an SVM for large-scale problems and applied it to face detection. Based on two test sets of 10,000,000 test patterns of 19 x 19 pixels, their system has slightly lower error rates and runs approximately 30 times faster than the system by Sung and Poggio. SVMs have also been used to detect faces and pedestrians in the wavelet domain.
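A minimal sketch of training such a face/non-face SVM in MATLAB (Statistics and Machine Learning Toolbox) is given below. The 19x19 patch size follows the text, while the variable names and the polynomial kernel order are illustrative assumptions.

% Minimal sketch: face/non-face SVM on vectorized 19x19 patches.
% Assumes facePatches and nonfacePatches are N1-by-361 and N2-by-361
% matrices, one vectorized 19x19 gray-scale patch per row.
X = [facePatches; nonfacePatches];
y = [ones(size(facePatches, 1), 1); -ones(size(nonfacePatches, 1), 1)];

svm = fitcsvm(X, y, 'KernelFunction', 'polynomial', ...
              'PolynomialOrder', 2, 'Standardize', true);

% Classify a new candidate window (any 19x19 patch, vectorized).
label = predict(svm, reshape(candidate, 1, []));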
CHAPTER-3
MATLAB
3.1 INTRODUCTION
The name MATLAB stands for MATrix LABoratory. MATLAB was written originally to
provide easy access to matrix software developed by the LINPACK (linear system package) and
EISPACK (Eigensystem Package) projects. MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment. MATLAB has many advantages compared to conventional computer languages (e.g., C, FORTRAN) for solving technical problems. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. Specific applications are collected in packages referred to as toolboxes. There are toolboxes for signal processing, symbolic computation, control theory, simulation, optimization, and several other fields of applied science and engineering.
3.2 MATLAB'S POWER OF COMPUTATIONAL MATHEMATICS
• Linear Algebra
• Algebraic Equations
• Non-linear Functions
• Statistics
• Data Analysis
• Transforms
• Curve Fitting
3.3 FEATURES OF MATLAB
• It provides an interactive environment for iterative exploration, design, and problem solving.
• It provides vast library of mathematical functions for linear algebra, statistics, Fourier
analysis, filtering, optimization, numerical integration and solving ordinary differential
equations.
• It provides built-in graphics for visualizing data and tools for creating custom plots.
• MATLAB's programming interface gives development tools for improving code quality,
maintainability, and maximizing performance.
• It provides functions for integrating MATLAB based algorithms with external applications
and languages such as C, Java, .NET and Microsoft Excel.
3.4 USES OF MATLAB
MATLAB is widely used as a computational tool in science and engineering, encompassing the fields of physics, chemistry, mathematics, and all engineering streams. It is used in a range of applications including:
• control systems
• computational biology
3.5 UNDERSTANDING THE MATLAB ENVIRONMENT
Current Folder - This panel allows you to access the project folders and files.
Command Window - This is the main area where commands can be entered at the command line. It is indicated by the command prompt (>>).
Fig. 3 Command window
Workspace - The workspace shows all the variables created and/or imported from files.
Fig. 4 Workspace
Command History - This panel shows or reruns commands that are entered at the command line.
3.6 COMMANDS
MATLAB is an interactive program for numerical computation and data visualization. You can enter a command by typing it at the MATLAB prompt '>>' in the Command Window.
3.6.1 Commands for managing a session
MATLAB provides various commands for managing a session. The following table lists them:
Command    Purpose
clc        Clears the command window.
clear      Removes variables from memory.
exist      Checks for the existence of a file or variable.
global     Declares variables to be global.
help       Searches for a help topic.
lookfor    Searches help entries for a keyword.
quit       Stops MATLAB.
who        Lists current variables.
whos       Lists current variables (long display).
Table 1 Commands for managing a session
3.7 M-FILES
MATLAB allows writing two kinds of program files:
Scripts:
Script files are program files with the .m extension. In these files, you write a series of commands that you want to execute together. Scripts do not accept inputs and do not return any outputs; they operate on data in the workspace.
Functions:
Function files are also program files with the .m extension. Functions can accept inputs and return outputs. Internal variables are local to the function.
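As a brief illustration (the file names are arbitrary), a script and an equivalent function might look as follows:

% avgdemo.m -- a script: works directly on workspace data.
x = [2 4 6 8];
m = sum(x) / numel(x)          % prints the mean of x

% avgfun.m -- a function: accepts an input, returns an output,
% and keeps its variables local to itself.
function m = avgfun(x)
    m = sum(x) / numel(x);
end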
CHAPTER-4
FACE DETECTION
There are three main approaches considered for implementing face detection: neural-network-based face detection, the image-pyramid statistical method, and the Viola-Jones method. Each implementation is discussed below:
[Figure 1: Plot of the time (in seconds) taken by each method to process an image: neural-network-based face detection (about 5.0 s), the image-pyramid statistical method, and the Viola-Jones method (about 0.067 s). Values are taken from the respective papers.]
The plot above shows the time taken by each method to process an image. The images used to establish those benchmarks were approximately 320x240 pixels and were processed on relatively similar processors. In the other algorithms, the speed of detection was not taken into account as such. Hence, we can clearly see that the Viola-Jones method works significantly faster than the others. It is also considered one of the breakthroughs in the face detection field.
Detector / False detections   10      31      50      65      78      95      167
Rowley-Baluja-Kanade          83.2%   86.0%   -       -       -       89.2%   90.1%
Schneiderman-Kanade           -       -       -       94.4%   -       -       -
Viola-Jones                   76.1%   88.4%   91.4%   92.0%   92.1%   92.9%   93.9%
Table 1 Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.
Table 1 shows the detection-rate results of the different algorithms. From this we can clearly say that the Viola-Jones method also provides a reasonably high detection rate. After analyzing each algorithm's complexity, Viola-Jones seems the better choice: it offers a high detection rate combined with a very low processing time, which is what the system needs.
C – Line features
They are formed of one low interval and one high interval; in other words, they are single-wavelength square waves. A square wave is a pair of one light and one dark adjacent rectangle. The calculation of these wavelets is relatively easy, as the sum over the white areas is simply subtracted from the sum over the black areas. The figure above shows the four basic types of Haar wavelets in 2D.
Feature extraction is made faster by the integral image, which is a special representation of the image. A machine learning method called AdaBoost enables classifier training and feature selection. All of the detected features are then combined efficiently using a cascaded classifier. This is shown in the figure below.
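To make the integral-image idea concrete, the following is a minimal sketch of computing an integral image and using it to evaluate one rectangle sum in constant time. It is an illustration of the technique, not the project's actual detector code, and the file name and coordinates are illustrative assumptions.

% Minimal sketch: integral image and constant-time rectangle sums.
I  = double(rgb2gray(imread('face.jpg')));   % assumed input image
ii = cumsum(cumsum(I, 1), 2);                % integral image
ii = padarray(ii, [1 1], 0, 'pre');          % zero border so the four
                                             % lookups below stay in range

% Sum of pixels in rectangle rows r1..r2, cols c1..c2, from 4 lookups.
rectsum = @(r1, c1, r2, c2) ii(r2+1, c2+1) - ii(r1, c2+1) ...
                          - ii(r2+1, c1)   + ii(r1, c1);

% A two-rectangle (edge) Haar-like feature: left half minus right half.
f = rectsum(10, 10, 29, 19) - rectsum(10, 20, 29, 29);

Because any rectangle sum costs exactly four array lookups regardless of its size, scaling the scanned window or the features does not increase the per-feature cost.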
Face detection is the process of identifying different parts of human faces, such as eyes, nose, and mouth; this process can be achieved using MATLAB code. In this project the author will attempt to detect faces in still images by using image invariants. To do this it is useful to study the gray-scale intensity distribution of an average human face. The following 'average human face' was constructed from a sample of 30 frontal-view human faces, of which 12 were from females and 18 from males. A suitably scaled color map has been used to highlight gray-scale intensity differences.
The gray-scale differences, which are invariant across all the sample faces, are strikingly apparent. The eye-eyebrow area always seems to contain dark (low-intensity) gray levels, while the nose, forehead, and cheeks contain bright (high-intensity) gray levels. After a great deal of experimentation, the researcher found that the following areas of the human face were suitable for a face detection system based on image invariants and a deformable template.
The above facial area performs well as a basis for a face template, probably because of the clear divisions of the bright intensity-invariant area by the dark intensity-invariant regions. Once this pixel area is located by the face detection system, any particular area required can be segmented based on the proportions of the average human face. After studying the above images, it was subjectively decided by the author to use the following as a basis for dark-intensity-sensitive and bright-intensity-sensitive templates, together with a pixel area 33.3% (of the width of the square window) below them once these are located in a subject's face.
Now that suitable dark- and bright-intensity-invariant templates have been decided on, it is necessary to find a way of using these to make two A-units for a perceptron, i.e. a computational model is needed to assign neurons to the distributions displayed.
CHAPTER-5
FACE RECOGNITION
There are various available implementations of face recognition software. The Face Recognition Technology (FERET) program has already analyzed most of those algorithms and has published detailed performance results for each of these approaches. The main goal of this program was to test each algorithm on the same data sets so that a genuine comparative study of the results could be formed. It is managed by the Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology (NIST) so that the best solution can be known and deployed.
This figure shows the FERET FAFB results. In this test, each algorithm was run on faces with different facial expressions; all other conditions were kept the same.
Principal Component Analysis (or the Karhunen-Loeve expansion) is a suitable strategy for face recognition because it identifies variability between human faces which may not be immediately obvious. Principal Component Analysis (hereafter PCA) does not attempt to categorize faces using familiar geometrical differences, such as nose length or eyebrow width. Instead, a set of human faces is analyzed using PCA to determine which 'variables' account for the variance of faces. In face recognition, these variables are called eigenfaces because, when plotted, they display an eerie resemblance to human faces. Although PCA is used extensively in statistical analysis, the pattern recognition community started to use PCA for classification only relatively recently. As described by Johnson and Wichern (1992), 'principal component analysis is concerned with explaining the variance-covariance structure through a few linear combinations of the original variables.' Perhaps PCA's greatest strengths are its abilities for data reduction and interpretation. For example, a 100x100 pixel area containing a face can be very accurately represented by just 40 eigenvalues. Each eigenvalue describes the magnitude of each eigenface in each image. Furthermore, all interpretation (i.e. recognition) operations can now be done using just the 40 eigenvalues that represent a face, instead of manipulating the 10000 values contained in a 100x100 image. Not only is this computationally less demanding, but the recognition information of several thousand pixels is also condensed into just a few dozen values.
This method is based upon Principal Component Analysis (PCA). An initial set of face images is used to create a training set. The number of face shots of each person stored in the database depends on how much processing time they will take. These faces are then broken down into individual vectors; the magnitude of each vector element represents the brightness of an individual sector of the gray-scale image. A covariance matrix is formed from the normalized vectors. Eigenvectors are then derived from this covariance matrix, and each such eigenvector of the image set forms an eigenface. Eigenfaces help in focusing on the main face features rather than the whole face data; in other words, they enable us to find the weight of each face.
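A minimal sketch of this training step is given below. It uses the standard A*A' trick so that only an n-by-n eigen-problem is solved for n training images; faceMatrix and the other variable names are illustrative assumptions, and at least 40 training images are assumed.

% Minimal sketch: computing eigenfaces from a training set.
% Assumes faceMatrix is n-by-(M*N): one vectorized gray-scale
% face image of size M-by-N per row, with n >= 40.
meanFace = mean(faceMatrix, 1);
A = faceMatrix - meanFace;             % center the data

% Eigenvectors of A*A' (n-by-n) instead of the huge (M*N)-by-(M*N)
% covariance matrix; mapping back through A' gives the eigenfaces.
[V, D] = eig(A * A');
[~, order] = sort(diag(D), 'descend');
U = A' * V(:, order(1:40));            % top 40 eigenfaces, one per column
U = U ./ vecnorm(U);                   % normalize each eigenface

% Weights (projections) of every training face in eigenface space.
T = A * U;                             % n-by-40 weight matrix

The matrix T here plausibly plays the same role as the projected training matrix T used in the recognition code in the appendix.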
When a new face image is acquired, the weight vector of that face is calculated and then subtracted from each of the weight vectors of the other images in the database. The resulting difference numbers represent how different each stored image is from the input image: the lower the number, the closer the match. The best match is therefore the one at the minimum Euclidean distance.
This vector can also be regarded as a point in a 10000-dimensional space. Therefore, all the images of the subjects whose faces are to be recognized can be regarded as points in this 10000-dimensional space. Face recognition using the raw images is doomed to failure, because all human face images are quite similar to one another, so all the associated vectors are very close to each other in the 10000-dimensional space.
Figure 4
Figure 5 Eigenfaces
The transformation of a face from image space (I) to face space (f) involves just a simple
matrix multiplication. If the average face image is A and U contains the (previously calculated)
eigenfaces,
f = U * (I - A)
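Under the same assumptions as the training sketch above (meanFace, U, and T; storing faces as rows rather than columns accounts for the transposition relative to the formula), recognition reduces to one projection and a nearest-neighbor search. The acceptance threshold tol is an illustrative assumption.

% Minimal sketch: recognizing a new face by projection and
% nearest-neighbor search in eigenface space.
img = double(reshape(newFace, 1, []));     % vectorized M*N query face
f   = (img - meanFace) * U;                % f = U*(I - A), row-vector form

% Euclidean distance to every stored weight vector in T.
d = sqrt(sum((T - f).^2, 2));
[dmin, idx] = min(d);                      % closest match and its distance

if dmin < tol                              % tol: assumed acceptance threshold
    fprintf('Recognized as database entry %d\n', idx);
else
    disp('Not recognized');
end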
CHAPTER-6
RESULTS
6.1 Face Detection in Images
Figure 1
6.2 Real-Time Face Detection
Figure 2
6.3 Face Recognition
Figure 4
CHAPTER-7
CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION
The approach presented here for face detection and recognition decreases the computation time while producing results with high accuracy. Viola-Jones is used for detecting facial features; it has been tested not only on video sequences but also on live video using a webcam. Principal Component Analysis (PCA) is used for recognizing images from a database using eigenvalues. Using this system, many security and surveillance applications can be developed, and a required object can be traced down easily. In the coming days these algorithms could be used to detect a particular object rather than faces.
7.2 FUTURE SCOPE
1. Future work is to stay in the same domain but to track a particular face in a video sequence, that is, to ignore all other faces except the one required.
2. The Viola-Jones algorithm is only meant for frontal, upright faces; it does not work for arbitrary movement. We would try to train classifiers so that they are sensitive to all sorts of movements.
3. We would also try to find the displacement of the eigenvectors using a Taylor series.
REFERENCES
18. Hadamard, J. (1923). Lectures on the Cauchy Problem in Linear Partial Differential Equations. Yale University Press.
19. Rowley, H., Baluja, S., and Kanade, T., "Neural Network-Based Face Detection," PAMI, January 1998.
20. Schneiderman, H. and Kanade, T., "A Statistical Method for 3D Object Detection Applied to Faces and Cars."
21. Viola, P. and Jones, M., "Rapid Object Detection Using a Boosted Cascade of Simple Features," TR2004-043, May 2004.
APPENDIX
% Face detection in a still image.
% (Assumes filename and pathname come from a file-selection dialog,
% e.g. [filename, pathname] = uigetfile('*.jpg').)
filewithpath = strcat(pathname, filename);
Img = imread(filewithpath);

faceDetector = vision.CascadeObjectDetector;   % Viola-Jones detector
faceDetector.MergeThreshold = 5;               % stricter merging of hits
bboxes = faceDetector(Img);                    % bounding boxes of faces

if ~isempty(bboxes)
    Imgf = insertObjectAnnotation(Img, 'rectangle', bboxes, ...
        'Face', 'LineWidth', 3);
    imshow(Imgf)
    title('Detected faces');
else
    position = [0, 0];
    label = 'No face detected';
    Imgn = insertText(Img, position, label, 'FontSize', 25, ...
        'BoxOpacity', 1);
    imshow(Imgn)
end
% Real-time face detection from a webcam.
% (Assumes the setup cam = webcam; videoPlayer = vision.VideoPlayer;
% runLoop = true; precedes the loop, and that myfacedetect is the
% project's own detection helper.)
while runLoop
    img = snapshot(cam);
    [croppedimage, bboxPoints] = myfacedetect(img);
    bboxPolygon = reshape(bboxPoints', 1, []);    % polygon vertex list
    videoFrame = insertShape(img, 'Polygon', bboxPolygon, ...
        'LineWidth', 4);
    step(videoPlayer, videoFrame);
    % Check whether the video player window has been closed.
    runLoop = isOpen(videoPlayer);
end
clear cam;
release(videoPlayer);
FACE RECOGNITION:
% Project a query face into eigenface space and find its L1 distance
% to each stored training projection.
% (Assumes M, N (image size), n (number of training images), and the
% projected training matrix T come from the training stage; imgpca is
% the query's projection onto the eigenface basis.)
img = rgb2gray(img);
img = imresize(img, [M, N]);
img = double(reshape(img, [1, M*N]));
% imgpca = (img - meanface) * U;   % assumed projection step from training
for i = 1:n
    distarray(i) = sum(abs(T(i,:) - imgpca));   % L1 distance
end
Explanation:
The code is meant to work on pictures only. The recognition function decides whether there is any close valid match in the database. In the scenario where the face is 'unknown', the function simply returns 'not recognized'. After this, the function loops back to the beginning for a new frame.