Biometric System Notes - Unit 3

UNIT-3 FACE RECOGNITION AND HAND GEOMETRY

2 Marks Questions

1. State the basic components of a facial recognition system.

A typical facial recognition system consists of three basic components: (a) image acquisition, (b) face detection, and (c) face matching (feature extraction and comparison).

2. What are the advantages of 3D Model based face recognition?

Unlike standard facial recognition systems, 3D facial recognition is unaffected by lighting conditions, and scans can even be done in complete darkness. Another advantage of 3D facial recognition is that it can recognize a target from many angles rather than just a straight-on view.

3. Classify the visual based patterns.

Statistical pattern recognition techniques assume that each object or pattern class can be represented as a feature vector, and decide which class to assign a given pattern to based on distance calculations or probabilistic models.
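As a minimal illustration of this idea (with synthetic feature vectors, not real biometric data), each class can be summarized by the mean of its training feature vectors and a new pattern assigned to the nearest class mean:

import numpy as np

# Synthetic training data: two classes, 3-dimensional feature vectors
class_a = np.array([[1.0, 2.0, 1.5], [1.2, 1.8, 1.6], [0.9, 2.1, 1.4]])
class_b = np.array([[4.0, 0.5, 3.0], [3.8, 0.7, 2.9], [4.2, 0.6, 3.1]])

# Represent each class by its mean feature vector
means = {"A": class_a.mean(axis=0), "B": class_b.mean(axis=0)}

def classify(x):
    # Assign the pattern to the class whose mean is closest (Euclidean distance)
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

print(classify(np.array([1.1, 1.9, 1.5])))  # -> 'A'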

4. List the strength and weakness of face recognition.

Improved Security - any trespassers will be quickly captured by the recognition system and you will be alerted promptly.
High Accuracy - developments in 3D facial recognition technologies and infrared cameras make the system very hard to trick.
Fully Automated - facial recognition technology can now fully automate the process and ensure its accuracy at a very high rate.
Data Storage - templates occupy more storage space.
Camera Angle - for a facial recognition system to completely identify a face, it needs multiple angles, including profile, frontal, 45 degree and more, to ensure the most accurate matches.

5. Draw the level of biometric fusion in pictorial form.

5 Marks Questions

6. What is the role of neural network in face recognition? Justify.

A neural network, in principle, may be trained to recognize face images directly. Even for an image of moderate size, however, the network can be very complex and therefore difficult to train. For example, if the image is 128x128 pixels, the number of inputs to the network would be 16,384. To reduce complexity, the neural network is often applied to the pattern recognition phase rather than to the feature extraction phase. Some face detection algorithms down-sample a face image into a 19x19 facial feature vector before applying elliptical k-means clustering to model the distributions of the "face samples" and the "non-face samples"; others reduce the dimension of the facial image to 20x20 by downsampling before the facial image is fed into a multi-layer neural network face detector.

One example of such a neural classifier is the Probabilistic Decision-based Neural Network (PDNN). PDNN does not have a fully connected network topology. Instead, it divides the network into K subnets, each dedicated to recognizing one person in the database. PDNN uses the Gaussian activation function for its neurons, and the output of each "face subnet" is the weighted summation of the neuron outputs. In other words, the face subnet estimates the likelihood density using the popular mixture-of-Gaussians model. Compared to the AWGN scheme, a mixture of Gaussians provides a much more flexible and complex model for approximating the true likelihood densities in the face space.
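The following toy sketch illustrates how such a subnet can score an input as a weighted sum of Gaussian neuron outputs (a mixture of Gaussians), with the highest-scoring subnet claiming the identity; the subnet parameters below are invented for illustration and are not trained PDNN values:

import numpy as np

def gaussian(x, mean, var):
    # Isotropic Gaussian activation of one neuron
    d = x - mean
    return np.exp(-0.5 * np.dot(d, d) / var) / ((2 * np.pi * var) ** (len(x) / 2))

def subnet_likelihood(x, means, variances, weights):
    # Output of a "face subnet": weighted sum of its Gaussian neuron outputs
    return sum(w * gaussian(x, m, v) for m, v, w in zip(means, variances, weights))

# Two illustrative subnets (one per enrolled person), each with two Gaussian neurons
subnets = {
    "person_1": ([np.array([0.0, 0.0]), np.array([1.0, 1.0])], [0.5, 0.5], [0.6, 0.4]),
    "person_2": ([np.array([5.0, 5.0]), np.array([6.0, 4.0])], [0.5, 0.5], [0.5, 0.5]),
}

probe = np.array([0.8, 0.9])
scores = {p: subnet_likelihood(probe, *params) for p, params in subnets.items()}
print(max(scores, key=scores.get))  # -> 'person_1'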

The learning scheme of the PDNN consists of two phases. In the first phase, each subnet is trained with its own face images only (no cross training). In this phase, the weights and biases are trained by the Expectation-Maximization (EM) algorithm, which has been proven to be an efficient algorithm for maximum-likelihood (ML) estimation. In terms of system implementation, the advantage of the EM algorithm is that it does not require a learning rate parameter. The learning rate parameter in a conventional neural network training scheme controls the speed of adjusting the network weights.

The learning rate is often a sensitive parameter; an improper selection may cause the whole network to fail to converge. The second phase of PDNN learning is called decision-based learning. In this phase, the subnet parameters may be trained by particular samples from other face classes.

The decision-based learning scheme does not use all the training samples for training; only those that are misclassified are used. If a sample is misclassified to the wrong subnet, the rightful subnet tunes its parameters so that its "territory" (decision region) moves closer to the misclassified sample. This learning process is also known as reinforced learning. In the meantime, the subnet that wrongfully claims the identity of the questionable sample tries to move itself away from the sample. This is called anti-reinforced learning.
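A toy sketch of the reinforced/anti-reinforced idea, using a single Gaussian centre per subnet and an illustrative step size; it only shows the direction of the parameter updates, not the exact PDNN update equations:

import numpy as np

eta = 0.1  # illustrative step size

def decision_based_update(sample, true_center, wrong_center):
    # Reinforced learning: the rightful subnet moves toward the misclassified sample
    true_center = true_center + eta * (sample - true_center)
    # Anti-reinforced learning: the subnet that wrongly claimed the sample moves away
    wrong_center = wrong_center - eta * (sample - wrong_center)
    return true_center, wrong_center

sample = np.array([2.0, 2.0])
c_true, c_wrong = np.array([0.0, 0.0]), np.array([2.5, 2.5])
c_true, c_wrong = decision_based_update(sample, c_true, c_wrong)
print(c_true, c_wrong)  # the centre of the true class moves toward the sample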

Appraise the authentication of face recognition using a suitable algorithm as an example.


Facial recognition is based on determining the shape and size of the jaw and chin, and the shape and location of the eyes, eyebrows, nose, lips, and cheekbones. 2D facial scanners read the face geometry and record it on a grid. The facial geometry is transferred to the database in terms of points. The comparison algorithms then perform face matching and produce the results. Facial recognition is performed in the following ways (a sketch of the first approach follows the list):
Facial Metrics − in this type, the distances between pupils or from nose to lip or chin are measured.
Eigenfaces − the process of analyzing the overall face image as a weighted combination of a number of faces.
Skin Texture Analysis − the unique lines, patterns, and spots apparent in a person's skin are located.
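To make the facial metrics approach concrete, the sketch below computes inter-pupil and nose-to-chin distances from a handful of 2D landmark points; the coordinates are made up and would in practice come from a landmark detector:

import numpy as np

# Hypothetical facial landmarks as (x, y) pixel coordinates
landmarks = {
    "left_pupil":  np.array([120.0, 150.0]),
    "right_pupil": np.array([180.0, 150.0]),
    "nose_tip":    np.array([150.0, 190.0]),
    "chin":        np.array([150.0, 260.0]),
}

def facial_metrics(lm):
    # Distances used as simple facial-metric features
    return np.array([
        np.linalg.norm(lm["right_pupil"] - lm["left_pupil"]),  # inter-pupil distance
        np.linalg.norm(lm["chin"] - lm["nose_tip"]),           # nose-to-chin distance
    ])

print(facial_metrics(landmarks))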
Merits of Facial Recognition System

 It offers easy storage of templates in the database.

 It reduces the statistical complexity of recognizing a face image.

 It involves no physical contact with the system.

Demerits of Facial Recognition System

 Facial traits change over time.

 Uniqueness is not guaranteed, for example, in the case of identical twins.

 If a candidate face shows a different expression, such as a light smile, it can affect the result.

 It requires adequate lighting to get correct input.

Applications of Facial Recognition System

 General Identity Verification.

 Verification for access control.

 Human-Computer Interaction.

 Criminal Identification.

 Surveillance.

7. Explain about hand geometry with examples.


Hand geometry, as the name suggests, refers to the geometric structure of the hand. This
structure includes width of the fingers at various locations, width of the palm, thickness of the
palm, length of the fingers, contour of the palm, etc. Although these metrics do not vary
significantly across the population, they can still be used to verify the identity of an individual.
Hand geometry measurement is non-intrusive and the verification involves a simple processing
of the resulting features. Unlike palmprint, this method does not involve extraction of detailed
features of the hand (for example, wrinkles on the skin).
A typical hand geometry system consists of four main components: image acquisition, hand
segmentation and alignment, feature extraction, and feature matching.

Geometric measurements typically extracted from a hand image (cf. Figure 5.10) include:
1. Thumb length, 2. Index finger length, 3. Middle finger length, 4. Ring finger length, 5. Pinkie length, 6. Thumb width, 7. Index finger width, 8. Middle finger width, 9. Ring finger width, 10. Pinkie width, 11. Thumb circle radius, 12. Index circle radius (lower), 13. Index circle radius (upper), 14. Middle circle radius (lower), 15. Middle circle radius (upper), 16. Ring circle radius (lower), 17. Ring circle radius (upper), 18. Pinkie circle radius (lower), 19. Pinkie circle radius (upper), 20. Thumb perimeter, 21. Index finger perimeter, 22. Middle finger perimeter, 23. Ring finger perimeter, 24. Pinkie perimeter, 25. Thumb area, 26. Index finger area, 27. Middle finger area, 28. Ring finger area, 29. Pinkie area, 30. Largest inscribed circle radius.

Image capture
Most hand geometry systems acquire an image of the back of the human hand. This image is
often referred to as the dorsal aspect of the hand. Thus, most commercial systems require the
subject to place their hand on a platen with the palm facing downward. A suitably positioned
camera above the hand is then used to acquire an image of the dorsal aspect.
Hand segmentation
After the hand image is captured, the hand boundary must be extracted in order to determine the
region of interest. To accomplish this, typically, the image is thresholded in order to deduce the
region associated with the hand. This is followed by certain morphological operators (e.g.,
dilation and erosion followed by a connected region analysis) to extract the silhouette of the
hand. If the image is very noisy (e.g., due to variable illumination and shadows), then more
complex segmentation techniques such as the mean shift algorithm may be needed. The
segmented hand may still contain some artifacts such as the pegs on the platen, rings worn by the
user, clothing that covers certain parts of the hand, and discontinuous contours due to non-
uniform lighting. These artifacts are removed using specialized image processing techniques that
are tailored to specific artifacts. Once a reliable hand shape is obtained, a hand contour is
extracted and is used for further processing.
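A minimal sketch of this segmentation pipeline (global thresholding, a morphological opening, and connected-component analysis) on a synthetic grayscale image; a real system would start from the captured dorsal-hand image and may need the more complex techniques mentioned above:

import numpy as np
from scipy import ndimage

# Synthetic "hand" image: a bright blob on a darker background with some noise
img = np.zeros((100, 100))
img[20:80, 30:70] = 1.0
img += 0.1 * np.random.default_rng(0).standard_normal(img.shape)

# 1. Threshold to separate the hand region from the background
mask = img > 0.5

# 2. Morphological opening (erosion followed by dilation) to remove small artifacts
mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))

# 3. Connected-component analysis: keep the largest component as the hand silhouette
labels, n = ndimage.label(mask)
if n > 0:
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    hand = labels == (np.argmax(sizes) + 1)
    print("hand pixels:", int(hand.sum()))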
Feature Extraction
Typically, two kinds of features are extracted from a hand or a finger silhouette: one-dimensional
geometric measurements and two-dimensional shape-based features. The geometric
measurements include length and width of fingers, length and width of the palm, and thickness
of the fingers. See Figure 5.10 for an example of geometric measurements obtained from a hand
image.
Feature matching
The features extracted from a segmented hand image can often be denoted as a feature vector in
the Euclidean space. Consequently, common distance measures such as Euclidean and
Manhattan distances can be effectively used to compare two hand images.
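A small sketch of this matching step, assuming the geometric measurements listed earlier have already been extracted into fixed-length feature vectors (the numbers and threshold below are made up):

import numpy as np

# Hypothetical hand-geometry feature vectors (e.g., finger lengths and widths in mm)
enrolled = np.array([62.1, 71.4, 79.8, 74.2, 58.0, 21.5, 18.2, 18.9, 17.5, 15.1])
probe    = np.array([61.8, 71.9, 79.1, 74.5, 57.6, 21.2, 18.4, 18.7, 17.8, 15.0])

euclidean = np.linalg.norm(enrolled - probe)   # L2 distance
manhattan = np.abs(enrolled - probe).sum()     # L1 distance

THRESHOLD = 3.0  # illustrative verification threshold
print("accept" if euclidean < THRESHOLD else "reject", euclidean, manhattan)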
Challenges in hand geometry recognition
Hand geometry systems have been successfully deployed in several applications including
nuclear power plants, border control systems, recreational centers and time-and-attendance
systems. In these applications, the biometric system typically operates in the verification mode.
Since the hand geometry of subsets of individuals can be similar, the identification accuracy due
to this biometric modality can be low. Further, the shape of an individual’s hand can change with
time - a factor that is especially pronounced in young children. More recent research has
explored the use of hand geometry in conjunction with fingerprints and low-resolution
palmprints in a multibiometric configuration for improved accuracy.

10 Marks Questions

1. Review the facial recognition technique using the following (i) shape and (ii)
texture.

 The face is the frontal portion of the human head, extending from the forehead to the
chin and includes the mouth, nose, cheeks, and eyes. The face is considered to be the
most commonly used biometric trait by humans. Hence, it has become a standard
practice to incorporate face photographs in various tokens of authentication such as ID
cards, passports, and driver’s licenses.
 Face recognition can be defined as the process of establishing a person’s identity based
on their facial characteristics. In its simplest form, the problem of face recognition
involves comparing two face images and determining if they are of the same person.

 Face images of a person may have variations in age, pose, illumination, and facial expressions, as well as exhibit changes in appearance due to make-up, facial hair, or accessories (e.g., sunglasses).

Facial features

 Level 1 details consist of gross facial characteristics that are easily observable. Examples
include the general geometry of the face and global skin color. Such features can be used
to quickly discriminate between (a) a short round face and an elongated thin face; (b)
faces exhibiting predominantly male and female characteristics; or (c) faces from
different races.

 Level 2 details consist of localized face information such as the structure of the face
components (e.g., eyes), the relationship between facial components and the precise
shape of the face.

 Level 3 details consist of unstructured, micro level features on the face, which includes
scars, freckles, skin discoloration, and moles. One challenging face recognition problem
where Level 3 details may be critical is the discrimination of identical twins.
Design of a face recognition system
 A typical face recognition system is composed of three modules: (a) image acquisition, (b) face detection, and (c) face matching (see Figure 3.6). The face image acquired from a sensor can be categorized based on (a) the spectral band (e.g., visible, infrared, and thermal) used to record the image and (b) the nature of the image rendering technique (e.g., 2D, 3D, and video).

Face Detection

 Face detection is the first step in most face-related applications including face
recognition, facial expression analysis, gender/ethnicity/age classification, and face
modeling. Variations in pose and expression, diversities in gender and skin tone, and
occlusions (e.g., due to glasses) are the typical challenges confounding face detection.
 While there are a number of approaches for detecting faces in a given image, state-of-the-art face detection methods are typically based on extracting local texture from the given image and applying a binary (two-class) classifier to distinguish between a face and non-face.

Feature Extraction and Matching

 There are three main approaches to match the detected face images: appearance-based, model-based, and texture-based methods.

 Appearance-based techniques generate a compact representation of the entire face region in the acquired image by mapping the high-dimensional face image into a lower dimensional sub-space. This sub-space is defined by a set of representative basis vectors, which are learned using a training set of images. Though the mapping can be either linear or non-linear, commonly used schemes such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA) involve linear projections.

 Model-based techniques attempt to build 2D or 3D face models that facilitate matching of face images in the presence of pose variations. While the Face Bunch Graphs (FBG) and Active Appearance Model (AAM) are examples of 2D face models, the morphable model is a 3D model.

 Texture-based approaches try to find robust local features that are invariant to pose or lighting variations. Examples of such features include gradient orientations and Local Binary Patterns (LBP).

Appearance-based face recognition

 Appearance-based schemes are based on the idea of representing the given face image as
a function of different face images available in the training set, or as a function of a few
basis faces. For example, the pixel value at location (x,y) in a face image can be
expressed as a weighted sum of pixel values in all the training images at (x,y).

 The set of training images or basis faces forms a subspace and if the given face image is
linearly projected onto this subspace, it is referred to as linear subspace analysis.

 Principal Component Analysis (PCA) is one of the earliest automated methods proposed for face recognition. PCA uses the training data to learn a subspace that accounts for as much variability in the training data as possible. This is achieved by performing an eigenvalue decomposition of the covariance matrix of the data, as sketched below.
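A compact sketch of this PCA step on a toy set of vectorized "face images": the covariance matrix of the mean-centred training data is eigendecomposed and the leading eigenvectors are kept as the basis faces (eigenfaces). Real systems apply the same idea to much larger images:

import numpy as np

rng = np.random.default_rng(0)
# Toy training set: 20 "face images", each flattened to a 64-dimensional vector
X = rng.standard_normal((20, 64))

mean_face = X.mean(axis=0)
A = X - mean_face                       # mean-centred data

cov = A.T @ A / (len(X) - 1)            # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending eigenvalues)

k = 5
basis = eigvecs[:, -k:]                 # top-k eigenfaces (directions of most variance)

# Project a new face into the subspace and reconstruct it from its weights
new_face = rng.standard_normal(64)
weights = (new_face - mean_face) @ basis
reconstruction = mean_face + basis @ weights
print(weights.shape, np.linalg.norm(new_face - reconstruction))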

Model-based face recognition


 Model-based techniques try to derive a pose-independent representation of the face images that can enable matching of face images across different poses. These schemes typically require the detection of several fiducial or landmark points in the face (e.g., corners of the eyes, tip of the nose, corners of the mouth, homogeneous regions of the face, and the chin), which leads to increased complexity compared to appearance-based techniques.

 The Elastic Bunch Graph Matching (EBGM) scheme represents a face as a labeled image
graph with each node being a fiducial or landmark point on the face. While each node of
the graph is labeled with a set of Gabor coefficients (also called a jet) that characterizes
the local texture information around the landmark point, the edge connecting any two
nodes of the graph is labeled based on the distance between the corresponding fiducial
points.
 The Gabor coefficient at a location in the image can be obtained by convolving the image
with a complex 2D Gabor filter centered at that location. By varying the orientation and
frequency of the Gabor filter, a set of coefficients or a Gabor jet can be obtained.
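To make the Gabor-jet idea concrete, the sketch below builds a small bank of complex 2D Gabor filters at a few orientations and frequencies and evaluates their responses at a single image location; the filter parameters are illustrative and not those of any particular EBGM implementation:

import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    # Complex 2D Gabor filter: Gaussian envelope times a complex sinusoid
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * freq * rot)

def gabor_jet(image, cx, cy, freqs, thetas, size=15, sigma=3.0):
    # Jet = vector of filter responses at one location, over orientations and frequencies
    half = size // 2
    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
    return np.array([np.sum(patch * gabor_kernel(size, f, t, sigma))
                     for f in freqs for t in thetas])

img = np.random.default_rng(1).standard_normal((64, 64))
jet = gabor_jet(img, 32, 32, freqs=[0.1, 0.2],
                thetas=np.linspace(0, np.pi, 4, endpoint=False))
print(jet.shape)  # 8 complex coefficients (2 frequencies x 4 orientations)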

Texture-based face recognition

 Appearance-based schemes typically make use of the raw pixel intensity values, which
are quite sensitive to changes in ambient lighting and facial expressions. An alternative is
to use more robust feature representation schemes that characterize the texture of an
image using the distribution of local pixel values.
 Scale Invariant Feature Transform (SIFT) and Local Binary Patterns (LBP) are the two most well-known schemes for the analysis of local textures. SIFT is one of the most popular local representation schemes used in object recognition.

2. Explain about facial recognition using correspondence mapping techniques.


The common feature of these techniques is the extraction of correspondence maps between an image to be analyzed and several stored views of known objects, which are called models. This means that pairs of points in the model and the image must be found which are images of the same point on the physical face. This is not trivial and has acquired the name correspondence problem.
Consider comparing two images that are identical copies of each other except for a constant shift in the image plane. If they are compared on a pixel-by-pixel basis, huge differences will generally occur. This can be overcome by first aligning the images, e.g., by finding the maximum of their cross-correlation function. This alignment is the simplest case of determining which points correspond to each other. In more complicated cases, such as movements in three-dimensional space, partial occlusion, additional noise, or difficult lighting conditions, the correspondence problem must be solved.
Its difficulty depends on the choice of features. If, e.g., grey values of pixels are taken as local features, there is a lot of ambiguity, i.e., many points from very different locations share the same pixel value without being corresponding points. A possible remedy consists in combining local patches of pixels, which of course reduces this ambiguity. If this is done too extensively, i.e., if local features are influenced by a large area, the ambiguities disappear when identical images are used, but the features become more and more sensitive to distortions and changes in background. Therefore, methods for establishing correspondences must make use of robust features and of additional information about their relative locations.
Correspondence-based recognition

Once a mapping is established, the feature similarities of corresponding points are added up (or averaged) over the whole mapping. This yields a similarity value for each stored model, and the highest similarity belongs to the recognized person. A measure for the reliability or significance of the recognition is derived by a simple statistical analysis of the series of all similarity values: the similarity histogram of one image against all models is analyzed by dividing the distance of the highest similarity value by the standard deviation of the distribution of all similarities except the highest one (Figure 1). Alternatively, the distance of the highest similarity to the runner-up can be used. A reliability measure is required because the case of an unknown person (low similarities and low significance) must be distinguished from the case of a highly distorted image of a known person (low similarities but high significance). In combination with a hierarchical matching scheme such as the one described in Section 4.3, the significance measure can be used to stop the refinement once a significant recognition has been achieved. On average, this yields a higher recognition speed and interesting insights into the distribution of recognition-relevant information over spatial scales.
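A minimal sketch of this reliability measure, assuming a vector of similarity values of one probe image against all models (the values are invented); the "distance of the highest similarity" is read here as its distance from the mean of the remaining similarities:

import numpy as np

similarities = np.array([0.42, 0.45, 0.40, 0.91, 0.44, 0.43])  # probe vs. all models

best = similarities.argmax()
rest = np.delete(similarities, best)

# Significance: how far the best similarity lies from the rest, in units of their spread
significance = (similarities[best] - rest.mean()) / rest.std()

# Alternative: margin between the best similarity and the runner-up
margin = similarities[best] - rest.max()

print(best, significance, margin)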

Elastic Graphs
The model faces are represented by sparse graphs whose vertices are labeled with the vectors of all Gabor features at an image location (jets) and whose edges are labeled with the distance vectors of the connected vertices. Matching is done by first optimizing the similarity of an undistorted copy of the graph in the input image and then optimizing the individual locations of the vertices. This results in a distorted graph whose vertices are at approximately corresponding locations to the ones in the model graph. In some systems a rectangular model graph arrangement has been chosen, while in others the vertices have been carefully placed on salient points, thus yielding a higher recognition rate.

Figure 4: The result of labeled graph matching. The right-hand side shows the graph with maximal similarity to the one on the left.

Bunch Graphs
The major drawback of correspondence-based recognition systems is that the computationally
expensive procedure of creating a correspondence map must be carried out for each of the stored
models. This has partly been overcome by the concept of bunch graphs. The idea is that the
database of models is arranged in such a way that corresponding graph nodes are already located
at corresponding object points, e.g., a certain node lies on the left eye in all models. For large
databases, this reduces the recognition time by orders of magnitude. Another important innovation is the possibility of matching the most similar features from different persons, and thus acquiring information about an unknown person such as gender, beardedness, or the presence of glasses. Although it is not straightforward to apply the bunch graph principle to arbitrary object classes, it has been applied successfully to hand gesture recognition.
8. Explain about the biometric fusion techniques.
A unibiometric system, which utilizes a single biometric cue, may encounter problems due to
missing information (e.g., occluded face), poor data quality (e.g. dry fingerprint), overlap
between identities (e.g., face images of twins) or limited discriminability (e.g., hand geometry).
In such situations, it may be necessary to utilize multiple biometric cues in order to improve
recognition accuracy. For example, a border control system may use both face and fingerprints to
establish the identity of an individual.
The term multibiometrics has often been used to connote biometric fusion in the literature. To develop a multibiometric system, one must consider the following three questions: (i) what to fuse, (ii) when to fuse, and (iii) how to fuse. What to fuse involves selecting the different sources of information to be combined, such as multiple algorithms or multiple modalities. When to fuse is answered by analyzing the different levels of fusion, that is, the various stages in the biometric recognition pipeline at which information can be fused. How to fuse refers to the fusion method that is used to consolidate the multiple sources of information.

Even when using data from a single modality only (say, face only), the performance of a recognition system can often be enhanced by incorporating some ancillary information. Incorporating details such as image quality, subject demographics, soft biometric attributes, and contextual meta-data has been shown to improve the performance of recognition systems. While recognition performance is a major metric for evaluating biometric systems, it is important to focus on the security (and privacy) aspects of such systems as well. Information fusion is seen as a viable option for securing the biometric templates in a multibiometric system. Cryptosystems based on multiple modalities have been proposed to securely store biometric templates and prevent access to the original data. Biometric systems are also susceptible to spoof attacks, that is, an adversary can impersonate another person's identity by presenting a fake or altered biometric trait and gain unauthorized access. Information fusion can play a major role in the detection and deflection of such malicious activities.
Multi-sensor systems combine information captured by multiple sensors for the same biometric modality. For example, a face recognition module could utilize RGB data captured using a visible spectrum camera, along with depth information captured using a 3D camera or infrared data captured using an NIR camera. Using both images for identifying a subject would result in multi-sensor fusion.
Multi-algorithm systems utilize multiple algorithms for processing an input sample. Data is
captured from a biometric modality using a single sensor; however, multiple algorithms are used
to process it. For example, a fingerprint recognition system could utilize both minutiae and
texture features for matching fingerprints, or a palmprint recognition system could utilize Gabor,
line, and appearance based palmprint representations for matching. Such systems benefit from
the advantage of extracting and utilizing different types of information from the same sample.
The figure presents the different levels at which fusion can be incorporated in a biometric pipeline, viz., (i) sensor-level, (ii) feature-level, (iii) score-level, (iv) rank-level, or (v) decision-level.

Sensor-level fusion or data-level fusion generally corresponds to multi-sensor or multi-sample algorithms, where data is combined immediately after its acquisition. That is, data fusion is carried out prior to feature extraction, directly on the raw data.
Feature-level fusion refers to algorithms where fusion is performed on multiple features
extracted from the same or different input data. This could correspond to multiple feature sets
pertaining to the same biometric trait, such as textural and structural features of a face image or
different features from a hand or palm-print image. It could also correspond to features extracted
from different modalities, such as face and hand images.
Score-level fusion corresponds to algorithms where the match scores produced by different
matchers are fused together. Some of the common fusion algorithms applied at this level are
mean score fusion, max score fusion, or min score fusion, where the mean, maximum, or
minimum score of multiple matchers is considered as the final score.
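A short sketch of these rules, assuming the match scores from the individual matchers have already been normalized to a common range:

import numpy as np

# Normalized match scores for one probe from three different matchers
scores = np.array([0.72, 0.65, 0.80])

mean_fused = scores.mean()   # mean score fusion
max_fused = scores.max()     # max score fusion
min_fused = scores.min()     # min score fusion

print(mean_fused, max_fused, min_fused)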
Rank-level fusion is performed after comparing the input probe with the templates in the gallery set, i.e., the database. In the task of identification, where a given probe image is compared against a gallery of images, a ranked list of matching identities is often generated by the matcher; the ranked lists produced by multiple matchers can then be combined.
Decision-level fusion corresponds to algorithms where fusion is performed at the decision level
[66, 68, 69]. Majority voting is one of the most common fusion algorithms applied at the
decision level. Decisions taken by n matchers or classifiers are combined based on a majority
vote, resulting in a final decision.
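A minimal majority-voting sketch combining accept/reject decisions from n matchers (the decisions below are made up):

from collections import Counter

decisions = ["accept", "accept", "reject", "accept", "reject"]  # from 5 matchers

# Majority vote: the most common decision wins
final_decision, votes = Counter(decisions).most_common(1)[0]
print(final_decision, votes)  # -> 'accept', 3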

9. Describe the facial recognition technique using shape and texture.

The face is the frontal portion of the human head, extending from the forehead to the chin and
includes the mouth, nose, cheeks, and eyes. The face is considered to be the most commonly
used biometric trait by humans. Hence, it has become a standard practice to incorporate face
photographs in various tokens of authentication such as ID cards, passports, and driver’s
licenses.
Face recognition can be defined as the process of establishing a person’s identity based on their
facial characteristics. In its simplest form, the problem of face recognition involves comparing
two face images and determining if they are of the same person.
Face images of a person may have variations in age, pose, illumination, and facial expressions, as well as exhibit changes in appearance due to make-up, facial hair, or accessories (e.g., sunglasses).

Compared to other biometric traits like fingerprint and iris, people are generally more willing to
share their face images in the public domain as evinced by the increasing interest in social media
applications (e.g., Facebook) with functionalities like face tagging. Due to the above reasons,
face recognition has a wide range of applications in law enforcement, civilian identification,
surveillance systems, and entertainment/amusement systems.

Facial features

Level 1 details consist of gross facial characteristics that are easily observable. Examples include
the general geometry of the face and global skin color. Such features can be used to quickly
discriminate between (a) a short round face and an elongated thin face; (b) faces exhibiting
predominantly male and female characteristics; or (c) faces from different races.

Level 2 details consist of localized face information such as the structure of the face components
(e.g., eyes), the relationship between facial components and the precise shape of the face.

Level 3 details consist of unstructured, micro level features on the face, which includes scars,
freckles, skin discoloration, and moles. One challenging face recognition problem where Level 3
details may be critical is the discrimination of identical twins.

Design of a face recognition system

A typical face recognition system is composed of three modules: (a) image acquisition, (b) face detection, and (c) face matching (see Figure 3.6). The face image acquired from a sensor can be categorized based on (a) the spectral band (e.g., visible, infrared, and thermal) used to record the image and (b) the nature of the image rendering technique (e.g., 2D, 3D, and video).

Face Detection

Face detection is the first step in most face-related applications including face recognition, facial
expression analysis, gender/ethnicity/age classification, and face modeling. Variations in pose
and expression, diversities in gender and skin tone, and occlusions (e.g., due to glasses) are the
typical challenges confounding face detection. While there are a number of approaches for
detecting faces in a given image, state-of-the-art face detection methods are typically based on
extracting local texture features from the given image and applying a binary (two-class) classifier to
distinguish between a face and non-face.

Feature Extraction and Matching

There are three main approaches to match the detected face images (see Figure 3.20): appearance-based, model-based, and texture-based methods.

• Appearance-based techniques generate a compact representation of the entire face region in the acquired image by mapping the high-dimensional face image into a lower dimensional sub-space. This sub-space is defined by a set of representative basis vectors, which are learned using a training set of images. Though the mapping can be either linear or non-linear, commonly used schemes such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA) involve linear projections.

• Model-based techniques attempt to build 2D or 3D face models that facilitate matching of face images in the presence of pose variations. While the Face Bunch Graphs (FBG) and Active Appearance Model (AAM) are examples of 2D face models, the morphable model is a 3D model.

• Texture-based approaches try to find robust local features that are invariant to pose or lighting variations. Examples of such features include gradient orientations and Local Binary Patterns (LBP).
Appearance-based face recognition

Appearance-based schemes are based on the idea of representing the given face image as a
function of different face images available in the training set, or as a function of a few basis
faces. For example, the pixel value at location (x,y) in a face image can be expressed as a
weighted sum of pixel values in all the training images at (x,y). The set of training images or
basis faces forms a subspace and if the given face image is linearly projected onto this subspace,
it is referred to as linear subspace analysis. The challenge here is to find a suitable low
dimensional subspace that preserves the discriminatory information contained in the face images.
In other words, the goal in linear subspace analysis is to find a small set of most representative
basis faces. Any new face image can be represented as a weighted sum of the basis faces and two
face images can be matched by directly comparing their vector of weights.

Principal Component Analysis (PCA) is one of the earliest automated methods proposed for face recognition. PCA uses the training data to learn a subspace that accounts for as much variability in the training data as possible. This is achieved by performing an eigenvalue decomposition of the covariance matrix of the data.

Model-based face recognition

Model-based techniques try to derive a pose-independent representation of the face images that can enable matching of face images across different poses. These schemes typically require the detection of several fiducial or landmark points in the face (e.g., corners of the eyes, tip of the nose, corners of the mouth, homogeneous regions of the face, and the chin), which leads to increased complexity compared to appearance-based techniques. Some of the model-based techniques can be used for face recognition as well as for generating realistic face animation.

The Elastic Bunch Graph Matching (EBGM) scheme represents a face as a labeled image graph
with each node being a fiducial or landmark point on the face. While each node of the graph is
labeled with a set of Gabor coefficients (also called a jet) that characterizes the local texture
information around the landmark point, the edge connecting any two nodes of the graph is
labeled based on the distance between the corresponding fiducial points. The Gabor coefficient at
a location in the image can be obtained by convolving the image with a complex 2D Gabor filter
centered at that location. By varying the orientation and frequency of the Gabor filter, a set of
coefficients or a Gabor jet can be obtained.

Texture-based face recognition

Appearance-based schemes typically make use of the raw pixel intensity values, which are quite
sensitive to changes in ambient lighting and facial expressions. An alternative is to use more
robust feature representation schemes that characterize the texture of an image using the
distribution of local pixel values. Scale Invariant Feature Transform (SIFT) and Local Binary Patterns (LBP) are the two most well-known schemes for the analysis of local textures. SIFT is one of the most popular local representation schemes used in object recognition.

Computation of SIFT features consists of two stages: (a) key point extraction, and (b) descriptor calculation in a local neighborhood at each key point. Just like the fiducial points in the model-based approach, the key points can be used to achieve tolerance against pose variations. However, the number of key points in SIFT could be quite large (in the order of hundreds) and finding the correspondences between the key points from two different images is a challenging task. If we assume that the face images are roughly pre-aligned (for instance, using the eye locations), the key point detection process can be bypassed and the descriptor can be constructed directly from the entire face image. The descriptor is usually a histogram of gradient orientations within a local neighborhood. The face image is typically divided into multiple patches and the SIFT descriptor is constructed from each patch. The final descriptor is obtained by concatenating the descriptors from all the patches. Figure 3.28 shows a schematic diagram of the above SIFT descriptor construction process.
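A simplified sketch of this descriptor construction: the (roughly pre-aligned) face image is divided into patches, a histogram of gradient orientations is built per patch, and the histograms are concatenated. This ignores SIFT's scale-space key point detection and Gaussian weighting and only illustrates the descriptor idea:

import numpy as np

def orientation_histogram(patch, bins=8):
    # Gradient magnitude and orientation within one patch
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist

def face_descriptor(image, grid=4, bins=8):
    # Divide the face image into grid x grid patches and concatenate their histograms
    h, w = image.shape
    ph, pw = h // grid, w // grid
    feats = [orientation_histogram(image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw], bins)
             for i in range(grid) for j in range(grid)]
    desc = np.concatenate(feats)
    return desc / (np.linalg.norm(desc) + 1e-8)  # normalize the final descriptor

face = np.random.default_rng(2).standard_normal((64, 64))
print(face_descriptor(face).shape)  # (4*4*8,) = (128,)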
