Background Study and Literature Review
1.1 INTRODUCTION
In this continuously developing era, facial expression recognition has become one of the most significant research topics of the past two decades, and researchers have proposed many ideas and methods to overcome the problems caused by wide variation in face appearance. Despite the success of some of these systems in constrained scenarios, the general task of face recognition still poses a number of challenges with respect to noise contamination, facial expression, occlusion and pose, which vary in complexity, intensity and meaning.
The basic task, given the visual image of a face as input, is to compare the input face against models of faces stored in a library. By selecting correctly between a linear and a nonlinear filter for the noise reduction stage, we can proceed towards efficient facial feature extraction and effective facial emotion recognition. Facial feature detection and tracking are important in vision-related applications such as head pose estimation and facial expression analysis. Head orientation is related to a person's direction of attention, so it can give useful information about what he or she is paying attention to. The pose estimation module locates the two eyes, the nose, the lips and certain other regions, and estimates the rotation parameters of the pose out of the image plane [1]. The method is template-based, with several facial feature templates covering different poses and different people. There are cases where images taken in public areas contain various objects that cause the system to falsely detect a non-face region, as well as difficulties in distinguishing the inter-class and intra-class variability of twin individuals.
Figure 1: Image pre-processing
Empirical Mode Decomposition (EMD) can be used to detect seven facial emotions; this method decomposes non-stationary and non-linear data into several sets of small frequency components known as intrinsic mode functions (IMFs), while minimising the effect of noise when extracting the facial expression [2]. By using Principal Component Analysis (PCA) to reduce the dimensionality of the data space, the feature extraction process becomes easier. The last step to acquire the desired output is to use an SVM as the classifier for recognition of head pose. The combination of these two powerful tools is expected to give a significantly good result.
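As a rough illustration of this PCA-plus-SVM stage, the following sketch (a minimal example assuming scikit-learn and a generic face dataset arranged as flattened grayscale images; the dataset, component count and kernel are illustrative choices, not the exact configuration of this work) reduces the image vectors with PCA and classifies them with an SVM.

```python
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a generic face dataset: each row is a flattened 64x64 grayscale face.
faces = fetch_olivetti_faces()
X, y = faces.data, faces.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# PCA: project the high-dimensional pixel space onto a small number of
# leading eigenvectors ("eigenfaces") of the training covariance matrix.
pca = PCA(n_components=50, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# SVM: find the separating hyperplane (here with an RBF kernel) in the
# reduced feature space.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train_pca, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_pca)))
```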
PROBLEM STATEMENT
Another challenge is to differentiate inter-class from intra-class variability. Every human has a very unique individual iris pattern [4]; it is like an identification tag embedded permanently in a person, and research work [5], [6], [7], [8] has demonstrated the stability of the iris over a person's lifetime. Moreover, an iris-based identification system, which may include a retinal scan, can be non-invasive. With all these properties and the robustness of iris recognition, such a system is a promising solution to the security problem.
Partial occlusion by another object also poses a threat in many face recognition applications. A normal human being is able to recognise another person even when that person is wearing a scarf, sunglasses, a cap or another object. A machine, however, does not have the adaptive ability to extract facial features when half of the face region is covered, which can degrade recognition performance. Many works attempt to solve this problem by segmenting the occluded parts of the image and disregarding their corresponding features [9].
OBJECTIVE
The purpose of this research is to build a facial recognition system based on three objectives:
(1) To characterise various types of noise, such as Salt & Pepper, Gaussian, Poisson and Speckle noise, and to identify the appropriate denoising filter for each.
(2) To identify the expression transformation of emotion using 2D Empirical Mode Decomposition (EMD).
(3) To analyse the performance of the feature extraction process using the k-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods.
SCOPE
LITERATURE REVIEW
This overview is based on the methods of previous researchers for implementing face recognition in a system. In a highly contaminated environment, global noise resides in any digital image taken, causing low accuracy in face authentication or, worse, corrupted data. This is a major setback for the subsequent pre-processing steps, so a denoising method is needed to minimise errors in data segmentation and allow the system to be precise. Building on this, EMD redistributes the effect of noise, expression changes and illumination variations when the input facial image is described by the selected IMF components. To reduce the dimensionality of the vast feature space, PCA is used. Lastly, the SVM classifies by finding the maximum-margin boundary, a separating hyperplane, based on structural risk minimisation, which bounds the expectation of the test error for the trained machine.
While the methods presented by Moghaddam and Pentland and by Osuna et al. address only frontal faces, Sung and Poggio and Rowley et al. work on more general face detection. Sung and Poggio [17] developed a neural-network-based face detection system: they designed face and non-face databases feeding the layer between the input and output neurons, and the data then undergo supervised learning to determine the appropriate weights from these inputs to the output node. Rowley et al. introduced a neural-network-based upright frontal face detection system, as mentioned in the project scope.
This work was later extended to rotation invariant face detection by designing an
extra network to estimate the rotation of faces in the image plane. Gong et al. [18] used
general and modified hyper basis function (HBF) networks with Gaussian mixture models to
estimate the density function of face space with large pose variation. As a result, face
recognition can be performed more successfully than with either of the linear models.
McKenna et al. presented an integrated face detection-tracking system where a motion-based
tracker is used to reduce the search space and Multi-Layer Perceptron (MLP) based face
detection is used to resolve ambiguities in tracking. Feature-based methods have also been
extensively addressed in previous work.
2.2.1 Noisy Environment: Noise Analysis
Noise is an inevitable part of real-world signals. In a controlled environment, noise is reduced dramatically, and there is no need to involve every parameter to detect and eliminate it. In real-time applications, however, images taken in unconstrained environments are detrimental to signal processing: every electrical component exhibits disturbances that ruin image quality [19]. Before introducing suitable filtering methods for image denoising, it is important to review previous research on optimised 2D filters for the grayscale image denoising problem and to highlight several schemes for further discussion.
Salt & Pepper are basically an impulse noise caused by malfunctioning pixels in camera
sensors, faulty memory locations in hardware, or disrupted transmission frequency in a noisy
channel. The objective of filtering is to minimize signal distortion and remove the impulses
so that the noise-free image is fully recovered. Median-based filters have attracted much
attention because of their simplicity and their capability of preserving image edges [20].
However, there are main drawback of a standard median filter (SMF) is that it is effective
only for low noise densities. Unfortunately, at high noise densities, SMFs often exhibit
blurring for large image scale and insufficient noise suppression for small image scale (only
appropriate image scale will result in efficient image noise suppression. Applying median
filter unconditionally across the entire image as practiced in the conventional schemes would
inevitably alter the intensities and remove the signal details (originality of images) of
uncorrupted pixels. Therefore, a noise-detection process to discriminate between uncorrupted
pixels and the corrupted pixels prior to applying nonlinear filtering is highly desirable [21].
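A rough sketch of this detect-then-filter idea follows (a minimal example using NumPy and SciPy; the 0/255 impulse test and the 3x3 window are simplifying assumptions, not the detector proposed in [21]): corrupted pixels are first flagged, and only those pixels are replaced by the local median.

```python
import numpy as np
from scipy.ndimage import median_filter

def switching_median_denoise(img, window=3):
    """Replace only the pixels detected as impulses with the local median.

    Assumes 8-bit grayscale input where salt & pepper impulses appear as
    the extreme values 0 and 255; uncorrupted pixels are left untouched.
    """
    img = img.astype(np.uint8)
    noise_mask = (img == 0) | (img == 255)        # crude impulse detector
    medians = median_filter(img, size=window)     # local median around each pixel
    out = img.copy()
    out[noise_mask] = medians[noise_mask]         # filter corrupted pixels only
    return out
```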
White Gaussian noise, also known as thermal noise, is due to random fluctuations of electrons. The "ISO level" controls the amount of amplification in a digital camera, and when the photosensor signal is amplified, the noise is amplified with it [22]. White noise degrades the performance of edge detection and leads to insufficient results, because the White Gaussian noise is counted as an edge and true edges are detected discontinuously. A proper edge detection algorithm for an image corrupted by White Gaussian noise must therefore reasonably consider both noise reduction and accurate localisation of the edges. As proposed by Qinhang He et al., the first step is to choose an approximate threshold value for segmenting the image as a template image; by repeating the segmentation with different threshold values, the number of erroneously segmented pixels is minimised through a least-squares (LS) functional [23].
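To illustrate why noise reduction must precede edge detection in this setting, the sketch below (a minimal example assuming scikit-image; the noise variance, smoothing sigma and Canny detector are illustrative choices, not the scheme of [23]) compares edge maps with and without prior smoothing.

```python
import numpy as np
from skimage import data, util, filters, feature

# Corrupt a test image with white Gaussian noise.
clean = data.camera().astype(float) / 255.0
noisy = util.random_noise(clean, mode="gaussian", var=0.01)

# Edges taken straight from the noisy image: many spurious responses.
edges_noisy = feature.canny(noisy, sigma=1.0)

# Smooth first (Gaussian low-pass), then detect edges: fewer false edges,
# at the cost of slightly less precise edge localisation.
smoothed = filters.gaussian(noisy, sigma=1.5)
edges_smoothed = feature.canny(smoothed, sigma=1.0)
print(edges_noisy.sum(), edges_smoothed.sum())
```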
Medical images are usually corrupted by noise during acquisition and transmission. This causes the poor quality of images generated by ultrasonic imaging, which is a common method in medical imaging procedures. Proper techniques are therefore necessary to remove such noise while retaining as much of the important signal features as possible. The existence of speckle is undesirable since it degrades image quality and affects the tasks of human interpretation and diagnosis.
Accordingly, speckle filtering is a central pre-processing step for feature extraction, analysis and recognition from medical imagery measurements, and a number of schemes have previously been proposed for speckle mitigation [24]. An appropriate method for speckle reduction is one which enhances the signal-to-noise ratio while conserving the edges and lines in the image. Filtering techniques are used as a preliminary action before segmentation and classification.
Shot noise, or Poisson noise, is a type of electronic noise that can be modelled by a Poisson process; in electronics it originates from the discrete nature of electric charge. As with white Gaussian noise, Poisson noise also occurs in photon counting in many optical devices, where it reflects the particle nature of light. Speckle, by contrast, has the nature of a multiplicative noise. Synthetic aperture radar images often exhibit speckle noise [25], and a simple and effective method has been proposed for smoothing speckle-corrupted images on a digital computer, based on the sigma filter. The sigma filter is motivated by the sigma probability of the Gaussian distribution: the pixel to be processed is replaced by the average of those neighbouring pixels whose gray levels lie within two noise standard deviations of the gray level of the pixel concerned. Consequently, the speckles are suppressed without blurring edges and fine detail.
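A minimal sketch of this sigma-filter rule is given below (assuming NumPy, a square neighbourhood and a user-supplied noise standard deviation; refinements in the original sigma filter, such as small-cluster handling, are omitted).

```python
import numpy as np

def sigma_filter(img, sigma, window=3):
    """Sigma filter: average only the neighbours whose gray level lies
    within two noise standard deviations of the centre pixel."""
    img = img.astype(float)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + window, j:j + window]
            centre = img[i, j]
            keep = np.abs(patch - centre) <= 2 * sigma   # two-sigma interval
            out[i, j] = patch[keep].mean()               # average of accepted pixels
    return out
```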
There are many appearance-based face recognition methods, such as Locality Preserving Projections (LPP) proposed by Xiaofei He et al. LPP finds an embedding that preserves local information and obtains a face subspace that best captures the essential face manifold structure [26]. The Local Directional Pattern (LDP) operator computes the edge response values in all eight directions at each pixel position and generates a code from the relative strength of their magnitudes. Since edge responses are less sensitive to illumination and noise than raw intensity values, the resulting LDP feature describes local primitives, including different types of curves, corners and junctions, more stably and retains more information.
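A small sketch of an LDP-style code follows (assuming the eight Kirsch compass masks and the common choice of keeping the k = 3 strongest responses; these follow the usual LDP formulation rather than any parameters specific to this work).

```python
import numpy as np
from scipy.ndimage import convolve

# Eight Kirsch compass masks (one per direction, rotated by 45 degrees).
KIRSCH = [np.array(m) for m in (
    [[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]],
    [[-3, 5, 5], [-3, 0, 5], [-3, -3, -3]],
    [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]],
    [[5, 5, -3], [5, 0, -3], [-3, -3, -3]],
    [[5, -3, -3], [5, 0, -3], [5, -3, -3]],
    [[-3, -3, -3], [5, 0, -3], [5, 5, -3]],
    [[-3, -3, -3], [-3, 0, -3], [5, 5, 5]],
    [[-3, -3, -3], [-3, 0, 5], [-3, 5, 5]],
)]

def ldp_code(img, k=3):
    """Per-pixel LDP code: set a bit for the k directions with the
    strongest absolute edge response."""
    img = img.astype(float)
    responses = np.stack([np.abs(convolve(img, m)) for m in KIRSCH], axis=-1)
    top_k = np.argsort(responses, axis=-1)[..., -k:]   # k strongest directions
    code = np.zeros(img.shape, dtype=np.uint8)
    for bit in range(8):
        code |= (np.any(top_k == bit, axis=-1).astype(np.uint8) << bit)
    return code
```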
Principal Component Analysis (PCA) has been widely applied to facial images to
extract features for recognition purposes [26]. PCA is an eigenvector method designed to
model linear variation in high-dimensional data. PCA performs dimensionality reduction by
projecting the original n-dimensional data onto the k (<< n)-dimensional linear subspace
spanned by the leading eigenvectors of the data’s covariance matrix. Its goal is to find a set of
mutually orthogonal basis functions that capture the directions of maximum variance in the
data and for which the coefficients are pairwise decorrelated. For linearly embedded
manifolds, PCA is guaranteed to discover the dimensionality of the manifold and produces a
compact representation. Turk and Pentland [28] use Principal Component Analysis to
describe face images in terms of a set of basis functions, or “eigenfaces.”
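The projection described above can be written out directly (a minimal NumPy sketch, assuming a data matrix whose rows are flattened face images; the choice of k and the random data are purely illustrative).

```python
import numpy as np

def pca_basis(X, k):
    """Return the mean face and the k leading eigenvectors ('eigenfaces')
    of the covariance of X, where each row of X is a flattened image."""
    mean = X.mean(axis=0)
    centred = X - mean
    cov = np.cov(centred, rowvar=False)          # covariance of pixel dimensions
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]           # eigenvectors by descending variance
    return mean, vecs[:, order]

def pca_project(X, mean, basis):
    """Project images onto the k-dimensional eigenface subspace."""
    return (X - mean) @ basis

# Example: 200 random 'images' of 32x32 pixels reduced to 20 coefficients.
X = np.random.rand(200, 32 * 32)
mean, basis = pca_basis(X, k=20)
print(pca_project(X, mean, basis).shape)   # (200, 20)
```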
As mentioned in the face recognition algorithm above, the idea is to use PCA to extract features. PCA was chosen because it works well with SVM, and the positive results from previous experiments are promising. The working sequence of PCA is fully described in Chapter 3.
The first part of the identification process is face feature extraction. Before describing the featuring process, another important operation related to face recognition must be mentioned: proper face image registration is essential for good face recognition performance. This registration is performed using facial detection algorithms, and some image pre-processing operations, mentioned below, may also be necessary.
First, the original face images are converted to grayscale. After that, contrast and illumination adjustment operations are performed, since all face images must be processed with the same illumination and contrast; histogram equalisation is applied to these images to obtain a satisfactory contrast. The facial images are also often corrupted by various types of noise, so they are processed with appropriate low-pass filters for noise removal and restoration. The enhanced face images are then ready for the featuring process.
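These pre-processing steps can be strung together roughly as follows (a minimal sketch using OpenCV; the target size and filter kernel are assumptions, not values prescribed by the cited works).

```python
import cv2

def preprocess_face(path, size=(128, 128), blur_kernel=(5, 5)):
    """Grayscale conversion, resizing, histogram equalisation and a
    low-pass (Gaussian) filter, in that order."""
    bgr = cv2.imread(path)                        # load the colour image
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    gray = cv2.resize(gray, size)                 # normalise the image size
    equalised = cv2.equalizeHist(gray)            # contrast via histogram equalisation
    denoised = cv2.GaussianBlur(equalised, blur_kernel, 0)  # low-pass noise removal
    return denoised
```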
The Local Distinct Star Pattern (LDSP) [31] computes, for each pixel surrounded by its eight neighbours, a local feature consisting of two four-bit binary patterns. A 4-bit binary pattern can have at most 16 different bit combinations, and each combination is considered a separate histogram bin; this pattern is named the LDSP of that pixel. After applying the LDSP to each pixel of the input image, a histogram is calculated for each block, and the feature vector is obtained by concatenating the histograms of all 81 blocks. Calculating histograms per block preserves more location information than calculating a single histogram over the whole image.
Figure 7: Facial expression recognition system (FERS) framework
Empirical Mode Decomposition (EMD) [32] was developed early on by Huang et al. for analysing data from non-stationary and non-linear processes, and it has since received many applications and interpretations. The major advantage of EMD is that the basis functions are derived from the signal itself; the analysis is therefore adaptive, in contrast to traditional methods where the basis functions are fixed. EMD is based on a sequence of extractions of the energy associated with the various intrinsic time scales of the signal, and the sum of the resulting intrinsic mode functions (IMFs) reproduces the signal. Another advantage of EMD is its ability to perform signal denoising [10]: the signal is reconstructed from IMFs that have previously been thresholded, as in wavelet analysis, or filtered. The filtering scheme is based on the idea that most structures of the signal reside in the lower-frequency components (the last IMFs) and decrease towards the high-frequency modes (the first IMFs).
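A small sketch of this EMD-based denoising idea on a 1-D signal is shown below (assuming the PyEMD package is available; the test signal and the choice of how many high-frequency IMFs to discard are purely illustrative).

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

# Noisy test signal: a slow oscillation plus white Gaussian noise.
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)

# Decompose into intrinsic mode functions (first IMFs = highest frequency).
imfs = EMD().emd(signal, t)

# Denoise by discarding the first (high-frequency, noise-dominated) IMFs
# and reconstructing the signal from the remaining ones.
n_drop = 2
denoised = imfs[n_drop:].sum(axis=0)
print(imfs.shape, denoised.shape)
```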
2.6 Face Classification Method
The contours of specific facial regions such as the mouth, eyebrows and eyes are important subjects in facial classification. The idea is to differentiate a universal set of human emotions: happiness, surprise, contempt, sadness, fear, disgust and anger. The full-face appearance, with every detail of the facial structure, provides the boundary between the different types of emotion. Here the eigenvectors are compared with the training set to classify an image into the different moods. This second-to-last step is important for attaining the desired output, so a list of literature studies on several face classifiers follows.
The Mutual Subspace Method (MSM) can be used to identify faces using sets of patterns. In MSM, a set of patterns is represented as a low-dimensional subspace [33]. The input subspace is compared with the reference subspace representing the template, and their similarity is defined by the minimum angle between the input subspace and the reference subspace. The Constrained Mutual Subspace Method (CMSM) is an extended version of MSM: to extract effective features for identification, both the input subspace and the reference subspace are projected onto a constraint subspace.
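The similarity measure used in MSM, based on the smallest angle between two pattern subspaces, can be computed directly (a minimal sketch with NumPy and SciPy; the subspace dimension and the random image sets are purely illustrative).

```python
import numpy as np
from scipy.linalg import subspace_angles

def subspace_similarity(set_a, set_b, dim=5):
    """MSM-style similarity: cos^2 of the smallest canonical angle between
    the low-dimensional subspaces spanned by two image sets (rows = images)."""
    # Orthonormal bases of the leading principal subspaces (via SVD).
    basis_a = np.linalg.svd(set_a.T, full_matrices=False)[0][:, :dim]
    basis_b = np.linalg.svd(set_b.T, full_matrices=False)[0][:, :dim]
    angles = subspace_angles(basis_a, basis_b)   # canonical angles, largest first
    return np.cos(angles.min()) ** 2

# Example with two random sets of 20 flattened 16x16 images each.
a = np.random.rand(20, 256)
b = np.random.rand(20, 256)
print(subspace_similarity(a, b))
```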
Canonical correlations of two different image sets of the same object acquired in different
conditions proved to be a promising measure of similarity of the two sets [34]. This suggests
that by matching based on image sets one could achieve a robust solution to the problem of
object recognition even when the observation of the object is subject to extensive data
variations. However, it is further required to suppress the contribution to similarity of
canonical vectors of two image sets due to common environmental conditions (e.g., in
lightings, viewpoints, and backgrounds) rather than object identities. The optimal
discriminant function is proposed to transform image sets so that canonical correlations of
within-class sets are maximized while canonical correlations of between-class sets are
minimized in the transformed data space.
2.6.3 Manifold Discriminant Analysis (MDA)
A function is a radial basis function (RBF) [36] if its output depends on (is a non-increasing function of) the distance of the input from a given stored vector. RBFs represent local receptors, where each input point X_n is a stored vector used in one RBF. In an RBF network, one hidden layer uses neurons with RBF activation functions describing these local receptors, and one output node combines the outputs of the hidden neurons linearly. The output for an input vector is "interpolated" using the N stored vectors, where each vector contributes according to its weight and its distance from the input point.
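A minimal sketch of this forward pass follows (assuming Gaussian RBF activations; the centres, widths and output weights here are random placeholders rather than trained values).

```python
import numpy as np

def rbf_network_output(x, centres, widths, weights, bias=0.0):
    """One hidden RBF layer plus a linear output node.

    Each hidden neuron fires according to a Gaussian of the distance
    between the input x and its stored centre vector."""
    dists = np.linalg.norm(centres - x, axis=1)               # distance to each stored vector
    activations = np.exp(-(dists ** 2) / (2 * widths ** 2))   # Gaussian RBF activations
    return weights @ activations + bias                       # linear combination

# Example: 10 stored centres in a 4-dimensional input space.
rng = np.random.default_rng(0)
centres = rng.normal(size=(10, 4))
widths = np.full(10, 1.0)
weights = rng.normal(size=10)
print(rbf_network_output(rng.normal(size=4), centres, widths, weights))
```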
As a nonparametric, sample-based method [37], k-Nearest Neighbours can be used to align two sequences without any prior information, i.e. unsupervised image-set alignment. Recently, Wang et al. proposed an unsupervised alignment method without correspondence, which learns a projection transforming instances from two subspaces to a lower-dimensional space while simultaneously matching the local geometric structures through the k nearest neighbours. When matching the k neighbours of two points, the authors considered all k! permutations to find the best match; this method is therefore slow to converge [38].
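In its simplest classification form, which is how kNN is compared with the SVM later in this work, each face is assigned the majority label of its k closest training samples (a minimal scikit-learn sketch; the dataset and the value of k are illustrative).

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Flattened grayscale faces and their subject labels.
faces = fetch_olivetti_faces()
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, stratify=faces.target, random_state=0)

# Each test face is labelled by a vote among its k nearest training faces.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("kNN accuracy:", knn.score(X_test, y_test))
```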
Support Vector Machines (SVMs) are a classification method for both linear and nonlinear data [39]. An SVM uses a nonlinear mapping to transform the original training data into a higher dimension, and in this new dimension it searches for the linear optimal separating hyperplane (the "decision boundary"). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane; the SVM finds this hyperplane using support vectors (the "essential" training tuples) and the margins they define.
2.7 Reasons for Choosing SVM and kNN
In this study, the SVM provides the optimal hyperplane for separating the training patterns. Two independent face databases are chosen, containing faces with varying expression, illumination and occlusion. In the paper by O. Déniz et al., cross-validation is applied between the two face databases while the number of coefficients is varied (from 1 up to the number of training images). The results in that paper look promising, as the authors attain positive recognition rates and show that SVMs are relatively insensitive to the representation space. Although they faced convergence problems when the input coefficients had a relatively high magnitude, because of the algorithm used, the results nevertheless remained within an acceptable range.
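This kind of experiment, cross-validating while the number of retained PCA coefficients is varied, can be sketched as follows (assuming scikit-learn; the dataset, the component grid and the linear kernel are illustrative choices, not the exact protocol of Déniz et al.).

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

faces = fetch_olivetti_faces()
X, y = faces.data, faces.target

# Score a PCA + SVM pipeline for several numbers of retained coefficients.
for n_components in (10, 25, 50, 100):
    pipe = make_pipeline(PCA(n_components=n_components, whiten=True),
                         SVC(kernel="linear"))
    scores = cross_val_score(pipe, X, y, cv=5)
    print(n_components, scores.mean())
```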
References
[1] C. Qing et al., "Empirical Mode Decomposition-based facial pose estimation inside video sequences," Optical Engineering, vol. 49, no. 3, March 2010.
[2] Y. Li et al., "Support vector machine based multi-view face detection and recognition," Image and Vision Computing, vol. 22, pp. 413–427, 2004.
[4] L. Ma et al., "Efficient iris recognition by characterizing key local variations," IEEE Transactions on Image Processing, vol. 13, no. 6, pp. 739–750, June 2004.
[5] R. Wildes, "Iris recognition: an emerging biometric technology," Proceedings of the IEEE, vol. 85, pp. 1348–1363, Sept. 1997.
[7] J. Daugman, "Biometric personal identification system based on iris," U.S. Patent, no. 5, 1994.
[9] A. Nabatchian, "Human Face Recognition," Direction du Patrimoine de l'édition, Canada, 2011.
[10] Y. Li et al., "Support vector machine based multi-view face detection and recognition," Image and Vision Computing, vol. 22, pp. 413–427, 2004.
[11] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 137–143, 1997.
[12] B. Moghaddam et al., "Beyond eigenfaces: probabilistic matching for face recognition," in IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998.
[17] K. Sung and T. Poggio, "Example-based learning for view-based human face detection," Massachusetts Institute of Technology, AI Memo, 1994.
[18] S. Gong et al., "Appearance-based face recognition under large head rotations in depth," in Asian Conference on Computer Vision, Hong Kong, 1998.
[19] H. Guo, "Face recognition and verification in unconstrained environments," Department of Computer Science, 2012.
[21] K. Aiswarya et al., "A new and efficient algorithm for the removal of high density salt and pepper noise in images and videos," in Second International Conference on Computer Modeling and Simulation, India, 2010.
[22] "Image Noise and Filtering," in Lecture Notes: Image Processing Basics, 2012.
[25] J. S. Lee, "A simple speckle smoothing algorithm for synthetic aperture radar images," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 1, pp. 85–89, January/February 1983.
[27] J. Lu et al., "Face recognition using kernel direct discriminant analysis algorithms," IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 117–126, January 2003.
[29] H. Kabir et al., "Local Directional Pattern Variance (LDPv): a robust feature descriptor for facial expression recognition," The International Arab Journal of Information Technology, vol. 9, no. 4, pp. 382–391, July 2012.
[30] A. S. Tayde et al., "Facial expression recognition dealing with different expression variations," International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 5, pp. 102–108, May 2015.
[33] "Kernel mutual subspace method for robust facial image recognition," in Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, pp. 245–248, Sept. 2000.
[37] T.-K. Kim et al., "Discriminative learning and recognition of image set classes using canonical correlations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1005–1018, June 2007.
[38] Z. Cui et al., "Image sets alignment for video-based face recognition," School of Computer Science and Technology, Beijing, China.
[39] "Face recognition using independent component analysis and support vector machines," Pattern Recognition Letters, pp. 2153–2157, 2003.