An Optical Flow-Based Approach To Robust Face Recognition Under Expression Variations
Abstract—Face recognition is one of the most intensively studied topics in computer vision and pattern recognition, but few studies focus on how to robustly recognize faces with expressions under the restriction of a single training sample per class. A constrained optical flow algorithm, which combines the advantages of the unambiguous correspondence of feature point labeling and the flexible representation of optical flow computation, has been developed for face recognition from expressional face images. In this paper, we propose an integrated face recognition system that is robust against facial expressions by combining information from the computed intraperson optical flow and the synthesized face image in a probabilistic framework. Our experimental results show that the proposed system improves the accuracy of face recognition from expressional face images.
Index Terms—Constrained optical flow, face recognition.

I. INTRODUCTION
FACE recognition has been studied for several decades. Comprehensive reviews of the related works can be found in [14], [21]. Even though 2-D face recognition methods have been actively studied in the past, there are still some inherent problems to be resolved for practical applications. It has been shown that the recognition rate can drop dramatically when the head pose and illumination variations are too large, or when the face images involve expression variations. Pose, illumination, and expression variations are three essential issues to be dealt with in face recognition research. To date, little research effort has been devoted to overcoming the expression variation problem in face recognition, though a number of algorithms have been proposed to overcome the pose and illumination variation problems.
To improve the face recognition accuracy, researchers have applied different dimension reduction techniques, including principal component analysis (PCA) [17], linear discriminant analysis (LDA) [10], independent component analysis (ICA) [1], discriminant common vector (DCV) [2], kernel-PCA, kernel-LDA [19], kernel-DCV [7], etc. In addition, several learning techniques, such as SVM, have been used to train the classifiers for face recognition. Although applying an appropriate dimension reduction algorithm or a robust classification technique may yield more accurate recognition results, these methods usually require multiple training images for each subject. However, multiple training images per subject may not be available in practice.
Some authors have proposed different approaches to deal with facial expression variations in face recognition. These algorithms can be roughly divided into two main categories: morphable model based and optical flow based. The basic idea in the former category is to warp images to global face geometries similar to the ones used for training. The concept of separately modeling texture and geometry information has been applied in the active shape model and active appearance model (ASM/AAM) [3], [4]. Face geometry is defined via a set of feature points in ASM, while face texture can be warped to the mean shape in AAM. Ramachandran et al. [15] presented preprocessing steps to convert a smiling face to a neutral face. Li et al. [9] applied a face mask for face geometry normalization and further calculated the eigenspaces for geometry and texture separately, but not all images can be well warped to a neutral image because of the lack of texture in certain regions, like closed eyes. Moreover, linear warping was usually applied, which is not consistent with the nonlinear characteristics of facial expression movements.
The other category is to use optical flow to compute the face warping transformation. Optical flow has been used in the task of expression recognition [5], [18]. However, it is difficult to learn the local motion in the feature space to determine the expression change for each face, since different persons have expressions in different motion styles. Martinez [12] proposed a weighting method that independently weighs the local areas which are less sensitive to expressional changes. The intensity variations due to expression may mislead the calculation of optical flow. A precise motion estimation method was proposed in [11], which can be further applied for expression recognition. However, the proposed motion estimation did not consider intensity changes due to different expressions.
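To make the point about intensity changes concrete, the following is a minimal sketch, not taken from the paper, of how a pure brightness change with no motion looks to a plain brightness-constancy model versus a model with a brightness multiplier and offset. The parameterization I2 = (1 + m) * I1 + c, the function names, and the toy intensities are all assumptions chosen for illustration.

```python
def constancy_residual(i1, i2):
    """Temporal residual under plain brightness constancy with zero motion."""
    return [b - a for a, b in zip(i1, i2)]


def generalized_residual(i1, i2, m, c):
    """Residual when a multiplier m and an offset c model the brightness change."""
    return [b - ((1.0 + m) * a + c) for a, b in zip(i1, i2)]


# A static 1-D intensity profile that only gets brighter (no motion at all).
neutral = [10.0, 20.0, 40.0, 80.0]
m_true, c_true = 0.25, 5.0  # hypothetical brightness multiplier and offset
expressive = [(1.0 + m_true) * v + c_true for v in neutral]

# Brightness constancy sees a nonzero temporal change and would have to
# explain it with spurious motion.
plain = constancy_residual(neutral, expressive)

# With the multiplier and offset included, zero motion explains the data.
general = generalized_residual(neutral, expressive, m_true, c_true)
```

In this toy case the plain residual is nonzero at every pixel even though nothing moved, while the generalized residual vanishes, which is the kind of effect a flow method that ignores intensity changes cannot account for.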
Manuscript received January 26, 2009; revised July 23, 2009. First published August 28, 2009; current version published December 16, 2009. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Margaret Cheney.
C.-K. Hsieh and Y.-C. Chen are with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]).
S.-H. Lai is with the Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan, R.O.C. (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2009.2031233
1057-7149/$26.00 © 2009 IEEE
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 1, JANUARY 2010

In this paper, we focus on the problem of face recognition from a single 2-D face image with facial expression. Note that this paper is not about facial expression recognition. For many practical face recognition problem settings, like using a passport photo for face identification at customs security or identifying a person from a photo on an ID card, it is infeasible to gather multiple training images for each subject, especially with different expressions. Therefore, our goal is to solve the expressive face recognition problem under the condition that the training database contains only neutral face images with one neutral face image per subject. In our previous work [8], we combined the advantages of the above two approaches: the unambiguous correspondence of feature point labeling and the flexible representation of optical flow computation. A constrained optical flow algorithm was proposed, which can deal with position movements and intensity changes at the same time when handling the corresponding feature points. With our proposed constrained optical flow algorithm, we can calculate the expressional motions from each neutral face in the database to the input test image, and estimate the likelihood of such a facial expression movement. Using the optical flow information, neutral images in the database can be further warped to faces with the exact expression of the input image. In this paper, we propose to exploit the two different types of information, i.e., the computed optical flow and the synthesized image, to improve the accuracy of face recognition. Experimental validation on the Binghamton University 3-D Face Expression (BU-3DFE) Database [20] is given to show the improved performance of the proposed face recognition system. Since we do not attempt to solve the automatic facial landmark localization problem in this paper, the facial landmark points in our experiment are labeled manually.
The remainder of this paper is organized as follows. We briefly review the constrained optical flow computational technique in Section II. The proposed expression-invariant face recognition system is presented in Section III. Section IV gives the experimental results by applying the proposed expression-invariant face recognition algorithm. Finally, we conclude the paper in the last section.

where the subscript denotes the location, is the concatenation vector of all the flow components and and all the brightness variation multiplier and offset factors, and are the parameters controlling the degree of smoothness in the motion and brightness fields, is the set of all the discretized locations in the image domain, is the weight for the data constraint, and , , , , , , , and are the weights for the corresponding components of the smoothness constraint along the x- and y-directions. In our application, we regard a neutral face image and an expressional face image as two adjacent time instances in the above formulation.
The above quadratic energy function can be rewritten in a matrix-vector form given by . By setting the first-order derivative to zero, minimizing this quadratic and convex function is equivalent to solving a large linear system , which can be efficiently solved by the incomplete Cholesky preconditioned conjugate gradient (ICPCG) algorithm [16].
However, the motion vectors of the facial feature points, which were used only as references for interpolating computation in ICPCG, cannot be guaranteed to be consistent in the final converged optical flow. In order to guarantee that the computed optical flow is consistent with the motion vectors at these corresponding feature points, we modify the unconstrained optimization problem in the original formulation of the optical flow estimation to a constrained optimization problem [8] given as follows:
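The constrained formulation itself is the one defined in [8] and is not reproduced by the code below. As a rough, hypothetical illustration of the underlying idea only (a quadratic smoothness energy minimized while the values at labeled feature points are held fixed), the following toy 1-D sketch may help. It assumes a smoothness-only energy with no data term, a single flow component, and plain Gauss-Seidel sweeps in place of ICPCG; the function name and the sample constraints are invented for this example.

```python
def constrained_smooth_flow(n, fixed, sweeps=500):
    """Minimize sum_i (u[i+1] - u[i])**2 subject to u[k] == fixed[k].

    Gauss-Seidel coordinate updates stand in for the ICPCG solver here.
    Each update is the exact minimizer of the energy over one unknown.
    """
    u = [0.0] * n
    for k, value in fixed.items():
        u[k] = value
    for _ in range(sweeps):
        for i in range(n):
            if i in fixed:
                continue  # hard constraint: never overwrite a feature point
            if i == 0:
                u[i] = u[1]  # free left end: only one neighbor term
            elif i == n - 1:
                u[i] = u[n - 2]  # free right end: only one neighbor term
            else:
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u


# Two "feature points" at positions 2 and 6 with known displacements.
flow = constrained_smooth_flow(9, {2: 1.0, 6: 3.0})
```

Because the fixed entries are never updated, the converged field agrees exactly with the prescribed displacements at the feature points, and the remaining values are interpolated smoothly between them. In an unconstrained solve, where the feature-point motions serve only as a starting reference, that agreement is not guaranteed.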
(6)
There are two parts in (6), i.e., the prior probability of the expression motion , and the conditional probability of the input image given the subject with the expression .
Considering each pixel as an independent and normally distributed random variable, the conditional probability can be further defined as follows:
(11)
where is the synthesized image , is the index of pixel, is the total number of valid pixels, and is the standard deviation of the image intensities at the th pixel. Fig. 4 shows the mask definition used for specifying the valid pixels in the face images. We first defined the standard mask image [Fig. 4(b)] from the global neutral face image [Fig. 4(a)]. When there is an input image with expressions [Fig. 4(c)], the mask is then warped according to the three feature points shown
Fig. 2. Illustration of decomposing input optical flows (OF ) to interperson
Fig. 3. Symbolizations of (a) Syn and (b) OF 0 Syn operators.
Fig. 4. Illustration of mask definition and warping: (a) reference image and feature points, (b) initial mask, (c) input image, and (d) warped and sheared mask for the input image.
Fig. 5. Overall flowchart of the proposed system.

IV. EXPERIMENTAL RESULTS
Our experiments were performed on the Binghamton University 3-D Face Expression (BU-3DFE) Database [20]. The BU-3DFE database contains face images and 3-D face models of 100 subjects (56 females and 44 males), each with a neutral face and six different expressions (angry, disgust, fear, happy, sad, and surprised) at different levels, from level 1 (weakest) to 4 (strongest). Note that only the 2-D face images were used in our experiments. All images are normalized according to the procedure described later in Section IV-A and resized to 200 × 200 pixels. Fig. 6 shows the 25 normalized face images of one subject after the normalization procedure.

A. Preprocessing
We manually labeled 21 feature points, including three points for each eyebrow and four points for each eye, one at the nose
tip and the other six around the mouth region. With the labeled points, the distance between the outer corners of both eyes is used as the reference to normalize face images (0.5, 0.5, 0.5, and 1.5 times to left, right, top, and bottom, respectively), which is depicted in Fig. 7.
Fig. 6. Sample images in BU-3DFEDB. The top-left image is the neutral face. The others are the face images with angry, disgust, fear, happy, sad, and surprise expressions in columns from left to right with increasing levels in rows from top to bottom.
Fig. 7. (a) Face region selection. The 21 feature points on (b) a neutral face image and (c) a surprised face image.

B. Benchmark Test
Since our goal is to solve the expressive face recognition problem under the restriction of a single training sample per class, the training database in the benchmark contains only neutral face images with one neutral face image per subject, i.e., 100 training face images for 100 classes in total. There are 24 expression-variant images for each subject, and we used a total of 2400 images for all 100 subjects for testing.
In the training phase for the original data, we use PCA to compute a low-dimensional vector for each of the 100 neutral images. The PCA subspace is computed from all the neutral images of all subjects in the training dataset by preserving 95% of the energy, which yields 51 eigenvectors. In the testing phase, the input image is projected to the PCA subspace and classified by the nearest neighbor classifier. The average recognition rate is 56.88%, used as a benchmark, and the recognition results for all different expressions and levels are shown in Fig. 8(a). We can see that the surprise and disgust expressions give the worst results, and the result indicates that higher facial expression levels lead to worse recognition rates. This is consistent with the statement in [6] that the correlation between an image and another image is directly related to the Euclidean distance in the original space, i.e., , if all images are normalized to zero mean and unit variance. Fig. 8(b) shows the result of direct subtraction without PCA preprocessing, which is slightly better than Fig. 8(a). We also implemented Martinez’s method [12] with the optical flow obtained by our proposed method. In the experiment, we adopt the weighting as , where is the magnitude of optical flow at the th pixel and . The average recognition rate is 67.36%, and the recognition results are shown in Fig. 9. Even though the computation time is much longer, the recognition rate is only slightly improved.
Fig. 8. (a) Recognition rates from the original images after PCA reduction under different expressions and levels (with average recognition rate 56.88%). (b) Recognition rates from the original images by direct subtraction under different expressions and levels (with average recognition rate 60.71%).
Fig. 9. Recognition rates from weighted optical flow result proposed by Martinez [12] (with average recognition rate 67.36%).

C. Face Recognition With the Proposed System
In our proposed system, an extra training dataset is needed for constructing the prior distribution for the expression motion as described in Section III-B. From the BU-3DFE database, 34 subjects are randomly selected for intraperson optical flow training, and the remaining 66 subjects are used as the testing set. In (9), the dimensions of intraperson optical flows are reduced by using PCA with 99% energy preservation and all the Gaussians are equally weighted.
As described in the previous section, we follow the flowchart shown in Fig. 5. Some experimental images are shown in Fig. 10. We apply a mask, shown in Fig. 10(b) and defined from the global neutral face [Fig. 10(a)], to extract the region of interest. Moreover, the region inside the mouth is discarded, as illustrated in Fig. 10(e). Both the optical flow and the grayscales of the synthesized image within the mask are used in the face recognition process. For an input image, as depicted in Fig. 10(d), we first position the corresponding mask [Fig. 10(e)] to obtain the masked image [Fig. 10(f)]. After that,
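The way the masked synthesized image and the expression-motion prior could be combined can be sketched as follows. This is only an illustrative sketch under stated assumptions: it uses the standard form of an independent per-pixel Gaussian likelihood rather than the paper's exact expression (11), it takes the log prior of each candidate's optical flow as a precomputed number instead of evaluating the Gaussian mixture of (9), and all function names and toy values are hypothetical.

```python
import math


def log_likelihood(test_pixels, synth_pixels, sigmas, mask):
    """Sum of per-pixel Gaussian log-densities over the valid (masked) pixels."""
    total = 0.0
    for x, s, sd, valid in zip(test_pixels, synth_pixels, sigmas, mask):
        if not valid:
            continue  # pixels outside the face mask are ignored
        total += -0.5 * ((x - s) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
    return total


def map_identify(test_pixels, candidates, sigmas, mask):
    """Return the subject id maximizing log prior + log likelihood."""
    best_id, best_score = None, -math.inf
    for subject_id, (synth_pixels, log_prior) in candidates.items():
        score = log_prior + log_likelihood(test_pixels, synth_pixels, sigmas, mask)
        if score > best_score:
            best_id, best_score = subject_id, score
    return best_id


# Toy data: four pixels, the last one masked out (e.g., inside the mouth).
test_image = [0.50, 0.40, 0.90, 0.10]
sigmas = [0.1, 0.1, 0.1, 0.1]
mask = [True, True, True, False]
candidates = {
    # subject id: (synthesized image for this subject, log prior of its flow)
    "subject_a": ([0.52, 0.41, 0.88, 0.90], -2.0),
    "subject_b": ([0.20, 0.70, 0.60, 0.10], -1.0),
}
winner = map_identify(test_image, candidates, sigmas, mask)
```

Working in the log domain turns the product over pixels into a sum and lets the prior and the image term be added directly. In the toy data, the candidate with the weaker prior still wins because its synthesized image matches the input far better inside the mask.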
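The benchmark discussion in Section IV-B cites [6] for the link between correlation and Euclidean distance when images are normalized to zero mean and unit variance. The exact expression from [6] is not reproduced here; the sketch below merely checks one standard identity of this kind numerically, ||a - b||^2 = 2N(1 - rho), assuming normalization by the population standard deviation. The function names and pixel values are made up for the example.

```python
import math


def standardize(values):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def correlation(a, b):
    """Pearson correlation of two already standardized vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


img_a = standardize([12.0, 40.0, 35.0, 90.0, 61.0])
img_b = standardize([15.0, 38.0, 50.0, 70.0, 66.0])

rho = correlation(img_a, img_b)
lhs = squared_distance(img_a, img_b)
rhs = 2 * len(img_a) * (1.0 - rho)  # identity: ||a - b||^2 = 2N(1 - rho)
```

Under this normalization, ranking gallery images by smallest Euclidean distance is the same as ranking them by largest correlation, so a nearest neighbor classifier in the original space behaves like a correlation matcher.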
Fig. 11. Recognition rates using the synthesized face images only (with average recognition rate 85.86%).
Fig. 13. Recognition rates using the integrated information, including the synthesized images and dimension-reduced intraperson optical flow (with average recognition rate 94.44%).
Fig. 15. Recognition rates using the integrated information, including the synthesized images and original intraperson optical flow (with average recognition rate 91.41%).
TABLE I
COMPUTATIONAL COMPLEXITIES FOR A TESTING PROCEDURE. NOTE THAT C IS THE TOTAL NUMBER OF CANDIDATES
the training data size is increased, while the recognition rates of using only synthesized images are independent of the training data size.
The performance of the proposed face recognition system with disturbances of feature point detection for image normalization is examined. To simulate the inaccuracies in feature point extraction during image normalization, we introduce an error in the outward direction for the locations of the outer corners of the eyes. This will generate faces in a smaller size compared to the original normalized images (Fig. 17). With small facial feature misalignment, the recognition rate is slightly reduced as depicted in Fig. 18.
Fig. 18. Recognition rates using feature points and normalized face images with noise and the integrated information (with average recognition rate 93.69%).

V. CONCLUSION
In this paper, we proposed a 2-D expression-invariant face recognition system based on integrating the optical flow information and image synthesis. Only one neutral image for each candidate subject is needed in our face recognition system. Two kinds of intraperson optical flow fields, and , were computed and used for expression motion likelihood calculation and expressive image synthesis, respectively. The proposed algorithm combines the face image comparison and optical flow prior information in a probabilistic MAP framework. As shown from the experimental results, the proposed face recognition system significantly improves the

REFERENCES
[1] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, “Face recognition by independent component analysis,” IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1450–1464, Nov. 2002.
[2] H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, “Discriminative common vectors for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 1, pp. 4–13, Jan. 2005.
[3] T. Cootes, C. Taylor, D. Cooper, and J. Graham, “Active shape models—Their training and application,” Comput. Vis. Image Understand., vol. 61, pp. 18–23, 1995.
[4] T. Cootes, G. J. Edwards, and C. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, pp. 681–685, 2001.
[5] I. A. Essa and A. Pentland, “A vision system for observing and extracting facial action parameters,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, 1994, pp. 76–83.
[6] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.
[7] Y. H. He, L. Zhao, and C. R. Zou, “Kernel discriminative common vectors for face recognition,” in Proc. Int. Conf. Machine Learning and Cybernetics, Guangzhou, China, Aug. 18–21, 2005, pp. 4605–4610.
[8] C.-K. Hsieh, S.-H. Lai, and Y.-C. Chen, “Expression-invariant face recognition with accurate optical flow,” in Proc. PCM, Hong Kong, Dec. 11–14, 2007.
[9] X. Li, G. Mori, and H. Zhang, “Expression-invariant face recognition with expression classification,” presented at the 3rd Canadian Conf. Computer and Robot Vision, Jun. 2006.
[10] A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, Feb. 2001.
[11] A. M. Martinez, “Matching expression variant faces,” Vis. Res., vol. 43, pp. 1047–1060, 2003.
[12] A. M. Martinez, “Recognizing expression variant faces from a single sample image per class,” presented at the IEEE Conf. Computer Vision and Pattern Recognition, Jun. 2003.
[13] S. Negahdaripour, “Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 9, pp. 961–979, Sep. 1998.
[14] A. Rama and F. Tarres, “Partial LDA VS partial PCA,” presented at the Int. Conf. Multimedia and Expo., Ontario, Canada, Jul. 2006.
[15] M. Ramachandran, S. K. Zhou, D. Jhalani, and R. Chellappa, “A method for converting a smiling face to a neutral face with applications to face recognition,” in Proc. ICASSP, Mar. 2005, pp. 18–23.
[16] C.-H. Teng, S.-H. Lai, Y.-S. Chen, and W.-H. Hsu, “Accurate optical flow computation under non-uniform brightness variations,” Comput. Vis. Image Understand., vol. 97, pp. 315–346, 2005.
[17] M. A. Turk and A. P. Pentland, “Face recognition using Eigenfaces,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Maui, HI, Jun. 1991, pp. 586–591.
[18] Y. Yacoob and L. S. Davis, “Recognizing human facial expressions from long image sequences using optical flow,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 6, pp. 636–642, Jun. 1996.
[19] M.-H. Yang, “Kernel Eigenfaces vs. Kernel Fisherfaces: Face recognition using kernel methods,” in Proc. Int. Conf. Automatic Face and Gesture Recognition, Washington, DC, May 2002, pp. 215–220.
[20] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, “A 3D facial expression database for facial behavior research,” in Proc. Int. Conf. Automatic Face and Gesture Recognition, Apr. 2006, pp. 211–216.
[21] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, “Face recognition: A literature survey,” ACM Comput. Surv., vol. 35, no. 4, pp. 399–458, Dec. 2003.

Shang-Hong Lai (M’95) received the B.S. and M.S. degrees in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, R.O.C., and the Ph.D. degree in electrical and computer engineering from University of Florida, Gainesville, in 1986, 1988, and 1995, respectively.
He joined Siemens Corporate Research in Princeton, NJ, as a member of technical staff in 1995. Since 1999, he has been a faculty member in the Department of Computer Science, National Tsing Hua University. He is currently a Professor in the same department. In 2004, he was a visiting scholar with Princeton University. His research interests include computer vision, visual computing, pattern recognition, medical imaging, and multimedia signal processing. He has authored more than 130 papers published in the related international journals and conferences. He holds ten U.S. patents for inventions related to computer vision and medical image analysis.
Dr. Lai has been a member of program committee of several international conferences, including CVPR, ICCV, ECCV, ACCV, ICPR, and ICME.