A Model-Based Facial Expression Recognition
Abstract—In this paper, we propose a new method for facial expression recognition. We utilize the Candide facial grid and apply Principal Components Analysis (PCA) to find the two eigenvectors of the model vertices. These eigenvectors, along with the barycenter of the vertices, are used to define a new coordinate system onto which the vertices are mapped. Support Vector Machines (SVMs) are then used for the facial expression classification task. The method is invariant to in-plane translation and rotation, as well as to scaling of the face, and achieves very satisfactory results.

I. INTRODUCTION

Facial expression recognition in video sequences and still images is a very important research topic with applications in human-centered interfaces, ambient intelligence, behavior analysis, etc. In [1], Ekman has established, based on an anthropological investigation, six main facial expressions (anger, surprise, happiness, disgust, fear and sadness) which are used to communicate human emotions. In many cases, the neutral state is included along with the six expressions.

Although humans can easily recognize these facial expressions, this is not the case for algorithms that try to imitate this skill. During the last decade, many attempts involving a wide range of approaches have been undertaken to resolve this problem. Despite their diversity, most algorithms utilize information coming from the eyes, the mouth and the forehead region, since these areas are considered the ones with the richest information for facial expression recognition.

One consideration that has to be taken into account when designing facial expression recognition algorithms is the fact that a facial expression is a dynamic process that evolves over time and includes three stages [2]: an onset (attack), an apex (sustain) and an offset (relaxation). Many facial expression recognition algorithms operate on the video frame (or still image) that corresponds to the expression apex. Based on the input data type used, facial expression recognition algorithms can be classified in two main categories: image (or image feature)-based and model-based ones. Each approach has its own merits. For instance, image-based algorithms are faster, as no complex image preprocessing is usually involved. On the other hand, model-based approaches employ a 2-D or 3-D face model, whose fitting on the facial image [3] implies a significant computational cost. Despite their computational effort, model-based approaches are popular because they can capture essential geometrical information for facial expression recognition [3]. An overview of the state of the art can be found in [2].

In this paper, we propose a novel model-based facial expression recognition technique which utilizes the Candide facial grid (Figure 1), deformed so as to match the apex of an expression. So far, most model-based techniques use Candide vertex displacements, as in [3], or the Euclidean distances of pairs of Candide vertices [4]. In our case, we use the locations of the Candide vertices. Such information is sufficient to describe the geometry of the face and the facial features and can thus lead to facial expression recognition. However, this information is also extremely vulnerable to affine transformations (e.g. translation) of the face. To remedy this problem, we utilize Principal Components Analysis in order to establish a new coordinate system based on the evaluated eigenvectors and the vertices' barycenter. As will be proven later on, mapping the vertices to the new coordinate system ensures the method's robustness to translation, rotation and isotropic scaling of the face.

The paper is organized as follows: in Section II, we present the PCA-based feature selection procedure and prove the robustness of the method towards translation, scaling and rotation. In Section III, we describe the application of Support Vector Machines (SVMs) on the selected features for the recognition of 6 or 6+1 different facial expressions and present experimental results on the Cohn-Kanade [5] database. Finally, conclusions are drawn in Section IV.

II. PRINCIPAL COMPONENTS ANALYSIS AND FEATURE SELECTION

The first step in the proposed approach is to track the Candide facial grid from the onset to the apex of the facial expression in a video sequence. In order to do so, we perform a manual localization of 7 vertices of the grid on the video frame corresponding to the onset of the facial expression, as shown in Figure 2a. The rest of the Candide vertices are arranged in this frame through the application of a spring mesh-model. Then, we use the Kanade-Lucas-Tomasi (KLT) algorithm to track the grid vertices to the facial expression apex, as shown in Figure 2b. It has been proven that only 67 out of the
[Figure omitted: the Candide facial grid with numbered vertices; panels (a) and (b).]
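The mapping into the new coordinate system outlined in the introduction (centering the tracked vertices at their barycenter, then projecting them onto the eigenvectors of their covariance matrix, normalized by the square roots of the corresponding eigenvalues) can be sketched in a few lines of numpy. This is only an illustrative sketch, not the authors' implementation: the function name `pca_features` is ours, and the data-dependent rule used to resolve the inherent sign ambiguity of numerically computed eigenvectors is an added assumption, not part of the paper. The closing assertion checks the invariance to in-plane translation, rotation and isotropic scaling that this section claims.

```python
import numpy as np

def pca_features(vertices):
    """Map (N, 2) grid-vertex positions to the PCA-defined coordinate
    system (hypothetical helper illustrating Section II)."""
    # Center at the barycenter -> translation invariance.
    P = vertices - vertices.mean(axis=0)
    # Eigen-analysis of the 2x2 covariance matrix.
    cov = P.T @ P / len(P)
    lam, V = np.linalg.eigh(cov)  # eigenvalues in ascending order
    feats = []
    for i in (0, 1):  # i=0: smaller eigenvalue (v2, the "x" axis), i=1: v1
        # Dividing by sqrt(eigenvalue) gives invariance to isotropic scaling.
        proj = P @ V[:, i] / np.sqrt(lam[i])
        # Resolve the eigenvector sign ambiguity with a data-dependent
        # convention (skewness of the projections) -- our own addition,
        # so that the features are strictly reproducible under rotation.
        if (proj ** 3).sum() < 0:
            proj = -proj
        feats.append(proj)
    return np.column_stack(feats)

# Invariance check: rotate, scale and translate a random point set.
rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 2)) @ np.array([[2.0, 0.3], [0.1, 0.5]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
transformed = 3.5 * (pts @ R.T) + np.array([10.0, -4.0])
assert np.allclose(pca_features(pts), pca_features(transformed))
```

The resulting coordinates also have zero mean and unit variance along each axis by construction, which is convenient for the SVM classifiers used later.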
other words, the eigenvectors of the rotated points are those of the original points rotated by the same rotation matrix. Furthermore, the eigenvalues of the set of rotated points are the same as the ones of the original matrix, i.e. λ'_i = λ_i. For the vector X'_new containing the X coordinates (in the new coordinate system) of the rotated points, one can easily see that:

X'_new = (1/√λ'_2) · P'^T · v'_2 =⇒ (8)
X'_new = (1/√λ_2) · (R·P)^T · R·v_2 =⇒ (9)
X'_new = (1/√λ_2) · P^T · R^T · R·v_2 =⇒ (10)
X'_new = (1/√λ_2) · P^T · v_2 =⇒ (11)
X'_new = X_new (12)

The same obviously holds for Y'_new and Y_new. Thus, the points' coordinates in the defined coordinate system do not change with rotation and, therefore, our features are invariant to in-plane rotation of the face.

Finally, invariance with respect to isotropic scaling can be proven as follows: by scaling in both dimensions with the same factor s, it is easy to see that the new set of points will be P' = s·P, with covariance matrix Σ' = s²·Σ. The eigenanalysis of Σ' provides that:

Σ'·v_i = λ'_i·v_i =⇒ (13)
s²·Σ·v_i = (s²·λ_i)·v_i =⇒ (14)
Σ·v_i = λ_i·v_i (15)

where λ'_i = s²·λ_i are the eigenvalues of the scaled points. The above equations also imply that Σ' has (as expected) the same eigenvectors v_1, v_2 as Σ. From (2) and (3) we have that:

X'_new = (1/√λ'_2) · (s·P)^T · v'_2 (16)
       = (1/√(s²·λ_2)) · (s·P)^T · v_2 (17)
       = (1/s) · (1/√λ_2) · (s·P)^T · v_2 (18)
       = (1/√λ_2) · P^T · v_2 (19)
       = X_new (20)

Thus, the x coordinates of the scaled points in the new coordinate system remain unaltered. The same obviously holds for Y'_new.

III. FACIAL EXPRESSION CLASSIFICATION EXPERIMENTS

We use Support Vector Machine (SVM) classifiers for recognizing facial expression classes. SVMs were chosen due to their good performance in various practical pattern recognition applications [6]-[9] and their solid theoretical foundations. SVMs minimize an objective function under certain constraints in the training phase, so as to find the support vectors, and subsequently use them to assign labels to the test set. Many SVM variants exist. These include both linear and non-linear forms, with different kernels being used in the latter. Six- and seven-class multiclass SVMs were used in our case.

We use the Cohn-Kanade [5] facial expression database in order to evaluate our method. Firstly, we initialized the Candide grid on the onset frame of the database videos and tracked it till the facial expression apex frame. In our experiments, we used only the apex phase of the expression. The method was applied to 440 video frames (resulting from an equal number of videos): 35 for anger, 35 for disgust, 55 for fear, 90 for happiness, 65 for sadness, 70 for surprise and, finally, 90 for the neutral state. We have conducted experiments for the recognition of either 6 or 6+1 (neutral) facial expressions.

We shall first present the experiments for 6 facial expressions. In order to establish a test and a training set, we used a modified version of the leave-one-out cross-validation procedure where, in each run, we exclude 20% of the grids of each facial expression from the training set and use them to form the test set. Thus, in order to process all data, five runs were conducted and the average classification accuracy was calculated. In Table I, results are drawn from classifiers involving different parameters and kernels.

Kernel       Degree   Recognition Rate
RBF          3        88.69%
RBF          5        88.18%
RBF          4        88.15%
RBF          7        87.79%
RBF          8        87.79%
RBF          6        87.59%
RBF          2        87.26%
Polynomial   3        88.71%
Polynomial   2        88.17%
Polynomial   4        87.70%
TABLE I: Results for radial basis function (RBF) and polynomial kernels with different degrees.

It can be seen that the results are practically the same for all tested SVM configurations. The confusion matrices for the RBF and polynomial kernel parameters that achieved the best performance are depicted in Tables II and III. The fact that we obtain practically the same results for different kernels and different parameters is obviously an advantage of the proposed algorithm, since it potentially indicates good generalization properties. This desirable behavior can be attributed to the utilized features and is particularly important, since many classification algorithms using SVMs suffer in generalization due to the known sensitivity of SVMs with respect to their parameters [10].

We performed experiments for the 6+1 expressions as well (Tables IV, V and VI). For most algorithms, the classification accuracy when recognizing 6+1 expressions is worse than in the 6-class case. In our case, though, there is a slight performance improvement in the overall accuracy, which can perhaps be attributed to the nature of the feature space. Most probably, in this space the neutral class is far from all other classes and thus does not alter the 6-class results significantly. This fact can be noticed in the confusion
matrices, where the rates for the 6 classes are practically the same as in the previous experiments. On the other hand, as the neutral class exhibits a 100% recognition rate, the overall accuracy of the algorithm increases. The overall classification accuracy for the cases of Tables IV-VI is 90.22%, 90.02% and 89.49%, respectively.

     Ang   Dis   Fea   Hap   Sad   Sur
Ang  84.8  0     0     0     8.9   0
Dis  6.1   93.3  2.2   2.2   2.5   1.4
Fea  0     0     89.1  11.0  3.8   0
Hap  3.0   6.7   6.5   86.8  3.8   1.4
Sad  6.1   0     0     0     81.0  0
Sur  0     0     2.2   0     0     97.1
TABLE II: Confusion matrix for polynomial kernel of 3rd degree.

     Ang   Dis   Fea   Hap   Sad   Sur   Neu
Ang  81.2  3.3   0     0     11.2  0     0
Dis  6.2   93.3  2.2   1.0   3.4   1.4   0
Fea  0     0     88.9  11.0  3.4   0     0
Hap  6.2   3.3   6.7   88.0  3.4   1.4   0
Sad  6.2   0     0     0     77.5  0     0
Sur  0     0     2.2   0     0     97.1  0
Neu  0     0     0     0     1.1   0     100.0
TABLE VI: Confusion matrix for polynomial kernel of 1st degree (6+1 expressions).

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 211471 (i3DPost).
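The evaluation protocol described in Section III, five runs each holding out a disjoint 20% of the samples of every expression class for testing, can be sketched in pure Python as follows. This is an illustrative sketch under our own assumptions: the function name `five_run_split` and the use of a seeded shuffle are ours, and the classifier itself (a multiclass SVM in the paper) is left out and only indicated by a comment.

```python
import random
from collections import defaultdict

def five_run_split(labels, seed=0):
    """Yield (train_idx, test_idx) pairs for the modified
    leave-one-out protocol of Section III: 5 runs, each holding out
    a disjoint 20% of the samples of every class (hypothetical helper)."""
    rng = random.Random(seed)
    per_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        per_class[lab].append(idx)
    for ids in per_class.values():
        rng.shuffle(ids)
    for run in range(5):
        test = set()
        for ids in per_class.values():
            fold = len(ids) // 5
            test.update(ids[run * fold:(run + 1) * fold])
        train = [i for i in range(len(labels)) if i not in test]
        yield train, sorted(test)

# Class sizes reported in the text (6+1 expressions, 440 samples in total).
counts = {"anger": 35, "disgust": 35, "fear": 55, "happiness": 90,
          "sadness": 65, "surprise": 70, "neutral": 90}
labels = [c for c, n in counts.items() for _ in range(n)]
for train, test in five_run_split(labels):
    # Here an SVM would be trained on `train` and evaluated on `test`;
    # the accuracy averaged over the five runs is then reported.
    assert len(train) + len(test) == 440
```

Since each class size in the paper is divisible by 5, the five test sets are disjoint and together cover the whole database, so every sample is tested exactly once.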