Advances in Face Image Analysis: Theory and Applications
About this ebook
Advances in Face Image Analysis: Theory and Applications describes several approaches to facial image analysis and recognition. Eleven chapters cover advances in computer vision and pattern recognition methods used to analyze facial data. The topics addressed in this book include automatic face detection, 3D face model fitting, robust face recognition, facial expression recognition, face image data embedding, model-less 3D face pose estimation and image-based age estimation. The chapters are written by experts from different research groups. Readers will, therefore, have access to contemporary knowledge on facial recognition, with diverse perspectives offered for individual techniques. The book is a useful resource for a wide audience such as i) researchers and professionals working in the field of face image analysis, ii) the entire pattern recognition community interested in processing and extracting features from raw face images, and iii) technical experts as well as postgraduate computer science students interested in cutting-edge concepts of facial image recognition.
Advances in Face Image Analysis - Bentham Science Publishers
PREFACE
Over the past two decades, many face image analysis problems have been investigated in computer vision and machine learning. The main drivers of further research in this area are human-machine interaction and security applications. Face images and videos can represent an intuitive and non-intrusive channel for recognizing people, inferring their level of interest, and estimating their gaze in 3D. Although progress over the past decade has been impressive, there are significant obstacles to be overcome. It is not yet possible to design a face analysis system whose performance comes close to that of humans. New computer vision and pattern recognition approaches need to be investigated. Face recognition, as an essential problem in pattern recognition and social media computing, has attracted many researchers for decades. For instance, face recognition became one of three identification methods used in e-passports and a biometric of choice for many other security applications.
The E-Book Advances in Face Image Analysis: Theory and Applications is oriented to a wide audience including: i) researchers and professionals working in the fields of face image analysis; ii) the entire pattern recognition community interested in processing and extracting features from raw face images; and iii) technical experts as well as postgraduate students working on face images and their related concepts. One of the key benefits of this E-Book is that the readers will have access to novel research topics. The book contains eleven chapters that address several topics including automatic face detection, 3D face model fitting, robust face recognition, facial expression recognition, face image data embedding, model-less 3D face pose estimation and image-based age estimation.
We would like to express our gratitude to all the contributing authors who have made this book a reality. We would also like to thank Prof. Denis Hamad for writing the foreword and Bentham Science Publishers for their support and efforts. Special thanks go to Dr. Ammar Assoum for providing the LaTeX style file.
Fadi Dornaika
University of the Basque Country
Manuel Lardizabal, 1
20018 San Sebastián, Spain
Facial Expression Classification Based on Convolutional Neural Networks
Wenyun Sun¹, Zhong Jin¹, *
¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Abstract
Research trends in Convolutional Neural Networks and facial expression analysis are first introduced. A training algorithm, stochastic gradient descent with l2 regularization, is employed for the facial expression classification problem, in which facial expression images are classified into the six basic emotional categories of anger, disgust, fear, happiness, sadness and surprise without any complex pre-processing. Moreover, three types of feature generalization, for solving problems with different classifiers, different datasets and different categories, are discussed. With these techniques, pre-trained Convolutional Neural Networks are used as feature extractors, which work quite well with Support Vector Machine classifiers. The experimental results show that Convolutional Neural Networks are capable not only of classifying facial expression images with translational distortions, but also of fulfilling some feature generalization tasks.
Keywords: Alex-Net architecture, Backpropagation algorithm, CK-Regianini dataset, CK-Zheng dataset, Classification accuracy, CMU-Pittsburgh dataset, Combined features, Convolutional Neural Networks, Deep learning, Facial expression classification, Feature extraction, Feature generalization, Feature representation, Hidden layers, Pre-trained networks, Stochastic Gradient Descent, Supervised feature learning, Support Vector Machine, Trainable parameters, Translational invariance property.
* Address to Corresponding Author Zhong Jin: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Tel: +86 25 84303280 Ext. 3071; Fax: +86 25 84317235; E-mail: [email protected]
INTRODUCTION
A feature extractor and a classifier are the two essential modules of a conventional image pattern recognition system. A good image feature extractor produces a feature representation that carries more discriminant information and less correlation than the original pixel data. Several such techniques have become popular, e.g., the Scale-Invariant Feature Transform (SIFT) [1] and the Histogram of Oriented Gradients (HOG) [2]. On the other hand, a highly efficient classifier could perform its job well without the help of complex feature extractors. Recently, highly efficient classifiers and good feature extractors based on deep learning have emerged.
Convolutional Neural Networks
Remarkable achievements have been made in the study of classifiers for high-dimensional image data over the last two decades. Convolutional Neural Networks (CNNs) have attracted more and more attention and have become the representative deep learning method. Although CNNs were suggested in 1989 [3], efficient training algorithms were absent until the stochastic diagonal Levenberg-Marquardt algorithm for CNNs was proposed by LeCun et al. in 1998 [4]. LeCun et al. also designed the so-called LeNet-5, which could classify handwritten digits and letters into categories without complex pre-processing.
Recent theoretical research has brought state-of-the-art techniques to classical CNNs, e.g., the rectified linear unit [5], local contrast normalization [6], local response normalization [7] and dropout [8]. Engineering studies have continued as well: handwritten character recognition [9, 10] and natural image processing [7, 11] are well-known engineering applications of CNNs.
The most interesting work was done by Krizhevsky et al., who won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 [7]. They achieved a top-5 error rate of 16.4% on the classification benchmark, beating the second-place result of 26.1%, which was obtained with handcrafted features.
In ILSVRC 2013, an approach from Zeiler et al. [11] improved the performance by visualizing the hidden layers of CNNs. They found that Krizhevsky’s network is able to extract features of different scales and complexity, which clearly demonstrates the feature representation capability of CNNs. A reliable feature extractor can easily be obtained by cutting off the Soft-Max layer at the end of a CNN and keeping the trainable parameters of the remaining layers fixed. The features from any of the hidden layers, especially the last one, can be used as the inputs of other classifiers. In other words, when a classifier is trained, a feature extractor is obtained at the same time, and the extractor can be widely used for various purposes. From this point of view, Zeiler et al. proposed a theory of feature generalization. Natural images contain abundant feature information and span a large number of categories. Thus, a feature extraction network pre-trained on natural images can conveniently be applied to more specific data. Finally, it is notable that the methods mentioned above are usually implemented and accelerated with Graphics Processing Unit (GPU) based high performance computing techniques.
Facial Expression Analysis
In another domain, research on classifying facial expressions was started by psychologists. In 1978, the Facial Action Coding System (FACS) [12] was proposed by Ekman et al. The well-known Cohn-Kanade (CK) facial expression image dataset [13, 14], built by the Robotics Institute of Carnegie Mellon University and the Department of Psychology of the University of Pittsburgh, contains a set of facial expression image sequences and their corresponding action unit (AU) codes.
In 1984, Ekman et al. continued their studies and classified facial expressions into six categories according to emotion, i.e., anger, disgust, fear, happiness, sadness and surprise [15]. In facial expression analysis, especially in the classification case, feature extractors and classifiers should remain invariant to individuals and to perspective projection distortions.
In recent years, quite a few studies have been devoted to expression classification of static images or image sequences. Some concern specific handcrafted feature extraction algorithms [16, 17], and some concern classifiers which use plain 2-dimensional pixel data as their inputs [18, 19].
In the following sections, a gradient-based learning algorithm and a feature extraction technique are introduced. Then, several experiments are conducted. Finally, the study is concluded, and further work that deserves attention is outlined in the last section.
GRADIENT-BASED LEARNING FOR CNNS
LeNet-5 [4] is a typical CNN. It consists of layers called C1, S2, C3, S4, C5 and F6. Its feedforward process can be outlined as follows:
Firstly, the input image is transformed by 3-dimensional convolution with six kernels of size 5*5*1, a bias term is added, and the result is activated by the tanh function. This yields the first set of six feature maps, called C1.
Secondly, a max-pooling process is applied to C1, and the second set of feature maps, called S2, is obtained.
Thirdly, the layers C3 and S4 are generated by the same mechanism.
Fourthly, two fully connected layers called C5 and F6 are calculated in the same way as in conventional neural networks. C5 can be considered either a convolutional layer or a fully connected layer, since its outputs are feature maps of size 1*1. To enable the classifier to reject unreasonable inputs, a Gaussian layer is then used to compute the distance between the 84-dimensional activation of F6 and 10 fixed binary codes.
Finally, the training process can be understood simply as fitting F6 so as to minimize the Gaussian distance between the F6 layer’s activation and its nearest binary code.
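The layer sizes implied by the steps above can be checked with a few lines of arithmetic. This is only a sketch: the 32*32 input size, the 16 feature maps of C3, the 120 units of C5 and the 2*2 pooling window are assumptions taken from the usual description of LeNet-5 and are not stated in the text; only the six 5*5 kernels of C1, the 1*1 output of C5 and the 84-dimensional F6 come from the outline.

```python
def conv_out(size, kernel):
    """Spatial size after a valid (no padding, stride 1) convolution."""
    return size - kernel + 1


def pool_out(size, window):
    """Spatial size after non-overlapping pooling."""
    return size // window


def lenet5_shapes(input_size=32):
    """Return (layer, maps, height, width) for each LeNet-5 layer.

    input_size=32 and the map counts 16 and 120 are assumptions,
    not values given in the surrounding text.
    """
    shapes = []
    s = conv_out(input_size, 5)
    shapes.append(("C1", 6, s, s))
    s = pool_out(s, 2)
    shapes.append(("S2", 6, s, s))
    s = conv_out(s, 5)
    shapes.append(("C3", 16, s, s))
    s = pool_out(s, 2)
    shapes.append(("S4", 16, s, s))
    s = conv_out(s, 5)
    shapes.append(("C5", 120, s, s))
    shapes.append(("F6", 84, 1, 1))
    return shapes


shapes = lenet5_shapes()
for name, maps, h, w in shapes:
    print(name, maps, h, w)
```

Under these assumptions the spatial size shrinks from 28 to 14, 10, 5 and finally 1, which is why C5 can be read either as a convolutional or as a fully connected layer.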
Fig. (1) shows the visualization of each layer for an input digit image. From left to right, there are the input layer, C1’s kernel, C1’s activation, C3’s kernel, C3’s activation, C5’s activation and F6’s activation, respectively. The activations of S2 and S4 have the same visualizations as C1’s and C3’s. In this demonstration, the image is classified into the category of 5 and the Gaussian distance between F6’s activation and its nearest cluster center is 0.237.
If we consider $x_0$ as the input, $\{f_1, f_2, \ldots, f_L\}$ as the layers (including a loss layer), $\{w_1, w_2, \ldots, w_L\}$ as the parameters (weights, biases, etc.) and $\{x_1, x_2, \ldots, x_L\}$ as the layers’ outputs, the feedforward calculation can be described as $x_n = f_n(x_{n-1}, w_n)$, $n \in \{1, 2, \ldots, L\}$.
Fig. (1). Visualization of each layer in LeNet-5.
A multi-layered CNN with a loss function can be regarded as a system built from a cascade of transformation modules {f1(x0, w1), f2(x1, w2), ···, fL(xL−1, wL)} whose inputs and outputs are connected one after another.
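The cascade can be sketched in a few lines of Python. The three toy modules below (a scaling, a shift and a squared-error loss layer) are hypothetical stand-ins chosen only to keep the example small; they are not layers of any network discussed here.

```python
def feedforward(x0, layers, params):
    """Return [x0, x1, ..., xL] for the cascade x_n = f_n(x_{n-1}, w_n)."""
    xs = [x0]
    for f, w in zip(layers, params):
        xs.append(f(xs[-1], w))
    return xs


# Hypothetical toy modules; the last one plays the role of the loss layer,
# and its "parameter" is the target value.
def scale(x, w):
    return w * x


def shift(x, w):
    return x + w


def squared_loss(x, target):
    return 0.5 * (x - target) ** 2


xs = feedforward(3.0, [scale, shift, squared_loss], [2.0, -1.0, 4.0])
print(xs)
```

Keeping every intermediate output in a list is a deliberate choice: a backward pass needs each module's input again when it evaluates that module's gradients.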
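The training process can be formulated as an optimization problem

$$\min_{w_1, \ldots, w_{L-1}} E = x_L = f_L(x_{L-1}, w_L), \qquad (1)$$

where $(x_0, w_L)$ is a pair of training data and $x_{L-1}$ is obtained from $x_0$ by the feedforward calculation.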
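Each transformation is differentiable, thus the gradient-based learning algorithm can be employed for training CNNs. The gradients with respect to the inputs and parameters of each transformation can be formulated as follows:

$$\frac{\partial E}{\partial x_{n-1}} = \frac{\partial f_n}{\partial x}(x_{n-1}, w_n)\,\frac{\partial E}{\partial x_n}, \qquad (2)$$

$$\frac{\partial E}{\partial w_n} = \frac{\partial f_n}{\partial w}(x_{n-1}, w_n)\,\frac{\partial E}{\partial x_n}, \qquad (3)$$

where $\frac{\partial f_n}{\partial x}(x_{n-1}, w_n)$ is the Jacobian of $f_n$ with respect to $x$ evaluated at the point $(x_{n-1}, w_n)$, and $\frac{\partial f_n}{\partial w}(x_{n-1}, w_n)$ is the Jacobian of $f_n$ with respect to $w$ evaluated at the same point. Eq.(2) and Eq.(3) are the main ideas of the Backpropagation (BP) algorithm. An improvement called l2 regularization is commonly added to the learning algorithm to counter overfitting. The new optimization problem is shown as follows:

$$\min_{w_1, \ldots, w_{L-1}} E' = E + \frac{\lambda}{2} \sum_{n=1}^{L-1} \|w_n\|_2^2, \qquad (4)$$

where $\lambda$ is the weight decay parameter. The gradients for the new problem of Eq.(4) can be easily calculated on the basis of Eq.(2) and Eq.(3) by

$$\frac{\partial E'}{\partial x_n} = \frac{\partial E}{\partial x_n}, \qquad (5)$$

$$\frac{\partial E'}{\partial w_n} = \frac{\partial E}{\partial w_n} + \lambda w_n. \qquad (6)$$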
The stochastic gradient descent procedure for solving the problem of Eq.(4) is summarized in Algorithm 1.
Algorithm 1 Stochastic Gradient Descent with l2 Regularization
Input: training data X, training labels Y, learning rate α, tolerance ε
Initialize: w1, w2, ..., wL−1 := tiny random values around zero
repeat
    x0 := random subset of X    (feedforward)
    y (ground truth) := corresponding subset of Y
    for n := 1 to L do
        xn := fn(xn−1, wn)
    for n := L to 1 do    (backpropagation)
        calculate ∂E'/∂xn by Eq.(2) and Eq.(5)
    for n := L − 1 to 1 do
        calculate ∂E'/∂wn by Eq.(3) and Eq.(6)
    for n := 1 to L − 1 do    (update weights)
        wn := wn − α ∂E'/∂wn
until E' ≤ ε
return w1, w2, ..., wL−1
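A minimal runnable sketch of Algorithm 1 is given below. It is not the authors' implementation: to stay short it replaces the CNN with a single linear layer followed by a squared-error loss layer, uses a mini-batch of one sample, and assumes the penalty has the form (lam/2)·‖w‖². The names `lam`, `max_iters` and `seed` are hypothetical.

```python
import random


def linear_forward(x, w):
    """f_1: a single linear unit (stand-in for the convolutional layers)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def loss_forward(x1, label):
    """f_2: squared-error loss layer; the label plays the role of w_L."""
    return 0.5 * (x1 - label) ** 2


def sgd_l2(X, Y, alpha=0.05, lam=1e-4, tol=1e-3, max_iters=5000, seed=0):
    rng = random.Random(seed)
    # Initialize the weights with tiny random values around zero.
    w = [rng.uniform(-0.01, 0.01) for _ in X[0]]
    for _ in range(max_iters):
        i = rng.randrange(len(X))          # random subset of size one
        x0, y = X[i], Y[i]
        x1 = linear_forward(x0, w)         # feedforward
        dE_dx1 = x1 - y                    # gradient of the loss layer input
        # Gradient w.r.t. the weights plus the l2 term lam * w.
        grad = [dE_dx1 * xi + lam * wi for xi, wi in zip(x0, w)]
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
        # Regularized objective averaged over the whole toy set.
        E = sum(loss_forward(linear_forward(x, w), t)
                for x, t in zip(X, Y)) / len(X)
        E += 0.5 * lam * sum(wi * wi for wi in w)
        if E <= tol:
            break
    return w


# Toy data generated by y = 2*a - b (hypothetical, for illustration only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [2.0, -1.0, 1.0, 3.0]
w = sgd_l2(X, Y)
print(w)
```

Evaluating the objective on the full set after every update is affordable only for a toy problem; with a real network the stopping test would more plausibly be run on a held-out set at intervals.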
Alex-Net [7] and Zeiler’s Net [11] were proposed on the basis of LeNet-5 and may be regarded as representatives of modern CNNs. They removed some unnecessary processes of LeNet-5 and made improvements such as normalization and anti-overfitting mechanisms. In the Alex-Net architecture, the layers are divided into two groups whose activations are calculated separately, except in the first layer and the last one. The two groups are kept unconnected in order to reduce communication traffic between the two GPUs. The overall network has a 150528-dimensional input layer, and the numbers of neurons in the remaining layers are 253440, 186624, 64896, 64896, 43264, 4096, 4096 and 1000, respectively. The way of dividing layers into groups, the number of neurons in each layer and the layer types can all be varied. Thus different architectures can be devised for different problems and for different GPUs. A compromise should be made between reducing the time cost of each epoch and improving the gain of each epoch, so that the network converges faster.
In a word, CNNs have four characteristics, listed as follows:
Convolution with local receptive fields is used for sharing weights.
Sub-sampling (pooling) is used.
The network has 2-5 convolutional layers followed by 1-2 fully connected layers.
The network can be trained by hierarchical first-order optimization algorithms, i.e. the BP algorithm.
FEATURE GENERALIZATION
A supervised feature learning technique is introduced in this section. When an input is fed to a pre-trained CNN, its hidden layers hold representations of the input data. The deeper a hidden layer is, the more complex the features it extracts. Such meaningful information may be difficult to obtain with simple handcrafted feature extractors. When a classifier is trained, a corresponding feature extractor is obtained at the same time. For example, a reliable feature extractor can be obtained by removing the soft-max layer at the end of a pre-trained neural network.
The pre-trained network can be used in two ways. Both of them work well, but they differ:
Using parts of the network as a fixed feature extractor with its parameters untouched. The extractor can work together with non-neural techniques.
Using parts of the parameters as the initial values of a new CNN [11], giving the opportunity to fine-tune the transferred parameters in another network.
In order to show the flexibility of this technique, the first way is used in this paper. Moreover, multiple datasets and multiple categories are used to investigate the performance of feature generalization. Pseudocode for the basic version of feature generalization is provided in Algorithm 2.
Algorithm 2 Feature Generalization + SVM Framework
Input: pre-trained CNN's parameters wn, CNN's layer id k for feature extraction, CNN+SVM training data X, CNN+SVM training labels Y, CNN+SVM test data X', CNN+SVM test labels Y'
x0 := X    (extract training features)
for n := 1 to k do
    xn := fn(xn−1, wn)
F := xk
x0 := X'    (extract test features)
for n := 1 to k do
    xn := fn(xn−1, wn)
F' := xk
m := svmtrain(F, Y)    (train SVM)
Ŷ' := svmpredict(m, F')    (test SVM)
accuracy := countif(Ŷ' = Y') / size(Y')    (evaluate performance)
return accuracy
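The flow of Algorithm 2 can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the SVM, so `centroid_train` and `centroid_predict` are hypothetical stand-ins for `svmtrain` and `svmpredict`, and the "pre-trained" layers are toy functions rather than CNN layers.

```python
def extract_features(X, layers, params, k):
    """Run each sample through the first k layers and return x_k."""
    feats = []
    for x in X:
        for f, w in zip(layers[:k], params[:k]):
            x = f(x, w)
        feats.append(x)
    return feats


def centroid_train(F, Y):
    """Stand-in for svmtrain: one mean feature vector per class."""
    sums, counts = {}, {}
    for f, y in zip(F, Y):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def centroid_predict(model, F):
    """Stand-in for svmpredict: label of the nearest class centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda y: dist2(model[y], f)) for f in F]


def evaluate(layers, params, k, X, Y, X_test, Y_test):
    F = extract_features(X, layers, params, k)            # training features
    F_test = extract_features(X_test, layers, params, k)  # test features
    model = centroid_train(F, Y)
    Y_hat = centroid_predict(model, F_test)
    return sum(p == t for p, t in zip(Y_hat, Y_test)) / len(Y_test)


# Hypothetical frozen layers: a scaled rectifier applied elementwise.
def relu_scale(x, w):
    return [max(0.0, w * v) for v in x]


layers = [relu_scale, relu_scale, relu_scale]
params = [1.0, 2.0, 3.0]
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Y = ["a", "a", "b", "b"]
X_test = [[0.8, 0.2], [0.2, 0.8]]
Y_test = ["a", "b"]
accuracy = evaluate(layers, params, 2, X, Y, X_test, Y_test)
print(accuracy)
```

The parameters are never modified after extraction starts, which is what distinguishes this use of a pre-trained network from fine-tuning.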
It is notable that a feature extractor learnt from expression images is helpful only for handling facial images, since the deep features contain image structures of facial organs. These image structures have been demonstrated by the Deconv-Net technique [11]. It should also be noted that such a feature extractor may not be the best choice for sex classification or age estimation problems, because these features have a weak capacity for discrimination in those problems. This phenomenon will be shown later.
EXPERIMENTS
In this section, two sets of experiments are carried out:
Firstly, CNNs were used for classifying facial expressions into six basic emotional categories, i.e., anger, disgust, fear, happiness, sadness and surprise.
Secondly, experiments on feature generalization were conducted. The Support Vector Machine (SVM) classifiers converged rapidly on data obtained by CNN feature extractors and achieved good accuracies on the test sets.
Datasets
CK-Regianini Dataset
The well-known CK facial expression dataset [13, 14] does not have complete emotion labels. The CK-Regianini dataset, which was manually annotated by Regianini [20], was used instead. This dataset contains 97 subjects, and each subject has several expression sequences with a resolution of 640*490. There are 487 sequences in total. The last 4 frames of each sequence, which contain a stable emotional expression, were chosen. This gave 1948 static images with basic emotional expression labels for our experiments.
In order to reduce the computation cost, a square region of interest (top:12 bottom:431 left:110 right:530) was specified to crop the images in advance. Most of the useless background data were thus removed, although small translational distortions still remained. Finally, the cropped images were resized to 96*96 (see Fig. 2(a)).
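The crop-and-resize step can be sketched in plain Python. Two assumptions are made here that the text does not settle: the region bounds are treated as inclusive pixel indices, and nearest-neighbour sampling is used for resizing because the interpolation method is not stated. The synthetic frame is hypothetical and stands in for a 640*490 image.

```python
def crop(image, top, bottom, left, right):
    """Crop a row-major image; all four bounds are treated as inclusive
    (an assumption, the text does not say which convention is used)."""
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (a stand-in for the unspecified method)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


# Synthetic 640*490 frame: 490 rows of 640 grey values.
frame = [[(r + c) % 256 for c in range(640)] for r in range(490)]
roi = crop(frame, top=12, bottom=431, left=110, right=530)
small = resize_nearest(roi, 96, 96)
print(len(roi), len(roi[0]), len(small), len(small[0]))
```

With inclusive bounds the region comes out one column wider than it is tall, so the exact convention used for the region of interest would need to be confirmed before reproducing the preprocessing.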
Fig. (2). Cropped samples.
As shown in Table 1, these images were divided into a training set, a validation set and a test set in proportions of 70%, 15% and 15%, respectively. The validation set was used only for observing the progress of training and for selecting the best training snapshot to prevent overfitting. In the end, the 1948 samples were divided into 3 sets and 6 categories.
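A 70% / 15% / 15% division of this kind could be produced as sketched below. The shuffling, the seed and the rounding rule are assumptions, so the subset sizes printed here are a consequence of this sketch and need not match the counts reported in Table 1.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle 0..n-1 and cut it into training, validation and test parts.

    The test part receives whatever remains after rounding down the
    first two parts.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])


train, val, test = split_indices(1948)
print(len(train), len(val), len(test))
```

Because four frames are taken from every sequence, a per-image shuffle like this one can place frames of the same sequence in different subsets; splitting by sequence or by subject is an alternative worth considering when reproducing the experiment.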
Table 1 Subsets Division of CK-Regianini dataset.
CK-Zheng Dataset
The CK-Zheng dataset was cropped manually from the CK dataset by Zheng et al. [19]. It contains 97 subjects. Each subject has several expression sequences with a resolution of 96*96. The total number of sequences is 415.
Similarly, the last 4 frames of each sequence were selected. The resulting 1660 static images were divided into a training set, a validation set and a test set in proportions of 70%, 15% and 15%, respectively.
The CK-Zheng and CK-Regianini datasets are basically identical except for their cropping methods. Unlike CK-Regianini, all faces in the CK-Zheng dataset are carefully registered (see Fig. 2(b)).
CMU-Pittsburgh Dataset
The CMU-Pittsburgh dataset [21] played an important role in the experiments on feature generalization. It consists of 463 facial expression images with a resolution of 60*70. All of these images had been carefully annotated and cropped, and the background had been completely removed.
In order to keep the dimensions the same across all datasets, the CMU-Pittsburgh images were additionally resized to 96*96 with their aspect ratio kept unchanged (see Fig. 2(c)).
Experiments on CNN-based Facial Expression Classification
This series of experiments was designed to validate the performance of CNNs on the facial expression classification problem.
Design
The Alex-Net architecture was employed, with the number of channels in its input layer modified for handling grayscale images. The mini-batch size was set as large as the memory limit allowed. l2 regularization with a weight decay parameter of 0.0005 was used. The momentum technique was not used. The learning rate was initially set to 0.01 and scaled by 0.9773 every 100 iterations.
Although the CK-Zheng dataset is easy to handle since it has already been well registered, it was used only for comparing our method with the conventional approaches reported in Zheng’s paper [19]. The CK-Regianini dataset was also employed for examining the translational invariance property of the classifier. The classification accuracies were analyzed not only on the entire sets but also on each sub-category (see Table 2 and Table 3).
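The learning rate schedule described above can be written as a one-line step decay. The function and parameter names are hypothetical; only the values 0.01, 0.9773 and 100 come from the text, and the sketch assumes the scaling is applied once at the end of every block of 100 iterations.

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.9773, step=100):
    """Step decay: multiply the base rate by gamma once per `step` iterations."""
    return base_lr * gamma ** (iteration // step)


for it in (0, 100, 1000, 10000):
    print(it, learning_rate(it))
```

Since 0.9773 raised to the 100th power is close to 0.1, this reading of the schedule lowers the learning rate by roughly a factor of ten every 10,000 iterations.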
Table 2 Classification results on CK-Zheng dataset.
Results and Analysis
Two networks were trained on the CK-Zheng dataset and the CK-Regianini dataset by the method described above, and converged at epochs 2673 and 1125, respectively. Accuracies of 98.79% and 98.62% were obtained on the respective test sets. The accuracies on each subset and each sub-category are listed in Table 2 and Table 3.
Table 3 Classification results on CK-Regianini dataset.
The results shown in Table 4 are much better than those reported in Zheng’s paper [19]. In total, 4 classification errors occurred on each test set of the CK-Zheng dataset and the CK-Regianini dataset. The error images, actual labels and predicted labels of these cases are illustrated in Fig. 3.
Table 4 Results of type I feature generalization (generalizing to different classifier).