CISC 6080 Capstone Project in Data Science

Yepu Wang
Fordham University
(86)13722799391

Abstract: With the rapid development of science and technology, facial expression recognition is
applied in many intelligent fields. The scientific basis for rapid and accurate computer-based
identification of a person is the uniqueness of each person's facial expression. Some scientists
have pointed out that there are seven kinds of facial expressions: happy, sad, angry, disgusted,
surprised, afraid and neutral. Facial expression recognition technology can improve
communication between intelligent machines and people. The data sets for this project are Jaffe
and FER-2013. For Jaffe, a simple CNN model is applied and achieves 90.00% test accuracy. For
FER-2013, Visual Geometry Group 16 (VGG-16) and mini_XCEPTION are used. Classification is
performed on the publicly available FER-2013 dataset of over 35,000 in-the-wild face images
covering 7 distinct emotions, with the provided split of 80% training and 20% testing data.
VGG-16 reaches a test accuracy of 69.87% and mini_XCEPTION reaches 64.82%. Finally, a simple
user interface is designed using PyQt5. It provides three functions: selecting a model file to use
for recognition, opening the camera to recognize facial expressions, and selecting a face image
to recognize its expression.

Keywords: Jaffe, FER2013, CNN, VGG-16, mini_XCEPTION, UI interface

Introduction

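This project focuses on facial expression recognition. The algorithm used is the CNN, and the
main tools are TensorFlow and Keras. The data sets are JAFFE and FER2013. For JAFFE, I designed
a simple CNN model, which performed well. FER2013 is much larger and harder to recognize, so I
used two models, VGG-16 and mini_XCEPTION, and obtained two trained models. Because the amount
of data is large and the models are complex, I ran the computation on GPUs on Colab and Kaggle.
Finally, I designed a UI (using PyQt5) that lets the user select a model and perform facial
expression recognition with it.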

Background

Face recognition
With the rapid development of science and technology, facial expression recognition is applied in
many intelligent fields. The scientific basis for rapid and accurate computer-based identification
of a person is the uniqueness of each person's facial expression. Some scientists have pointed out
that there are seven kinds of facial expressions: happy, sad, angry, disgusted, surprised, afraid
and neutral. Facial expression recognition technology can improve communication between
intelligent machines and people. Enabling computers to understand and recognize facial
expressions can greatly change the relationship between humans and computers, so that computers
can serve people better and human-computer interaction improves. In application, facial
expression recognition has great potential value in psychology, intelligent robots, intelligent
monitoring, virtual reality and synthetic animation. For example, in the retail industry, we can
identify a customer's expression to learn his or her preference for goods, or intelligently
recommend appropriate advertisements through expression recognition to achieve precision
marketing.

About CNN and analysis tools

The main algorithm for this project is convolutional neural network. CNN is a class of deep
neural network, most commonly applied to analyze visual imagery. A convolutional neural
network consists of an input layer, hidden layers and an output layer. In any feed-forward neural
network, any middle layers are called hidden because their inputs and outputs are masked by the
activation function and final convolution. In a convolutional neural network, the hidden layers
include layers that perform convolutions. Typically this includes a layer that performs a dot
product of the convolution kernel with the layer's input matrix. This product is usually the
Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel
slides along the input matrix for the layer, the convolution operation generates a feature map,
which in turn contributes to the input of the next layer. This is followed by other layers such as
pooling layers, fully connected layers, and normalization layers.
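
The sliding-kernel operation described above can be illustrated with a small pure-Python sketch. It is not the Keras implementation: the function name conv2d_relu, the 4 x 4 input and the 2 x 2 kernel are made up for this example, and it assumes stride 1 with no padding.

```python
def conv2d_relu(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and apply ReLU."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Frobenius inner product of the kernel with the current patch
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            row.append(max(0, s))  # ReLU activation
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 input and 2x2 kernel, chosen only for illustration.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_relu(image, kernel)
```

The resulting 3 x 3 feature map is what would be passed on to the next layer.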

TensorFlow™ is an open source software library for numerical computation using data flow
graphs. Nodes in the graph represent mathematical operations, while the graph edges represent
the multidimensional data arrays (tensors) communicated between them. The flexible
architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server,
or mobile device with a single API. TensorFlow was originally developed by researchers and
engineers working on the Google Brain Team within Google's Machine Intelligence research
organization for the purposes of conducting machine learning and deep neural networks
research, but the system is general enough to be applicable in a wide variety of other domains as
well.

Keras
Keras is a deep learning API written in Python, running on top of the machine learning
platform TensorFlow. It was developed with a focus on enabling fast experimentation. Being able
to go from idea to result as fast as possible is key to doing good research.

Data sets
Jaffe
The JAFFE dataset consists of 213 images of different facial expressions from 10 different
Japanese female subjects. Each subject was asked to do 7 facial expressions (6 basic facial
expressions and neutral).

FER2013
FER2013 contains over 35,000 grayscale facial images of different expressions, each restricted to
48×48 pixels. Its labels fall into 7 classes: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad,
5=Surprise, 6=Neutral. The Disgust expression has the fewest images (600), while the other labels
have nearly 5,000 samples each. Most of the images in the dataset are rotated in-plane or
out-of-plane, and many faces are occluded by objects such as hands, hair and scarves. Because
this data set contains errors, human recognition accuracy on it is 65% ± 5%.
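
As a rough sketch of how one FER2013 sample could be decoded, the code below assumes the commonly distributed CSV layout with the columns emotion, pixels and Usage, where pixels holds 48*48 space-separated intensities. That layout, the helper name parse_row and the synthetic row are assumptions of this example and are not taken from the project code.

```python
import csv
import io

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
SIZE = 48


def parse_row(row):
    """Turn one CSV row into (label name, 48x48 nested list of pixel values)."""
    values = [int(v) for v in row["pixels"].split()]
    if len(values) != SIZE * SIZE:
        raise ValueError("expected 48*48 pixel values")
    image = [values[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]
    return LABELS[int(row["emotion"])], image


# A tiny synthetic stand-in for the real file: one uniform image labeled 3.
fake_csv = ("emotion,pixels,Usage\n3,"
            + " ".join(["128"] * (SIZE * SIZE))
            + ",Training\n")
rows = list(csv.DictReader(io.StringIO(fake_csv)))
label, image = parse_row(rows[0])
```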

[Figure: Jaffe and FER2013]

Experiments

Experiments for JAFFE


The face region is cropped out to remove redundant interference, then scaled to a 48 × 48 image
and stored as CSV data. The saved expression data and the per-expression counts can be viewed in
face.csv.
For Jaffe, a simple CNN model is adopted. The following figure shows the
structure of this model.
To prevent the network from overfitting too quickly, image transformations
such as flipping, rotation and cropping can be applied. This is called data
augmentation. A further advantage of these operations is that they expand
the amount of data and make the trained network more robust. I used the
first 200 pictures of the data set as training data and the last 13 as the
test set.
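
A minimal sketch of the split and of one such transformation (a horizontal flip) is given below. The tiny 2 x 2 stand-in images and the helper names are hypothetical, and augmenting only the training portion is a choice made for this example rather than a statement about the project code.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def split_first_n(samples, n_train):
    """Use the first n_train samples for training and the rest for testing."""
    return samples[:n_train], samples[n_train:]


# Hypothetical stand-ins for the 213 JAFFE images: tiny 2x2 "images".
samples = [[[i, i + 1], [i + 2, i + 3]] for i in range(213)]
train, test = split_first_n(samples, 200)

# Add a flipped copy of every training image; the test images stay untouched.
augmented_train = train + [horizontal_flip(img) for img in train]
```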

Experiments for FER2013

Compared with Jaffe, FER2013 has much more data, is harder to recognize and yields lower
accuracy, so the simple CNN model above performed poorly on FER2013. After reading the relevant
papers, I decided to use two models, VGG and mini_XCEPTION, and trained each of them separately.
Because of the large amount of data and the long training time, GPU acceleration (Kaggle, Colab)
is used.

Some important functions used:

ImageDataGenerator. To make the most of our few training examples, we "augment" them via a
number of random transformations, so that the model never sees exactly the same picture twice.
This helps prevent overfitting and helps the model generalize better. In Keras this can be done
via the keras.preprocessing.image.ImageDataGenerator class. This class allows us to configure
random transformations and normalization operations to be applied to the image data during
training, and to instantiate generators of augmented image batches (and their labels)
via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with
the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator
and predict_generator.
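
The idea of a generator that keeps producing freshly transformed batches can be sketched in plain Python. This only illustrates the concept and is not how Keras implements ImageDataGenerator; the function names, the flip and one-pixel shift, and the toy 2 x 3 images are all assumptions of this example.

```python
import random


def random_transform(image, rng):
    """Randomly mirror the image and shift it by up to one pixel horizontally."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    shift = rng.choice([-1, 0, 1])
    if shift == 1:      # shift right, pad with 0 on the left
        image = [[0] + row[:-1] for row in image]
    elif shift == -1:   # shift left, pad with 0 on the right
        image = [row[1:] + [0] for row in image]
    return image


def augmented_batches(images, labels, batch_size, rng):
    """Yield endless shuffled batches of transformed images with their labels."""
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chosen = indices[start:start + batch_size]
            yield ([random_transform(images[i], rng) for i in chosen],
                   [labels[i] for i in chosen])


rng = random.Random(0)
images = [[[1, 2, 3], [4, 5, 6]] for _ in range(5)]
labels = [0, 1, 2, 3, 4]
batch_images, batch_labels = next(augmented_batches(images, labels, 2, rng))
```

Because the transformation is drawn again each time a sample is served, the same source image can appear in different forms across epochs.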

EarlyStopping
EarlyStopping is a callback. Callbacks specify operations to be performed at the beginning and
end of each epoch. Callbacks can monitor built-in quantities directly, such as 'acc', 'val_acc',
'loss' and 'val_loss'. EarlyStopping is the callback used to stop training early: training stops
when the monitored loss is no longer decreasing (that is, when the reduction is smaller than a
certain threshold). Early stopping removes the need to set the number of epochs manually, and it
can also be regarded as a regularization method that helps avoid overfitting. After each epoch
(or after every n epochs), results are obtained on the validation set. If the validation error is
found to increase as the epochs go on, training stops, and the weights at that point are taken as
the final parameters of the network.
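
The stopping rule can be sketched without any framework. The function below is an illustration only: its parameter names patience and min_delta mirror the Keras EarlyStopping arguments of the same names, but the logic is a simplified assumption, and the validation losses are invented.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through all epochs.
    return len(val_losses) - 1, best_epoch


# Invented validation losses for illustration.
history = [1.20, 1.00, 0.95, 0.97, 0.96, 0.99, 0.90]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
```

In this example training stops at epoch 5, although the lowest loss was seen at epoch 2. Keras can restore the weights from that best epoch through the restore_best_weights option of EarlyStopping.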

ReduceLROnPlateau
ReduceLROnPlateau reduces the learning rate when a metric has stopped improving. With a fixed
learning rate, the model may stop improving after a certain number of epochs because the learning
rate is no longer suitable. Reducing the learning rate during training can then improve the model
further. In Keras this is done with the ReduceLROnPlateau callback, which is convenient to use
together with EarlyStopping. If the initial learning rate is too small, the model needs many
iterations to reach its optimal state and training is slow. Starting with a larger rate and
reducing it during training lets the model reach a good state quickly and accurately.
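
A simplified sketch of this schedule follows. The parameter names factor, patience and min_lr mirror the Keras ReduceLROnPlateau arguments, but the function itself, the starting rate and the loss values are assumptions made for illustration.

```python
def reduce_lr_schedule(val_losses, initial_lr, factor=0.5, patience=2, min_lr=1e-7):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    rates = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        rates.append(lr)
    return rates


# Invented validation losses for illustration.
rates = reduce_lr_schedule([1.0, 0.9, 0.95, 0.93, 0.8, 0.85, 0.9], initial_lr=1e-3)
```

Each time the loss fails to improve for two epochs in a row, the rate is halved, so this run ends at a quarter of the starting rate.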

ModelCheckpoint
The ModelCheckpoint callback is used in conjunction with training using model.fit() to save a
model or weights (in a checkpoint file) at some interval, so the model or weights can be loaded
later to continue the training from the state saved. A few options this callback provides include:
- Whether to only keep the model that has achieved the "best performance" so far, or whether to
  save the model at the end of every epoch regardless of performance.
- Definition of "best": which quantity to monitor and whether it should be maximized or minimized.
- The frequency it should save at. Currently, the callback supports saving at the end of every
  epoch, or after a fixed number of training batches.
- Whether only weights are saved, or the whole model is saved.
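
The first two options can be illustrated with a small sketch that only decides when a checkpoint would be written. The names mode and save_best_only mirror the Keras ModelCheckpoint arguments; the function and the accuracy values are hypothetical, and nothing is written to disk.

```python
def checkpoint_epochs(metric_values, mode="max", save_best_only=True):
    """Return the 0-based epochs at which a checkpoint would be written."""
    if mode == "max":
        better = lambda new, old: new > old
    else:
        better = lambda new, old: new < old
    best = None
    saved = []
    for epoch, value in enumerate(metric_values):
        if not save_best_only:
            saved.append(epoch)      # save every epoch regardless of performance
        elif best is None or better(value, best):
            best = value
            saved.append(epoch)      # new best value: overwrite the checkpoint
    return saved


# Invented validation accuracies for illustration.
val_acc = [0.50, 0.58, 0.57, 0.61, 0.60]
best_only = checkpoint_epochs(val_acc, mode="max")
every_epoch = checkpoint_epochs(val_acc, save_best_only=False)
```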

VGG16 models
VGG improves on AlexNet. VGG uses several consecutive 3x3 convolution kernels in place of the
larger kernels (11x11, 7x7, 5x5) used in earlier networks such as AlexNet. For a given receptive
field (the local region of the input image that affects an output), a stack of small kernels is
better than one large kernel, because the additional nonlinear layers increase the network depth,
allowing more complex patterns to be learned, at a relatively small cost (fewer parameters).

In brief, VGG uses three 3x3 kernels in place of one 7x7 kernel and two 3x3 kernels in place of
one 5x5 kernel. The main purpose is to increase the depth of the network, and to some extent
improve the performance of the neural network, while keeping the same receptive field.
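
The receptive-field and parameter argument can be checked with a short calculation. The sketch assumes stride-1 convolutions with the same number of input and output channels in every layer and ignores biases; the channel count of 64 is only an illustrative choice.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (no biases) with `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 64  # illustrative channel count, chosen for this example

rf_three_3x3 = stacked_receptive_field(3, 3)   # same field as one 7x7 kernel
rf_two_3x3 = stacked_receptive_field(3, 2)     # same field as one 5x5 kernel
weights_three_3x3 = conv_weights(3, C, num_layers=3)
weights_one_7x7 = conv_weights(7, C)
```

Under these assumptions, three 3x3 layers need 27 * C * C weights, while a single 7x7 layer needs 49 * C * C for the same receptive field.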

mini_XCEPTION models

The mini_XCEPTION architecture is a fully convolutional neural network that contains four
residual depthwise separable convolutions. Each convolution is followed by batch normalization
and a ReLU activation function. The last layer uses global average pooling and a softmax
activation function to make the prediction.

mini_XCEPTION is inspired by the Xception architecture and combines residual modules with
depthwise separable convolutions. A residual module modifies the desired mapping between two
subsequent layers, so that the learned features become the difference between the original
feature map and the desired feature map. The mini_XCEPTION structure removes the last fully
connected layer and further reduces the number of parameters in the convolution layers. The
architecture has about 60,000 parameters, which is 80 times fewer than the traditional
convolutional neural network.
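
To see why depthwise separable convolutions save parameters, the weight counts of the two layer types can be compared directly. The kernel size and channel counts below are illustrative and are not taken from the mini_XCEPTION configuration; biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # the 1x1 convolution mixes the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128   # illustrative sizes, not from the project
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = standard / separable
```

For these sizes the separable layer uses roughly one eighth of the weights of the standard layer.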

Results
Jaffe
Train loss: 0.0002448219747748226
Train accuracy: 100.0
Test loss: 0.404933363199234
Test accuracy: 89.99999761581421
(keras)
(tensorflow)

FER2013

                            VGG16        mini_XCEPTION
Learning rate reduced to    1.000e-7     1.000e-8
Early stopping at epoch     83           113
Train loss                  0.354729     0.781446
Train accuracy              87.293184    71.144938
Test loss                   0.989961     0.966201
Test accuracy               69.866258    64.823073

[Figure: VGG16]

UI interface design

[Figures: VGG-16; Experiments for FER2013: VGG16, mini_XCEPTION]

Related Work
Conclusion
References
