
Recognition Oriented Facial Image Quality Assessment via

Deep Convolutional Neural Network

Cenhui Pan, Bingbing Ni, Yi Xu, Xiaokang Yang


Shanghai Jiao Tong University
Shanghai, China
[email protected]

ABSTRACT

Quality of facial images significantly impacts the performance of face recognition algorithms. Being able to predict "which facial image is good for recognition" is of great importance in real application scenarios, where a sequence of facial images is typically available and one can select the "best quality" frame for the subsequent matching and recognition task. To this end, we introduce a novel automatic facial image quality assessment framework that directly targets "selecting better face images for better face recognition". For this purpose, a deep convolutional neural network (DCNN) is trained to output a general facial quality metric that comprehensively considers various quality factors, including brightness, contrast, blurriness, occlusion and pose. Based on this trained facial quality metric network, we can sort the input face images accordingly and "select" good face images for recognition. Our method is evaluated on the Color FERET and KinectFace datasets. Results show that the proposed facial image quality metric network distinguishes "good" images from "bad" ones well during face recognition.

Categories and Subject Descriptors

I.5.2 [Pattern Recognition]: Design Methodology—feature evaluation and selection

General Terms

Design, Experimentation, Performance

Keywords

Face image quality, face selection, face recognition, convolutional network

1. INTRODUCTION

The quality of face images has a significant impact on the performance of face recognition systems. Poor quality of a facial image is often the main reason for low recognition accuracy. Given that a modern face recognition system often takes video (a sequence of facial images) as input, a natural question is whether we can automatically assess the quality of every input frame and keep only the "best" one for the subsequent face recognition task. Note that this facial image quality assessment task differs significantly from conventional general-purpose image quality assessment, which is based on user experience; the facial image quality metric considered in this work instead directly targets the final recognition performance. To the best of our knowledge, there is no prior comparative study of such techniques.

A large body of research has been devoted to biometric image quality assessment, especially for faces, over the past decades. ISO/IEC 19794-5 [10] lists quality factors of face images, including brightness, facial pose, contrast, occlusion etc. Gao and Yang proposed a facial-symmetry-based method for measuring the image quality degradation caused by non-frontal lighting and improper facial poses [12]. In [10][13][4], Gabor wavelet features, Haar-type features and LBP features are used to evaluate changing illumination and improper posture. In [1][6], face image quality fusion systems were designed to study the relationship between the quality measures and the predicted matching score of face recognition.

However, most previous work considers only one or a few factors when designing facial image quality metrics. This is suboptimal, since the performance of face recognition is affected by the combination of all factors, including image blur, contrast, pose and occlusion. Therefore, the goals of this work are two-fold. First, we propose a novel procedure for designing a reasonable and comprehensive face image quality metric that integrates the various factors affecting the face recognition process. The advantage is that we obtain a single general quality metric instead of considering every quality factor separately (especially occlusion and pose, which are not covered by [1]), which would be impractical for a real face recognition system. Second, to automatically predict this metric for an arbitrary input facial image, we train an end-to-end deep convolutional neural network. In particular, we build on the Deep Face network [8] as well as Probabilistic Linear Discriminant Analysis (PLDA) [3]. In this way, our end-to-end system directly outputs a single overall metric integrating brightness, contrast, blurriness, occlusion and pose, which is easily applicable in a working system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICIMCS '16, August 19-21, 2016, Xi'an, China
© 2016 ACM. ISBN 978-1-4503-4850-8/16/08 ... $15.00
DOI: http://dx.doi.org/10.1145/3007669.3007700
The rest of the paper is organized as follows. In Section 2, we define the face quality metric for recognition-oriented face quality assessment. In Section 3, we describe our CNN-based face quality prediction in detail. In Section 4, we present comparative experiments and evaluate the performance of our approach. Conclusions are drawn in Section 5.
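To make the frame-selection scenario from the introduction concrete, the following is a minimal sketch of how a trained quality network could be used at deployment time; `quality_net`, the crop format, and the preprocessing are hypothetical placeholders, not part of the paper's code.

```python
import numpy as np

def select_best_frame(frames, quality_net):
    """Pick the face crop with the highest predicted quality from a video.

    `frames`: list of preprocessed face crops (hypothetical 224x224 format);
    `quality_net`: a trained model mapping one crop to a scalar quality score.
    """
    scores = np.array([float(quality_net(f)) for f in frames])
    best = int(scores.argmax())      # "best quality" frame for recognition
    return frames[best], scores[best]
```

Only the selected frame would then be passed on to the matching and recognition pipeline.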

2. RECOGNITION ORIENTED FACIAL QUALITY METRIC

In the literature, face quality metrics have been defined by considering only a single (or a few) face quality factors, for example brightness, contrast, blurriness or facial symmetry. In [1], Abaza et al. combined five quality measures (contrast, sharpness, focus, brightness and illumination) into one face image quality metric. However, some of the key face quality factors, such as face pose and occlusion, were ignored. The main goal of this study is to improve the performance of face recognition algorithms by designing a new overall face quality metric that considers all important factors affecting the face recognition procedure.

Figure 1: Face quality values are shown under each face image. The first and second rows show training images with different brightness, contrast, blurriness or occlusion from the Color FERET dataset [9]; the last row shows training images with different poses, some of which combine more than one quality factor, such as brightness, contrast, blurriness, occlusion and pose.

The most important challenge is how to find a good metric that systematically indicates the influence on the final face recognition performance. Subjective quality annotation is not appropriate for this task, because subjective quality scores might not correlate well with recognition performance. Our key observation is that for face recognition, a high quality face image is usually associated with a high matching score under the face recognition algorithm.
Accordingly, our idea is to assign each face image a quality value given by its matching score against a reference image. In our experiments, faces are first detected using the method of [7]. Second, we obtain the face feature representation using the Deep Face network [8], which achieves state-of-the-art results on standard face recognition datasets. Finally, we compute the face matching score between the test face image and the reference image with PLDA [3]. In [3], a face is represented as the sum of two independent Gaussian variables:

x = µ + ε    (1)

where µ ∼ N(0, S_µ) represents the face identity and ε ∼ N(0, S_ε) is the face variation within the same identity. P(x_1, x_2 | H_I) and P(x_1, x_2 | H_E) denote the intra- and extra-personal variation hypotheses. It is readily shown that both probabilities are also Gaussian, with block covariances

Σ_I = [ S_µ + S_ε , S_µ ; S_µ , S_µ + S_ε ]    (2)

and

Σ_E = [ S_µ + S_ε , 0 ; 0 , S_µ + S_ε ]    (3)

respectively. S_µ and S_ε can be learned with the EM algorithm. The verification score can then be computed as the log likelihood ratio, which has a closed-form solution:

r(x_1, x_2) = log [ P(x_1, x_2 | H_I) / P(x_1, x_2 | H_E) ]    (4)
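As a sanity check on Eqs. (1)-(4), here is a minimal NumPy/SciPy sketch that evaluates the log likelihood ratio directly from the two block covariances, assuming S_µ and S_ε have already been estimated by EM (the paper uses the algebraically simplified closed form, which this sketch does not reproduce):

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_score(x1, x2, S_mu, S_eps):
    """Verification score r(x1, x2) of Eq. (4) for two face feature vectors."""
    x = np.concatenate([x1, x2])
    zeros = np.zeros_like(S_mu)
    sigma_I = np.block([[S_mu + S_eps, S_mu],    # Eq. (2): same identity
                        [S_mu, S_mu + S_eps]])
    sigma_E = np.block([[S_mu + S_eps, zeros],   # Eq. (3): different identities
                        [zeros, S_mu + S_eps]])
    mean = np.zeros(x.shape[0])
    log_p_I = multivariate_normal.logpdf(x, mean=mean, cov=sigma_I)
    log_p_E = multivariate_normal.logpdf(x, mean=mean, cov=sigma_E)
    return log_p_I - log_p_E                     # Eq. (4)
```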
Note that for face recognition, the neutral frontal face image generally achieves the best recognition performance. We therefore take the neutral frontal face image of each subject as the reference image. We synthetically vary the contrast, brightness, blurriness and occlusion of the face images in the dataset (Color FERET [9] in our experiments). Sample images and their corresponding quality values, covering different brightness, contrast, blurriness, occlusion and pose, are shown in Fig. 1; a higher score means better quality. Note that the designed face image quality value describes the overall face image quality.
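The paper does not specify the exact degradation parameters, so the following Pillow-based sketch of the synthetic variation step uses illustrative ranges only:

```python
import random
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter

def degrade(face: Image.Image) -> Image.Image:
    """Apply one random synthetic degradation to a reference face image."""
    img = face.copy()
    factor = random.choice(["brightness", "contrast", "blur", "occlusion"])
    if factor == "brightness":
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.3, 1.8))
    elif factor == "contrast":
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.3, 1.8))
    elif factor == "blur":
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(1, 5)))
    else:  # occlusion: paint a random block over part of the face
        w, h = img.size
        x0, y0 = random.randint(0, w // 2), random.randint(0, h // 2)
        ImageDraw.Draw(img).rectangle(
            [x0, y0, x0 + w // 3, y0 + h // 3], fill=(0, 0, 0))
    return img
```

Each degraded image would then be scored against its neutral frontal reference (e.g., with `plda_score` above) to obtain its ground-truth quality label.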
3. DCNN FOR END-TO-END FACE IMAGE QUALITY ASSESSMENT

Once the overall recognition oriented metric for facial image quality assessment is designed, the next task is to develop an algorithm that automatically predicts this metric value for any input facial image. Inspired by the great success of deep convolutional neural networks in various computer vision tasks such as image recognition, segmentation, super-resolution and tracking, we adopt a DCNN architecture for face image quality prediction. We take the VGG-16 network [11] as our network prototype (with its ImageNet pre-trained parameters), which has proved to work well in image classification, localization etc. The network configuration is outlined in Table 1. The VGG-16 network contains thirteen 3×3 convolutional layers and three fully-connected layers. The input image size is fixed to 224×224, the same as the input size of the Deep Face network. A ReLU activation function follows each convolutional layer. For the last fully-connected layer, we replace the logistic regression objective with a Euclidean loss:

L = (1/N) Σ_{i=1}^{N} ‖y_i − f(x_i; θ)‖_2 ,   θ′ = argmin_θ L    (5)

where N is the number of images, f(x_i; θ) denotes the estimated score of the input image x_i under the CNN parameters θ, and y_i denotes the ground-truth score.

Table 1: Network configuration

layer      0      1     2     3        4     5
type       input  conv  conv  maxpool  conv  conv
filt size  -      3     3     2        3     3
num filt   -      64    64    -        128   128
stride     -      1     1     2        1     1
pad        -      1     1     0        1     1

layer      6        7     8     9     10       11
type       maxpool  conv  conv  conv  maxpool  conv
filt size  2        3     3     3     2        3
num filt   -        256   256   256   -        512
stride     2        1     1     1     2        1
pad        0        1     1     1     0        1

layer      12    13    14       15    16    17
type       conv  conv  maxpool  conv  conv  conv
filt size  3     3     2        3     3     3
num filt   512   512   -        512   512   512
stride     1     1     2        1     1     1
pad        1     1     0        1     1     1

layer      18       19    20    21    22
type       maxpool  fc    fc    fc    loss
filt size  2        -     -     -     -
num filt   -        4096  4096  -     -
stride     2        1     1     1     1

Figure 2: Sample images of the Color FERET dataset
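A PyTorch-style sketch of this prototype follows (the paper does not name a framework; the freezing option below is an illustrative reading of the fine-tuning variants in this section, not the paper's exact configuration):

```python
import torch.nn as nn
from torchvision import models

def build_quality_net(freeze_conv: bool = False) -> nn.Module:
    """VGG-16 with the 1000-way classifier swapped for a scalar quality head."""
    net = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    net.classifier[6] = nn.Linear(4096, 1)  # randomly initialized regression head
    if freeze_conv:
        # Illustrative variant: fine-tune only the fully-connected layers,
        # keeping the ImageNet convolutional features fixed.
        for p in net.features.parameters():
            p.requires_grad = False
    return net
```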

We update the parameters θ with momentum as follows:

Δθ_t = γ Δθ_{t−1} − ε_t ∇_θ L
θ_t = θ_{t−1} + Δθ_t    (6)

where θ_t denotes the parameters at epoch t. The momentum γ is kept at 0.9 throughout training.

Figure 3: Comparison of several face image quality factors and the corresponding quality values
We explore both fine-tuning the convolutional layers (called "network 1") and fine-tuning the first seven layers (five convolutional layers and two fully-connected layers, called "network 2"). The remaining fully-connected layers are initialized randomly. We use the rectification non-linearity on both the convolutional and fully-connected layers. The initial learning rate is ε = 10^−4, and we update the learning rate ε_t as follows:

ε_t = 0.0001 × 0.1^⌊t/30000⌋    (7)

where ⌊·⌋ denotes the floor function, i.e., rounding down to the nearest integer.

Figure 4: Sample images of the KinectFace dataset
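A self-contained sketch of this training schedule: PyTorch's SGD-with-momentum gives an update of the same spirit as Eq. (6) (its learning-rate placement differs slightly from the formula), and `StepLR` reproduces Eq. (7) exactly when stepped once per iteration. The batch shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

net = models.vgg16(weights=None)             # architecture only, for brevity
net.classifier[6] = nn.Linear(4096, 1)
criterion = nn.MSELoss()                     # Euclidean loss of Eq. (5)

optimizer = torch.optim.SGD(net.parameters(), lr=1e-4, momentum=0.9)
# Eq. (7): lr = 1e-4 * 0.1 ** floor(t / 30000), with t advanced per step.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30000, gamma=0.1)

def train_step(images: torch.Tensor, scores: torch.Tensor) -> float:
    """One update on a batch of 224x224 face crops and quality labels."""
    optimizer.zero_grad()
    loss = criterion(net(images).squeeze(1), scores)
    loss.backward()
    optimizer.step()      # momentum update in the spirit of Eq. (6)
    scheduler.step()      # advances t in Eq. (7)
    return loss.item()
```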


4. EXPERIMENT

Extensive experiments are performed on the Color FERET database [9] and KinectFaceDB [5]. The Color FERET database [9] contains a total of 11338 facial images, collected by photographing 994 subjects at various angles over the course of 15 sessions between 1993 and 1996. Each image contains a single face. It is divided into training data (725 subjects) and testing data (269 subjects). The KinectFace database [5] consists of multi-modal facial images of 52 people, captured in two sessions under 9 states of different facial expressions, lighting and occlusion conditions: neutral, smile, open mouth, left profile, right profile, occluded eyes, occluded mouth, occluding paper and light on.

The objective of our experimental evaluation is to validate 1) whether our newly defined overall facial image quality metric corresponds well to the various factors; and 2) whether the proposed end-to-end quality prediction network can accurately predict facial image quality. To this end, we design two experiments: first, we train our model on the Color FERET training data and test on its test data; second, we train our model on the Color FERET dataset and test on the KinectFace database, to assess the robustness of our method.

Figure 5: Comparison of several face image quality factors and the corresponding quality values given by our designed overall metric

4.1 Within-database Performance

The purpose of this experiment is to test whether our designed overall (non-factor-specific) facial quality metric is reasonable, and how well the trained end-to-end facial quality score prediction network works.
Table 2: SROCC and LCC on the Color FERET and KinectFace databases

            Color FERET        KinectFace
method      SROCC    LCC       SROCC    LCC
brightness  0.1364   0.0919    0.3943   0.4000
contrast    0.3689   0.3877    0.4611   0.4583
sharpness   0.0014   0.0027    0.0191   0.0363
asymmetry   0.4906   0.4618    0.5266   0.5587
network 1   0.7991   0.7784    0.7244   0.7208
network 2   0.8466   0.8429    0.7586   0.7482
When preparing the training data, we obtain the face feature representation using the Deep Face network [8] and compute the face matching scores with PLDA [3]. The final scores are normalized to the range 0-10. After that, we randomly select 80% of the training subjects, with all of their images, as the training set, and the remaining 20% as the validation set. We use the testing data to evaluate our model. We fine-tune both "network 1" and "network 2"; sample result face images are shown in Fig. 2, and their face image quality values, together with other face image quality factors (brightness [2], contrast [12], sharpness [12] and asymmetry [12]), are plotted in Fig. 3. Two measures, the Spearman Rank Order Correlation Coefficient (SROCC) and the Linear Correlation Coefficient (LCC), are used to evaluate the performance of these methods: SROCC assesses how well the relationship between two variables can be described by a monotonic function, while LCC measures the degree of linear dependence between two variables. The results are shown in Table 2. Our models ("network 1" and "network 2") are clearly able to describe face image quality for face recognition and to distinguish high quality facial images from low quality ones, i.e., the predicted quality values correlate highly with the defined quality values. In contrast, the competing algorithms have lower correlation with the matching scores. Meanwhile, we find that both "network 1" and "network 2" separate high quality face images from poor quality ones well.
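Both correlation measures are standard; a minimal SciPy sketch of this evaluation step (the array names are placeholders):

```python
from scipy.stats import pearsonr, spearmanr

def correlation_metrics(predicted, defined):
    """SROCC and LCC between predicted and defined quality values."""
    srocc, _ = spearmanr(predicted, defined)  # monotonic (rank) agreement
    lcc, _ = pearsonr(predicted, defined)     # linear agreement
    return srocc, lcc
```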
4.2 Cross-database Performance

To test the generalization capability of our facial image quality prediction method, we train our CNN on the Color FERET dataset [9] and test its performance on the KinectFace dataset [5], i.e., a cross-database experiment. Sample images are shown in Fig. 4. Their face image quality values and other face image quality factors (brightness [2], contrast [12], sharpness [12] and asymmetry [12]) are plotted in Fig. 5. We again compute SROCC and LCC to evaluate the performance; the results are listed in Table 2. Our method can approximately rank the face images by their face image quality. Moreover, because our method comprehensively considers all quality factors, it outperforms the competing methods, which only consider brightness, contrast, sharpness, occlusion or pose separately. The results show that our method has good generalization capability.
5. CONCLUSION

In this paper, we propose a generally applicable facial quality metric that directly targets face recognition performance. We then propose an end-to-end deep convolutional neural network for automatic face image quality prediction. Experimental results show that our new facial quality metric distinguishes high quality face images from poor quality ones well, and that our facial quality prediction network works properly. It is evident that using our facial quality metric to select the "best" face image for recognition can improve the performance of face recognition algorithms.

6. ACKNOWLEDGEMENTS

The work was supported by the State Key Research and Development Program (2016YFB1001003), NSFC (61527804, 61521062, 61502301), STCSM (14XD1402100), the 111 Program (B07022) and China's Thousand Youth Talents Plan.

7. REFERENCES

[1] A. Abaza, M. A. Harrison, T. Bourlai, and A. Ross. Design and evaluation of photometric image quality measures for effective face recognition. IET Biometrics, 3(4):314-324, 2014.
[2] S. Bezryadin, P. Bourov, and D. Ilinih. Brightness calculation in digital image processing. In Digital Image Processing, 2007.
[3] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In European Conference on Computer Vision, ECCV, pages 566-579, 2012.
[4] G. Zhang and Y. Wang. Asymmetry-based quality assessment of face images. In International Symposium on Visual Computing, ISVC, pages 499-508, Las Vegas, NV, USA, 2009.
[5] R. Min, N. Kose, and J.-L. Dugelay. KinectFaceDB: A Kinect database for face recognition. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(11):1534-1548, 2014.
[6] K. Nasrollahi and T. B. Moeslund. Face quality assessment system in video sequences. In Biometrics and Identity Management, First European Workshop, BIOID, pages 10-18, Roskilde, Denmark, May 2008.
[7] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137-154, 2004.
[8] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In The British Machine Vision Conference, BMVC, 2015.
[9] P. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, 22:1090-1104, 2000.
[10] J. Sang, Z. Lei, and S. Z. Li. Face image quality evaluation for ISO/IEC standards 19794-5 and 29794-5. In International Conference on Biometrics, ICB, Sassari, Italy, 2009.
[11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[12] X. Gao, S. Z. Li, R. Liu, and P. Zhang. Standardization of face image sample quality. In International Conference on Biometrics, ICB, Seoul, Korea, 2007.
[13] Z. Yang, H. Ai, B. Wu, S. Lao, and L. Cai. Face pose estimation and its application in video shot selection. In International Conference on Pattern Recognition, pages 322-325, 2004.
