ABSTRACT

Quality of facial images significantly impacts the performance of face recognition algorithms. Being able to predict “which facial image is good for recognition” is of great importance for real application scenarios, where a sequence of facial images is typically presented and one can select “the best quality” frame for the subsequent matching and recognition task. To this end, we introduce a novel automatic facial image quality assessment framework that directly targets “selecting a better face image for better face recognition”. For this purpose, a deep convolutional neural network (DCNN) is trained to output a general facial quality metric that comprehensively considers various quality factors, including brightness, contrast, blurriness, occlusion, pose, etc. Based on this trained facial quality metric network, we are able to sort the input face images accordingly and “select” good face images for recognition. Our method is evaluated on the Color FERET and KinectFace datasets. Results show that the proposed facial image quality metric network distinguishes “good” images from “bad” ones well during face recognition.

Categories and Subject Descriptors

I.5.2 [Pattern Recognition]: Design Methodology—feature evaluation and selection

General Terms

Design, Experimentation, Performance

Keywords

Face image quality, face selection, face recognition, convolutional network

1. INTRODUCTION

The quality of face images has a significant impact on the performance of a face recognition system. Poor quality of a facial image is often the main reason for low recognition accuracy. Given that a modern facial recognition system often takes video (a sequence of facial images) as input, one might ask whether we can automatically assess the face image quality of every input frame and keep only the “best” one for the subsequent face recognition task. Note that this facial image quality assessment task is significantly different from previous general image quality evaluation metrics: conventional image quality assessment is based on user experience, whereas the facial image quality metric considered in this work directly targets the final recognition performance. To the best of our knowledge, there is no comparative study of such techniques.

A large body of research has been devoted to biometric image quality assessment, especially for faces, over the past decades. Based on ISO/IEC 19794-5 [10], the quality factors of a face image include brightness, facial pose, contrast, occlusion, etc. Gao and Yang proposed a facial-symmetry-based method for measuring image quality degradation caused by non-frontal lighting and improper facial poses [12]. In [10][13][4], Gabor wavelet features, Haar-type features and LBP features are used to evaluate changing illumination and improper posture. In [1][6], face image quality fusion systems were designed to study the relationship between the quality measures and the matching score prediction of face recognition.

However, most of the previous work considers only a single factor or a few factors when designing facial image quality metrics. This is suboptimal, since the performance of face recognition is affected by the combination of all factors, including image blur, contrast, pose and occlusion. Therefore, the goals of the proposed work are twofold. First, we propose a novel procedure to design a reasonable and comprehensive face image quality metric by integrating the various factors that affect the process of face recognition. The advantage is that we obtain a general quality metric instead of considering every quality factor separately (especially occlusion and pose, which are not covered by [1]), which is not practical for a real face recognition system. Second, to automatically predict this metric for an arbitrary input facial image, we train an end-to-end deep convolutional neural network. In particular, we adopt the Deep Face network [8] as well as Probabilistic Linear Discriminant Analysis (PLDA) [3]. In this way, our end-to-end system directly outputs a single overall metric integrating brightness, contrast, blurriness, occlusion and pose, which is easily applicable in a working system.
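Given such a trained quality network, “selecting the best frame” reduces to scoring every frame and keeping the highest-scoring one. The following is a minimal sketch of that selection step, assuming a PyTorch implementation; `quality_net` and `preprocess` are hypothetical stand-ins for the trained model and its input pipeline, not names from the paper:

```python
import torch

def select_best_frame(frames, quality_net, preprocess):
    """Return the frame with the highest predicted quality score.

    frames      -- iterable of face images (e.g., HxWx3 uint8 arrays)
    quality_net -- trained quality-metric network (hypothetical stand-in)
    preprocess  -- maps one frame to a 3xHxW float tensor (hypothetical stand-in)
    """
    quality_net.eval()
    best_frame, best_score = None, float("-inf")
    with torch.no_grad():
        for frame in frames:
            x = preprocess(frame).unsqueeze(0)  # add batch dimension: 1x3xHxW
            score = quality_net(x).item()       # scalar quality metric
            if score > best_score:
                best_frame, best_score = frame, score
    return best_frame, best_score
```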
The rest of the paper is organized as follows. In Section 2, we define the face quality metric for recognition-aware face quality assessment. In Section 3, we describe our CNN-based face quality prediction in detail. In Section 4, we provide the comparative experiments and evaluate the performance of our approach. Conclusions are drawn in Section 5.

[Figure: sample face images with predicted quality scores 4.2116, 4.0735, 3.6811, 4.1923 and 1.8501]
Note that for face recognition, the neutral frontal face image always achieves better recognition performance. Therefore, we regard the neutral frontal face image as the reference image. We synthetically vary the contrast, brightness, blurriness and occlusion on the face image dataset (i.e., FERET [9]

$$L = \frac{1}{N}\sum_{i=1}^{N} \left\lVert y_i - f(x_i;\theta_t) \right\rVert_{\ell_2}, \qquad \theta' = \operatorname*{arg\,min}_{\theta} L \tag{5}$$

where N is the number of images and f(x_i; θ_t) denotes the output of the network with parameters θ_t for input image x_i.
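Equation (5) is a plain L2 regression objective between the target quality scores y_i and the network predictions, minimized over the parameters θ. Below is a minimal sketch of this objective and one optimization step, assuming a PyTorch implementation; the model, optimizer and batch names are illustrative, not the paper's code:

```python
import torch

def eq5_loss(preds, targets):
    """Eq. (5): L = (1/N) * sum_i || y_i - f(x_i; theta_t) ||_2."""
    diff = (targets - preds).reshape(len(preds), -1)
    return diff.norm(dim=1).mean()   # per-sample L2 norm, averaged over the batch

def train_step(net, optimizer, images, targets):
    """One gradient step toward theta' = argmin_theta L."""
    optimizer.zero_grad()
    preds = net(images).squeeze(-1)  # f(x_i; theta_t), one score per image
    loss = eq5_loss(preds, targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The synthetic degradations mentioned above (contrast, brightness, blurriness, occlusion) can be produced with standard image operations. A hedged sketch using OpenCV, where all parameter names and ranges are illustrative assumptions rather than the paper's actual settings:

```python
import cv2

def degrade(img, alpha=1.0, beta=0.0, blur_ksize=1, occlusion=None):
    """Synthetically vary contrast, brightness, blurriness and occlusion.

    alpha      -- contrast gain; beta -- brightness offset (assumed parameterization)
    blur_ksize -- Gaussian kernel size; 1 means no blur
    occlusion  -- optional (x, y, w, h) rectangle blacked out to mimic occlusion
    """
    out = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)  # out = alpha*img + beta
    if blur_ksize > 1:
        k = blur_ksize | 1                                   # kernel size must be odd
        out = cv2.GaussianBlur(out, (k, k), 0)
    if occlusion is not None:
        x, y, w, h = occlusion
        out[y:y + h, x:x + w] = 0
    return out
```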
Table 1: Network configuration

layer      0      1     2     3        4     5
type       input  conv  conv  maxpool  conv  conv
filt size  -      3     3     2        3     3
num filt   -      64    64    -        128   128
stride     -      1     1     2        1     1
pad        -      1     1     0        1     1

layer      6        7     8     9     10       11
type       maxpool  conv  conv  conv  maxpool  conv
filt size  2        3     3     3     2        3
num filt   -        256   256   256   -        512
stride     2        1     1     1     2        1
pad        0        1     1     1     0        1

layer      12    13    14       15    16    17
type       conv  conv  maxpool  conv  conv  conv
filt size  3     3     2        3     3     3
num filt   512   512   -        512   512   512
stride     1     1     2        1     1     1
pad        1     1     0        1     1     1

layer      18       19    20    21    22
type       maxpool  fc    fc    fc    loss
filt size  2        -     -     -     -
num filt   -        4096  4096  -     -
stride     2        1     1     1     1

Figure 2: Sample images of Color FERET dataset (panels (1)-(10))
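Table 1 describes a VGG-16-style stack: 3x3 convolutions with stride 1 and padding 1, 2x2 max pooling with stride 2, three fully connected layers, and a final loss layer. Below is a minimal sketch of that configuration, assuming PyTorch; the ReLU activations, the 224x224 input size implied by the 512*7*7 flattened features, and the single-score output dimension are assumptions that Table 1 leaves unspecified:

```python
import torch.nn as nn

def build_quality_net(out_dim=1):
    """Assemble the Table 1 configuration as a feed-forward network."""
    # Layers 1-18: conv channel counts, with "M" marking a 2x2 max pool.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # filt 2, stride 2, pad 0
        else:
            layers.append(nn.Conv2d(in_ch, v, kernel_size=3, stride=1, padding=1))
            layers.append(nn.ReLU(inplace=True))  # activation assumed, not listed in Table 1
            in_ch = v
    # Layers 19-21: fully connected head (layer 22 is the training loss).
    head = nn.Sequential(
        nn.Flatten(),
        nn.Linear(512 * 7 * 7, 4096),  # assumes a 224x224 input, i.e., 7x7x512 after five pools
        nn.ReLU(inplace=True),
        nn.Linear(4096, 4096),
        nn.ReLU(inplace=True),
        nn.Linear(4096, out_dim),      # single quality score (assumption)
    )
    return nn.Sequential(nn.Sequential(*layers), head)
```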