
Author's Accepted Manuscript

Age invariant face recognition and retrieval by coupled auto-encoder networks

Chenfei Xu, Qihe Liu, Mao Ye

www.elsevier.com/locate/neucom

PII: S0925-2312(16)31172-9
DOI: http://dx.doi.org/10.1016/j.neucom.2016.10.010
Reference: NEUCOM17622
To appear in: Neurocomputing
Received date: 21 June 2016
Revised date: 3 October 2016
Accepted date: 8 October 2016

Cite this article as: Chenfei Xu, Qihe Liu and Mao Ye, Age invariant face recognition and retrieval by coupled auto-encoder networks, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.10.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Age Invariant Face Recognition and Retrieval by
Coupled Auto-encoder Networks

Chenfei Xu, Qihe Liu, Mao Ye∗


School of Computer Science and Engineering, University of Electronic Science and
Technology of China, Chengdu 611731, P. R. China

Abstract

Recently, many promising results have been reported on face recognition related problems. However, age-invariant face recognition and retrieval remains a challenge. Inspired by the observation that age variation is a nonlinear but smooth transform, and by the ability of auto-encoder networks to learn latent representations from inputs, we propose a new neural network model called coupled auto-encoder networks (CAN) to handle the age-invariant face recognition and retrieval problem. CAN consists of two auto-encoders bridged by two shallow neural networks that fit the complex nonlinear aging and de-aging processes. We further propose a nonlinear factor analysis method that nonlinearly decomposes a given face image into three components: an identity feature, an age feature and noise, where the identity feature is age-invariant and can be used for face recognition and retrieval. Experiments on three publicly available face aging datasets, FGNET, CACD and CACD-VS, show the effectiveness of the proposed approach.
Keywords: Face recognition, Age invariant, Auto-encoder

1. Introduction

Age-invariant face recognition and retrieval is a challenging problem in face recognition research because one person can exhibit substantially different appearances at different ages, which significantly increases the recognition difficulty.

∗Corresponding author
Email address: [email protected] (Chenfei Xu, Qihe Liu, Mao Ye)

Preprint submitted to Elsevier, October 18, 2016

Figure 1: Example images from FGNET. Images in the same row are of the same subject. The number at the bottom of each image shows the age at which it was taken.


The problem is becoming increasingly important and has wide applications, such as finding missing children, identifying criminals and verifying passports. A traditional approach, proposed in [1, 2], is to synthesize a face image to match the image at a target age before recognition. These methods construct a 2D/3D model to compensate for the age variation that degrades face recognition performance. However, such generative models depend strongly on parameter assumptions, accurate age labels and relatively clean training data, so they do not work well in real-world face recognition.
To address this problem, some discriminative methods [3, 4, 5, 6, 7] have been proposed. Most of these methods attempt to design an appropriate feature representation and an effective matching framework. Typically, Li et al. [6] combined the scale invariant feature transform (SIFT) [8] and multi-scale local binary patterns (MLBP) [9] as local feature representations for recognition, but this method does not consider age information. Recently, [4, 5] proposed an approach based on factor analysis. It assumes that the face image feature of one person can be expressed as a combination of an identity-specific component and an age-related component. In the test phase, this method computes the matching score of a given pair of images based on the (age-invariant) identity component. However, these methods are all linear models, so their expressive power is limited and they require complex inference.
Motivated by the ability of the auto-encoder to learn latent representations from inputs, and by the observation that age variation is a nonlinear but smooth transform, we propose a new neural network model called coupled auto-encoder networks (CAN). Given a pair of images of one person, we first use two auto-encoders that take the two images as inputs and reconstruct them. Then, we leverage two shallow neural networks as a bridge to connect these two auto-encoders. We fit the aging and de-aging processes by these shallow neural networks, relying on the fact that a single-hidden-layer neural network can fit any complex smooth function [10]. Further, a nonlinear factor analysis method is applied to the hidden layers of CAN, in which the representation of a face image is decomposed into three components: an identity feature which is age-invariant, an age feature which is identity-independent, and noise. Finally, we apply the PCA and LDA methods [11] to the identity feature to form a more compressed and discriminative feature as the final age-invariant representation for face recognition and retrieval.
Our main contributions are: 1) a new model for age-invariant face recognition and retrieval based on a couple of auto-encoder networks, evaluated on three public face aging datasets, FGNET [12], CACD [13] and CACD-VS [14]; 2) a nonlinear factor analysis method to separate the identity feature from the face representation. Compared with the similar methods based on linear factor analysis proposed in [4, 5], our method obtains a better identity feature.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 describes the proposed approaches and details the coupled auto-encoder networks (CAN). Section 4 provides the experimental results. Section 5 concludes the paper.

2. Related works

Most existing works on age-related face analysis focus on age estimation [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and age simulation [27, 28, 29, 30, 1, 2]. Works on age-invariant recognition are limited, and traditional methods fall into two categories. Generative methods proposed in [1, 2] try to construct a 2D/3D face aging pattern space to synthesize a face image that matches the target face image before recognition. However, these methods depend strongly on parameter assumptions, accurate age labels and relatively clean training data, so they do not work well in real-world face recognition.
Recently, some discriminative methods [3, 31, 4, 5, 6, 7] have been proposed and achieve good results. Ling et al. [7] use a gradient oriented pyramid with a support vector machine for face verification. Li et al. [6] design a densely sampled local feature description scheme combining the scale invariant feature transform and multi-scale local binary patterns to improve face matching accuracy. Gong et al. [4] propose hidden factor analysis, which tries to separate age variation from person-specific features for face recognition, and further propose a maximum entropy feature descriptor with identity factor analysis in [5] to improve this method. Lu et al. [31] propose a compact binary face descriptor for face representation and recognition. In [3], a new feature descriptor called local pattern selection (LPS) is proposed for aging face recognition.
Data-driven methods based on a reference set have also been used to improve age-invariant face recognition and retrieval. The authors of [13, 14] propose a coding framework called Cross-Age Reference Coding (CARC) that uses CACD [13], a new large-scale face aging dataset, as a reference set to encode the low-level features of a face image into age-invariant representations.
Some deep learning models [32, 33, 34, 35, 36, 37, 38, 39] have also been proposed. Wen et al. [32] propose a deep face recognition framework called latent factor guided convolutional neural network (LF-CNN) that significantly improves age-invariant face recognition performance; with a model called latent identity analysis (LIA), they extract the age-invariant features. Similarly, [33, 34, 35] propose different convolutional neural network architectures to address the age-invariant face recognition problem. [36] presents a generalized similarity model (GSM) and integrates it with feature representation learning via deep convolutional neural networks for age-invariant face recognition. In [37], a deep aging face verification (DAFV) architecture is proposed, including an aging pattern synthesis module and an aging face verification module. [39] combines deep convolutional neural networks with local binary pattern histograms (DCNN+LBPH) for face verification across aging. [38] presents a new joint feature learning (JFL) approach and stacks this model into a deep architecture to exploit hierarchical information for face representation.
The auto-encoder attempts to learn hidden representations automatically from inputs and has been successfully applied to many computer vision problems. As a typical unsupervised learning method, the auto-encoder [40, 41] has shown its efficiency in many face-related recognition problems. Kan et al. [42] propose a stacked progressive auto-encoder for face recognition across poses. Liu et al. [43] use a sparse auto-encoder for facial expression recognition, and Zhang et al. [44] propose an iterative stacked de-noising auto-encoder to recognize faces with partial occlusions. Liu et al. [37] use a deep aging-aware de-noising auto-encoder for aging pattern synthesis.

3. Proposed approaches

In this section, we describe the proposed approaches. We first give an overview of the CAN model and then detail it. Next we present our training algorithm, followed by the face matching method.

3.1. Overview
An overview of CAN is shown in Fig. 2. Structurally, CAN is composed of two identical auto-encoders and two single-hidden-layer neural networks acting as a bi-directional bridge.
The inputs of CAN are training facial image pairs of different persons, denoted as $T = \{x_1^i, x_2^i\}$ with $x_1^i, x_2^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$, where $N$ is the total number of training image pairs.


Figure 2: The overview of CAN. CAN is composed of two identical auto-encoders and a bridge network. Given a pair of input images $(x_1, x_2)$ of one person, we first leverage the auto-encoders to reconstruct the inputs, projecting them into a high-dimensional feature space in the hidden layers. Second, we add constraints in this feature space to decompose it into three components, where $(I_1, I_2)$ as identity features can be used as age-invariant representations for recognition and retrieval. Note that different ids can refer to the same person. Details of CAN are described in Sections 3.2 and 3.3.

For one person, our goal is to encode an age-invariant feature from the inputs for recognition and retrieval. A nonlinear factor analysis model is given as:

$$x = \sigma(I, A, \xi), \tag{1}$$

where $x$ represents the input and $\sigma(\cdot)$ is a nonlinear function defined by CAN. The equation above means that a facial image can be decomposed nonlinearly into three components: $I$ represents the identity feature, which is age-invariant; $A$ represents the age feature, which is identity-independent; and $\xi$ represents noise, which covers any factors that deviate from our model.
Concretely, as shown in Fig. 2, $x_1$ and $x_2$ represent the younger and older facial image inputs of the same person. $I_j$, $A_j$ and $\xi_j$, for $j = 1, 2$, respectively represent the decomposed components according to Eq. (1). $\tilde{x}_1$ and $\tilde{x}_2$ are the basic reconstructed outputs of CAN (see Section 3.2). We call $x_1$-to-$x_2$ the aging direction and, vice versa, $x_2$-to-$x_1$ the de-aging direction. Two single-hidden-layer neural networks acting as a bi-directional bridge connect $A_1$ and $A_2$ to fit the aging and de-aging processes. We limit the age gap of each training image pair in $T$ to a certain range depending on the dataset; this guarantees that the aging and de-aging fitting processes are effective. To encode the age-invariant features $I_1$ and $I_2$ from the inputs $x_1$ and $x_2$, our model proceeds in two steps:

1. Basic reconstruction: this step reconstructs the facial image inputs $x_1$ and $x_2$ independently by two auto-encoders, capturing as many of the main factors of the inputs as possible. The inputs are projected into a high-dimensional feature space in the hidden layers.
2. Transfer: this step imposes constraints in the above feature space to nonlinearly decompose it into three feature subspaces: an identity feature space which is age-invariant, an age feature space which is identity-independent, and a noise space.

These two steps build our CAN model. We detail them based on their cost functions in the following two sections.

3.2. Basic reconstruction

In this step, given a pair of facial images of the same person, we reconstruct the two images with CAN. The cost function is defined as:

$$\min_{\theta_1} L_r = \frac{1}{2N} \sum_{i=1}^{N} \left( \|x_1^i - \tilde{x}_1^i\|_2^2 + \|x_2^i - \tilde{x}_2^i\|_2^2 \right), \tag{2}$$

where the parameters are $\theta_1 = \{W_j, \hat{W}_j, b_j, c_j\}$ with $W_j = \{W_{uj}, W_{vj}, W_{nj}\}$ and $\hat{W}_j = \{\hat{W}_{uj}, \hat{W}_{vj}, \hat{W}_{nj}\}$, for $j = 1, 2$, as shown in Fig. 2. $\tilde{x}_1^i$ and $\tilde{x}_2^i$ are the outputs of the auto-encoders that reconstruct the corresponding inputs $x_1^i$ and $x_2^i$. Eq. (2) is a typical auto-encoder training objective, i.e., a squared error function. In the rest of this paper, we omit the averaging factor (like $\frac{1}{2N}$ in Eq. (2)) from the cost functions for simplicity.
We analyze only the first term, since the two terms in Eq. (2) are completely symmetric. A basic auto-encoder has two main blocks, the encoder and the decoder. The input $x_1^i$ is encoded by a function $f_1$:

$$h_1^i = f_1(x_1^i) = s(W_1 x_1^i + b_1), \tag{3}$$

for $i = 1, 2, \ldots, N$. $W_1 \in \mathbb{R}^{m \times n}$ is a weight matrix, where $m$ is the number of neurons in the hidden layer, and $b_1 \in \mathbb{R}^{m \times 1}$ is the hidden layer bias vector. $s(z) = (1 + e^{-z})^{-1}$ is the sigmoid function, and $h_1^i$ is the hidden layer representation.
In the decoding stage, $h_1^i$ is decoded by another function $g_1$ to obtain $\tilde{x}_1^i$:

$$\tilde{x}_1^i = g_1(h_1^i) = s_l(\hat{W}_1 h_1^i + c_1), \tag{4}$$

where $\tilde{x}_1^i$ is the reconstructed output, encouraged to be close to the input $x_1^i$. $\hat{W}_1 \in \mathbb{R}^{n \times m}$ is a weight matrix and $c_1 \in \mathbb{R}^{n \times 1}$ is the output layer bias vector. $s_l(z) = z$ is the identity function (i.e., a linear activation function). Minimizing this term updates $\{W_1, \hat{W}_1, b_1, c_1\} \subset \theta_1$. Similarly, minimizing the second term updates $\{W_2, \hat{W}_2, b_2, c_2\} \subset \theta_1$. After solving Eq. (2), we fix the updated parameters $\theta_1$ for step 2.
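To make the basic reconstruction step concrete, the following minimal NumPy sketch implements one of the two auto-encoders with the sigmoid encoder of Eq. (3) and the linear decoder of Eq. (4); the class name and initialization scale are our own illustrative choices, not part of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BasicAutoEncoder:
    """One of the two auto-encoders in CAN: sigmoid encoder (Eq. (3)),
    linear decoder (Eq. (4)), squared-error objective (one term of Eq. (2))."""

    def __init__(self, n, m, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1e-2, size=(m, n))      # encoder weights, m hidden units
        self.b = np.zeros(m)                             # hidden-layer bias b
        self.W_hat = rng.normal(0.0, 1e-2, size=(n, m))  # decoder weights
        self.c = np.zeros(n)                             # output-layer bias c

    def encode(self, x):
        return sigmoid(self.W @ x + self.b)              # h = s(Wx + b)

    def decode(self, h):
        return self.W_hat @ h + self.c                   # linear output, s_l(z) = z

    def reconstruction_loss(self, x):
        x_tilde = self.decode(self.encode(x))
        return 0.5 * np.sum((x - x_tilde) ** 2)          # one term of Eq. (2)
```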

3.3. Transfer

After performing step 1, we impose constraints on the hidden layer representation $h_j$ to decompose it into three feature subspaces $I_j$, $A_j$ and $\xi_j$, for $j = 1, 2$, as shown in Fig. 2. The cost function of this step is:

$$\min_{\theta_2} L_t = \frac{1}{2N} \sum_{i=1}^{N} \left( \|A_2^i - \hat{A}_2^i\|_2^2 + \|A_1^i - \hat{A}_1^i\|_2^2 + \|I_2^i - I_1^i\|_2^2 + \|x_2^i - \hat{x}_2^i\|_2^2 + \|x_1^i - \hat{x}_1^i\|_2^2 \right), \tag{5}$$

where the parameters are $\theta_2 = \{W_{uj}, \hat{W}_{uj}, W_{vj}, \hat{W}_{vj}, b_{uj}, b_{vj}, c_j, H_{aj}, H_{dj}, b_{aj}, b_{dj}\}$, for $j = 1, 2$. $\hat{A}_2^i$ is the aging fitting output, encouraged to be equal to the target older age feature $A_2^i$. In the de-aging direction, $\hat{A}_1^i$ is encouraged to be equal to $A_1^i$, the target younger age feature. $I_1^i$ and $I_2^i$ are the identity features of the same person, which are age-invariant. We call $\hat{x}_2^i$ and $\hat{x}_1^i$ the transfer reconstruction outputs; they approximate the inputs $x_2^i$ and $x_1^i$, respectively.

Figure 3: Aging fitting neural network. $\hat{A}_2$ is the aging fitting output encouraged to be equal to the target older age feature $A_2$. $A_1$ is the younger age feature. We leverage the bridge network to fit this aging process.

Minimizing the first squared error term $\|A_2^i - \hat{A}_2^i\|_2^2$ in Eq. (5) fits the aging process between the two age features $A_1^i$ and $A_2^i$. We choose a single-hidden-layer neural network to connect $A_1^i$ and $A_2^i$ because a single-hidden-layer neural network can fit any complex smooth function [10], and we observe that the aging (and de-aging) process is a highly complex but smooth transform. Optimizing this term in fact trains an aging fitting neural network separated from CAN, as shown in Fig. 3, where $A_2^i$ can be expressed as:

$$A_2^i = f_{v2}(x_2^i) = s(W_{v2} x_2^i + b_{v2}), \tag{6}$$

for $i = 1, 2, \ldots, N$, where $f_{v2}$ is a nonlinear function forced to encode the age feature from the input $x_2^i$. $W_{v2} \in \mathbb{R}^{q \times n}$ is a weight matrix, where $q$ is the age feature dimension (i.e., the number of neurons), and $b_{v2} \in \mathbb{R}^{q \times 1}$ is the age feature bias vector.
Before continuing our analysis, we first define the aging and de-aging functions, $F_a$ and $F_d$, as:

$$F_a(z) = f_{a2}(f_{a1}(z)), \qquad F_d(z) = f_{d2}(f_{d1}(z)), \tag{7}$$

where $f_{aj}$ and $f_{dj}$ are defined as:

$$f_{aj}(z) = s(H_{aj} z + b_{aj}), \qquad f_{dj}(z) = s(H_{dj} z + b_{dj}), \tag{8}$$

for $j = 1, 2$, where $H_{a1}, H_{d1} \in \mathbb{R}^{k \times q}$ and $H_{a2}, H_{d2} \in \mathbb{R}^{q \times k}$ are weight matrices, and $b_{a1}, b_{d1} \in \mathbb{R}^{k \times 1}$ are middle-layer bias vectors. We make the age feature bias vectors $b_{v1}$ and $b_{v2}$ adaptive, i.e., used both for encoding the age feature and for fitting the aging and de-aging processes, so $b_{a2} = b_{v2}$ and $b_{d2} = b_{v1}$. $k$ is the number of bridge neurons. Thus, as shown in Fig. 2, the bridge networks can be formulated as $F_a$ and $F_d$, which are highly nonlinear due to the composition of two sigmoid functions. Note that the input of $F_a$ in our model is $A_1^i$ for aging, while $F_d$ takes the older input's $A_2^i$ for de-aging. We can now formulate $\hat{A}_2^i$ according to Eq. (7) as:

$$\hat{A}_2^i = F_a(A_1^i), \tag{9}$$

where $A_1^i$ has a definition analogous to that of $A_2^i$ in Eq. (6). The second term admits a similar analysis; it only fits from the opposite direction for de-aging. Therefore, minimizing the first two terms in Eq. (5) encourages the model to extract age-related information from the inputs.
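A minimal sketch of the aging branch under the definitions above may help; the function names and parameter packing are ours, and the weights are assumed to be already initialized or trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode_age(x, W_v, b_v):
    """Age feature A = s(W_v x + b_v), Eq. (6)."""
    return sigmoid(W_v @ x + b_v)

def bridge_aging(A1, H_a1, b_a1, H_a2, b_a2):
    """Aging function F_a(z) = f_a2(f_a1(z)), Eqs. (7)-(9)."""
    z = sigmoid(H_a1 @ A1 + b_a1)    # project to the k bridge neurons
    return sigmoid(H_a2 @ z + b_a2)  # back to age-feature dimension q

def aging_fit_loss(x1, x2, W_v1, b_v1, W_v2, b_v2, H_a1, b_a1, H_a2):
    """First term of Eq. (5): ||A2 - F_a(A1)||^2, with b_a2 = b_v2."""
    A1 = encode_age(x1, W_v1, b_v1)
    A2 = encode_age(x2, W_v2, b_v2)
    A2_hat = bridge_aging(A1, H_a1, b_a1, H_a2, b_v2)
    return np.sum((A2 - A2_hat) ** 2)
```

The de-aging direction is symmetric: swap the roles of $x_1$ and $x_2$ and use $H_{d1}$, $b_{d1}$, $H_{d2}$ with $b_{d2} = b_{v1}$.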
The third term ensures that the error between the two encoded identity features $I_1^i$ and $I_2^i$ of the same person is small. This term is based on the observation that facial images of the same person contain a stable identity feature that is age-invariant. Here $I_j^i$ is formulated as:

$$I_j^i = f_{uj}(x_j^i) = s(W_{uj} x_j^i + b_{uj}), \tag{10}$$

for $i = 1, 2, \ldots, N$ and $j = 1, 2$. In the above equation, $f_{uj}$ is an identity encoding function, $W_{uj} \in \mathbb{R}^{p \times n}$ is a weight matrix where $p$ is the identity feature dimension, and $b_{uj} \in \mathbb{R}^{p \times 1}$ is the identity feature bias vector. Minimizing $\|I_2^i - I_1^i\|_2^2$ encourages the model to extract common identity information from the inputs of the same person. We will later use $I_1^i$ and $I_2^i$ as age-invariant representations for face recognition and retrieval.
The fourth term is a transfer reconstruction squared error. Here we in effect train a transfer reconstruction neural network, separated from CAN, as shown in Fig. 4.


Figure 4: Transfer reconstruction neural network. Given the inputs $(x_1, x_2)$ of one person at different ages, we use the aging fitting output $\hat{A}_2$ combined with the target identity feature $I_2$ to reconstruct the older facial image input $x_2$.

For the inputs $(x_1^i, x_2^i)$ of one person at different ages, our idea is to use the aging fitting output $\hat{A}_2^i$ combined with $I_2^i$ to reconstruct $x_2^i$, the target older facial image input. We call this process transfer reconstruction. Here $\hat{x}_2^i$ is formulated as:

$$\hat{x}_2^i = s_l(\hat{W}_{v2} \hat{A}_2^i + \hat{W}_{u2} I_2^i + c_2), \tag{11}$$

for $i = 1, 2, \ldots, N$, where $\hat{W}_{v2} \in \mathbb{R}^{n \times q}$ and $\hat{W}_{u2} \in \mathbb{R}^{n \times p}$ are weight matrices and $s_l(z)$ is the identity function. Similarly, $\hat{x}_1^i$ in the fifth term is:

$$\hat{x}_1^i = s_l(\hat{W}_{v1} \hat{A}_1^i + \hat{W}_{u1} I_1^i + c_1). \tag{12}$$

Minimizing the last two terms in Eq. (5) concentrates as much useful personal information as possible in the parameters $\theta_2$. Combined with the constraints introduced by the other terms in Eq. (5), we simultaneously separate identity-related and age-related information as needed. The noises $\xi_1$ and $\xi_2$ are separated from the inputs indirectly by our CAN training algorithm (see Section 3.4).
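Putting the pieces together, the sketch below evaluates the transfer cost of Eq. (5) for a single training pair; the dictionary packing of the parameters is our own convention, with key names following Fig. 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def transfer_loss(x1, x2, P):
    """Transfer cost of Eq. (5) for one training pair (x1 younger, x2 older)."""
    # Decomposed hidden components, Eqs. (6) and (10)
    A1 = sigmoid(P["Wv1"] @ x1 + P["bv1"])
    A2 = sigmoid(P["Wv2"] @ x2 + P["bv2"])
    I1 = sigmoid(P["Wu1"] @ x1 + P["bu1"])
    I2 = sigmoid(P["Wu2"] @ x2 + P["bu2"])
    # Bridge fits: aging A1 -> A2_hat and de-aging A2 -> A1_hat, Eqs. (7)-(9)
    A2_hat = sigmoid(P["Ha2"] @ sigmoid(P["Ha1"] @ A1 + P["ba1"]) + P["bv2"])
    A1_hat = sigmoid(P["Hd2"] @ sigmoid(P["Hd1"] @ A2 + P["bd1"]) + P["bv1"])
    # Transfer reconstructions, Eqs. (11)-(12)
    x2_hat = P["Wv2_hat"] @ A2_hat + P["Wu2_hat"] @ I2 + P["c2"]
    x1_hat = P["Wv1_hat"] @ A1_hat + P["Wu1_hat"] @ I1 + P["c1"]
    return (np.sum((A2 - A2_hat) ** 2) + np.sum((A1 - A1_hat) ** 2)
            + np.sum((I2 - I1) ** 2)
            + np.sum((x2 - x2_hat) ** 2) + np.sum((x1 - x1_hat) ** 2))
```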

3.4. Training

Training CAN involves the two steps discussed above, and we alternately perform these two training steps. Our training procedure is described in Algorithm 1. In the transfer step, we add constraints on the identity-related and age-related parameters $W_{uj}, W_{vj} \subset W_j$, $\hat{W}_{uj}, \hat{W}_{vj} \subset \hat{W}_j$, $b_{uj}, b_{vj} \subset b_j$, for $j = 1, 2$, to encode the identity and age features. Combined with the basic reconstruction step, this overall training separates the remaining irrelevant information into the noise-related parameters $W_{nj} \subset W_j$, $\hat{W}_{nj} \subset \hat{W}_j$, $b_{nj} \subset b_j$, for $j = 1, 2$. Therefore we indirectly encode the noises $\xi_1$ and $\xi_2$ in the hidden layers. To solve Eqs. (2) and (5), we adopt stochastic gradient descent (SGD) with standard back-propagation [45].

Algorithm 1 CAN Training

Input: training set $T = \{x_1^i, x_2^i\}$ with $x_1^i, x_2^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$; feature dimensions $p$, $q$, $r$ and the number of bridge neurons $k$; mini-batch size $m$ and iteration epoch maxEpoch; learning rate $\alpha$.
Output: identity encoder parameters $W_{uj}^*, b_{uj}^*$ ($j = 1, 2$)
1: Set $t = 1$. Initialize $W_j, \hat{W}_j, H_{aj}, H_{dj}$ ($j = 1, 2$) $\sim N(0, 10^{-4})$ and $b_{aj}, b_{dj}, b_j, c_j$ ($j = 1, 2$) to all zeros.
2: repeat
3:   Shuffle $T$.
4:   repeat
5:     Pick a mini-batch $T'$ from $T$ without overlap.
6:     Compute $L_r$ in Eq. (2).
7:     Update parameters $\theta_1$ by solving Eq. (2).
8:     Compute $L_t$ in Eq. (5).
9:     Update parameters $\theta_2$ by solving Eq. (5).
10:   until $T$ is looped over.
11:   $t = t + 1$.
12: until maxEpoch is met.
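The skeleton below mirrors Algorithm 1 in Python; the gradient callables are assumptions standing in for the back-propagation of Eqs. (2) and (5), which the paper computes with standard SGD [45].

```python
import random

def train_can(T, theta1, theta2, grad_Lr, grad_Lt, alpha, batch_size, max_epoch):
    """Alternate the basic-reconstruction and transfer updates (Algorithm 1).
    grad_Lr / grad_Lt are assumed to return {name: gradient} dicts for the
    parameters of Eq. (2) and Eq. (5) on a mini-batch."""
    for epoch in range(max_epoch):
        random.shuffle(T)                                   # line 3: shuffle T
        for start in range(0, len(T), batch_size):          # non-overlapping batches
            batch = T[start:start + batch_size]
            for name, g in grad_Lr(batch, theta1).items():  # lines 6-7, Eq. (2)
                theta1[name] -= alpha * g
            for name, g in grad_Lt(batch, theta2).items():  # lines 8-9, Eq. (5)
                theta2[name] -= alpha * g
    return theta1, theta2
```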

Since our CAN training is an unsupervised learning method, the extracted age-invariant features $I_1$ and $I_2$ in the hidden layers are not discriminative, so they cannot be used directly for face recognition and retrieval. Following the strategy of [4, 5], we apply PCA [46] to the extracted $I_1$ and $I_2$, followed by LDA [47, 11], a supervised dimension reduction technique, to make them more compressed and discriminative as the final age-invariant features for face recognition and retrieval.
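As an illustration of this dimension reduction stage, here is a sketch using scikit-learn (our choice of library, not the paper's; note that scikit-learn caps the LDA output dimension at one less than the number of classes, so the effective $d_l$ may be smaller than the nominal setting):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_pca_lda(I_train, labels, d_pca=400, d_lda=100):
    """Compress raw identity features with PCA, then make them
    discriminative with supervised LDA (dimensions as in Table 3)."""
    pca = PCA(n_components=d_pca).fit(I_train)
    n_comp = min(d_lda, len(set(labels)) - 1)  # LDA cap: n_classes - 1
    lda = LinearDiscriminantAnalysis(n_components=n_comp).fit(
        pca.transform(I_train), labels)
    return pca, lda

def project(I, pca, lda):
    """Map raw identity features to the final age-invariant features."""
    return lda.transform(pca.transform(I))
```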

3.5. Matching method

After CAN training and dimension reduction, we use the learned identity encoder parameters $W_{uj}^*, b_{uj}^*$ ($j = 1, 2$) and the trained PCA and LDA transform matrices $M_p$, $M_l$ to obtain the final age-invariant features. Concretely, given a pair of probe and gallery facial image inputs $(x_p, x_g)$, according to Eq. (10), the corresponding age-invariant features are computed as:

$$I_p = M_l^T M_p^T f_{u1}(x_p) = M_l^T M_p^T s(W_{u1}^* x_p + b_{u1}^*), \tag{13}$$

$$I_g = M_l^T M_p^T f_{u2}(x_g) = M_l^T M_p^T s(W_{u2}^* x_g + b_{u2}^*), \tag{14}$$

where the superscript $T$ denotes matrix transposition. We then use the cosine distance to compute matching scores between $I_p$ and $I_g$ for age-invariant face recognition and retrieval.
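A small sketch of this matching step (the helper names are ours):

```python
import numpy as np

def cosine_score(Ip, Ig):
    """Cosine similarity between probe and gallery identity features."""
    return float(Ip @ Ig / (np.linalg.norm(Ip) * np.linalg.norm(Ig) + 1e-12))

def rank_gallery(Ip, gallery):
    """Return gallery indices sorted by decreasing matching score."""
    scores = np.array([cosine_score(Ip, Ig) for Ig in gallery])
    return np.argsort(-scores)
```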

4. Experiment

4.1. Datasets

We evaluate our approach on three public face aging datasets: FGNET [12], CACD [13] and CACD-VS [14]. FGNET contains 1,002 images of 82 different people, each with about 13 images on average taken at different ages from 0 to 69. CACD is a new large-scale dataset collected from the Internet, which consists of 163,446 face images of 2,000 people with ages ranging from 16 to 62. To the best of our knowledge, CACD is the largest publicly available face aging dataset. Compared to CACD, FGNET has a larger age gap and more images of young subjects, but CACD has a larger number of images and more images at other ages. Fig. 5 shows the age range distributions of the two datasets.
Further, we conduct an experiment on CACD-VS, a subset of CACD, for face verification. The CACD-VS dataset contains 2,000 positive pairs and 2,000 negative pairs and is carefully annotated by checking both the associated images and the surrounding web contents.

Figure 5: Age range distribution (%) of FGNET and CACD.
In our problem, all facial images are preprocessed as follows: (1) convert RGB images to grayscale; (2) detect the locations of the faces using the Viola-Jones face detector [48] and locate the 83 landmarks using the Face++ API [49]; (3) align the images so that the eyes lie at the same horizontal positions; (4) crop the images to remove the background and hair regions; (5) rescale the images by bicubic interpolation and reshape them into one-dimensional vectors. All the data are then mapped into [0, 1] and normalized to have zero mean.
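A rough sketch of this pipeline with OpenCV is given below; the landmark-based alignment via the Face++ API (steps (2)-(3)) is omitted, and the 35 × 32 output size follows the input dimension stated in Section 4.2.

```python
import cv2
import numpy as np

def preprocess(path, out_size=(32, 35)):  # (width, height) -> a 35x32 = 1120-dim input
    """Grayscale -> Viola-Jones face detection -> crop -> bicubic resize ->
    flatten -> map into [0, 1] -> zero mean. Landmark alignment omitted."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)         # step (1)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    x, y, w, h = cascade.detectMultiScale(gray, 1.1, 5)[0]            # step (2)
    face = gray[y:y + h, x:x + w]                                     # step (4)
    face = cv2.resize(face, out_size, interpolation=cv2.INTER_CUBIC)  # step (5)
    v = face.astype(np.float64).ravel() / 255.0                       # into [0, 1]
    return v - v.mean()                                               # zero mean
```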

4.2. Parameter settings

Our approach has several hyper-parameters to select: the input dimension $n$, identity feature dimension $p$, age feature dimension $q$, noise-related feature dimension $r$, the number of bridge neurons $k$ in CAN, and the PCA [46] and LDA [47, 11] dimensions. The settings of these hyper-parameters are given in Table 1. For the FGNET and CACD datasets, the input dimension $n$ is $35 \times 32$, and in CAN training we perform SGD with a fixed learning rate $\alpha = 0.0001$ and a mini-batch size of 10. For CACD-VS, we use the same parameter settings as for CACD.

Table 1: The parameter settings in our experiments.

Dataset                     FGNET   CACD
CAN                  p      2100    2800
                     q      600     800
                     r      300     400
                     k      500     800
Dimension Reduction  PCA    400     500
                     LDA    100     120

Table 2: The settings of identity feature dimension p, age feature dimension q and noise-related feature dimension r for different choices of the number of hidden layer neurons m.

m   1000   2000   3000   4000
p   700    1400   2100   2800
q   200    400    600    800
r   100    200    300    400

For the three datasets, we set the iteration epoch maxEpoch to 1,000, 500 and 800, respectively. All parameter updates use a momentum of 0.9.

4.3. Experiment on FGNET dataset

FGNET is a challenging face aging dataset because it is relatively small and suffers from other significant variations such as pose, illumination and expression; some examples are shown in Fig. 1. Following the training and testing split scheme of [6], we use a leave-one-image-out strategy for performance evaluation. In each fold, we choose pairs from all the remaining face images to form our training set (1,800 training image pairs per fold; different pairs can refer to the same person). We constrain the age gap of each image pair to be less than 10 years. All the training data are used to learn the PCA and LDA subspaces in each fold.

4.3.1. Evaluation metrics

In our experiment on FGNET, we use the leave-one-image-out strategy with rank-k identification rates for performance evaluation. Specifically, we leave one image out as the test sample and train the model using the remaining 1,001 images, from which the training pairs are selected. We repeat this procedure 1,002 times and take the average as the final identification rate. Cosine similarity is used to compute matching scores between the test example and the remaining images. For rank-k, we sort the matching results from top-1 to top-k for each test example; averaging these results yields the rank-k identification rates.
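A small sketch of this metric (the helper names are ours):

```python
import numpy as np

def rank_k_rate(scores, probe_ids, gallery_ids, k):
    """Rank-k identification rate: fraction of probes whose true identity
    appears among the k best-scoring gallery entries.
    scores: (num_probes, num_gallery) matrix of cosine similarities."""
    gallery_ids = np.asarray(gallery_ids)
    hits = 0
    for i, row in enumerate(scores):
        top_k = np.argsort(-row)[:k]               # indices of the k highest scores
        hits += probe_ids[i] in gallery_ids[top_k]
    return hits / len(scores)
```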

4.3.2. Parameters exploration

Several parameters influence the performance of our approach: the number of hidden layer neurons $m$ of each auto-encoder in CAN, where $m = p + q + r$, the PCA dimension $d_p$ and the LDA dimension $d_l$. We use rank-k identification rates on FGNET to decide these parameters.
For the number of hidden layer neurons $m$, we run experiments from 1,000 to 4,000. For each choice of $m$, the parameters $p$, $q$ and $r$ are selected by exhaustive search; they are given in Table 2, with the number of bridge neurons $k$ kept at 500. Here we use the raw age-invariant features $I_1$ and $I_2$ (extracted from the hidden layers) directly for identification to choose $m$. Theoretically, more hidden layer neurons allow more complex encoding functions to be learned, capturing more useful information from the inputs. Fig. 6(a) shows the face recognition performance on FGNET for different choices of $m$. As expected, fewer neurons lead to worse performance. We observe that the performance at $m = 3{,}000$ is slightly better than that at $m = 4{,}000$, hence we choose $m = 3{,}000$.
Different choices of the PCA and LDA parameters are further investigated based on the above setting with $m = 3{,}000$ in Table 2. We select the PCA dimension $d_p$ from 100 to 1,000 and the LDA dimension $d_l$ from 60 to 300, and we run experiments to seek the best combination using the rank-1 identification rate on FGNET as the metric.

Table 3: Rank-1 recognition rates (%) for different PCA dimensions (dp, rows) and LDA dimensions (dl, columns) on FGNET.

dp \ dl   60     100    140    180    220    260    300
100       79.0   −      −      −      −      −      −
200       83.5   82.5   80.8   78.1   −      −      −
300       83.9   86.0   82.9   80.3   78.3   71.4   −
400       82.2   86.5   85.4   80.6   77.0   71.7   65.4
500       81.4   84.8   82.0   78.9   76.5   72.2   66.4
600       80.6   80.7   81.4   78.4   75.4   74.9   69.5
700       78.3   80.3   80.6   77.5   72.0   73.6   71.3
800       71.7   76.3   78.5   74.0   71.4   70.3   72.9
900       68.2   70.1   73.3   72.4   70.9   67.4   69.3
1000      63.1   66.2   69.7   70.8   67.2   65.9   65.7

The rank-1 recognition rates for the different PCA and LDA dimensions are given in Table 3. We choose $(d_p, d_l) = (400, 100)$ as the best setting for dimension reduction and testing.

4.3.3. Effects of dimension reduction strategies

We also study the performance of our approach with different dimension reduction strategies. We test our model on FGNET with PCA only, LDA only, and PCA+LDA applied to the raw age-invariant features $I_1$ and $I_2$. Concretely, for PCA only we follow the above setting $d_p = 400$, for LDA only we tune $d_l$ to 200, and for PCA+LDA we set $(d_p, d_l) = (400, 100)$ as above. Cumulative Match Characteristic (CMC) curves on FGNET for the different dimension reduction strategies are shown in Fig. 6(b). We make several observations. First, the raw feature performs badly, mainly due to the lack of supervised information. Second, there are significant performance improvements once LDA is applied, whether or not PCA is also applied: the rank-1 identification rate based on the raw features improves from 38.46% to 75.25% when LDA alone is applied. This demonstrates the effectiveness of a supervised learning method combined with our unsupervised CAN model. Finally, the PCA technique further improves performance. We therefore use the raw feature with the PCA+LDA strategy in the following experiments.

Figure 6: (a) Cumulative Match Characteristic (CMC) curves on FGNET with different choices of the number of hidden layer neurons m of CAN. (b) CMC curves on FGNET with different dimension reduction strategies.

4.3.4. Comparison with state-of-the-art algorithms


We compare our approach with state-of-the-art algorithms including: (1) a generative model that builds a face aging space for age-invariant face recognition [1]; (2) a discriminative model [6]; (3) hidden factor analysis [4], a linear factor analysis method for face recognition; (4) a discriminative method using a maximum entropy feature descriptor based on identity factor analysis [5]; (5) a deep face recognition framework called latent factor guided convolutional neural network (LF-CNN) [32]; and (6) a CNN baseline model [32] with the same network as LF-CNNs but without latent identity analysis (LIA). Comparative results are shown in Table 4.
Among all the compared algorithms in Table 4, our approach obtains competitive results. As can be seen, our nonlinear factor analysis method with CAN is superior to the linear factor analysis methods of [4, 5]. The performance of our approach is inferior to that of LF-CNNs [32], which has the top performance. One possible reason is that although we leverage the LDA technique to make the identity features more discriminative, the proposed unsupervised CAN model is still less discriminative than the supervised LF-CNNs. Another possible reason is that our CAN adopts a shallow neural network architecture; generally speaking, shallow networks perform worse than deep networks.

Table 4: Rank-1 recognition rates of our approach compared with state-of-the-art algorithms on FGNET.

Algorithm                    Recognition Rate
Park et al. (2010) [1]       37.4%
Li et al. (2011) [6]         47.5%
HFA (2013) [4]               69.0%
MEFA (2015) [5]              76.2%
CNN-baseline (2016) [32]     84.4%
LF-CNNs (2016) [32]          88.1%
Our approach                 86.5%

Table 5: Rank-1 recognition rates (%) of our approach compared with state-of-the-art algorithms on different age groups in FGNET.

Age group   Amount   CNN-baseline   LF-CNNs   Ours
0-4         193      51.81%         60.10%    60.27%
5-10        218      84.86%         88.53%    87.39%
11-16       201      91.04%         94.03%    92.63%
17-24       182      94.51%         97.80%    95.47%
25-69       208      99.04%         99.52%    98.01%

Further, it is instructive to examine our approach on different age groups in FGNET. Following the age group setting of [32], the rank-1 recognition rates of our approach on different age groups in FGNET are given in Table 5. From Table 5, we can see that our approach still yields good performance, which demonstrates its effectiveness.
Finally, some failed retrieval results on FGNET are given in Fig. 7. Some incorrect rank-1 results are even more similar to the probe images than the corresponding ground-truth images. In other cases, recognition fails due to further variations such as illumination and expression.

Figure 7: Some failed retrieval results on FGNET. The first row shows the probe faces and the second row shows the incorrect rank-1 retrieval results using our approach. The third row presents the ground-truth images corresponding to the probes.

4.3.5. Effects of the aging and de-aging operators

We leverage the transfer reconstruction neural network (see Fig. 4) to investigate the aging and de-aging operators of CAN. For the aging operator, given an image pair $(x_1, x_2)$ as input, we replace the decomposed age feature $A_2$ of $x_2$ with the aging fitting output $\hat{A}_2$ to reconstruct $x_2$. The quality of the age feature from the hidden layer can be judged from the reconstructed result $\hat{x}_2$. For the de-aging operator, the process is completely analogous in the opposite direction. Here we use the CAN trained on the CACD dataset (see Section 4.4) to visualize some reconstructed results on FGNET, shown in Fig. 8. The reconstructions are similar to the ground-truth images, which intuitively demonstrates the effectiveness of CAN in fitting the complex nonlinear aging and de-aging processes.

4.4. Experiment on CACD dataset

In this experiment, we conduct a face retrieval experiment on CACD [13], currently the largest publicly available face aging dataset. CACD includes variations in illumination, pose and makeup.
We follow the experimental settings of [13].

Figure 8: Some aging and de-aging visualization results on FGNET. Each row represents the same person. The second and fourth columns show the reconstructed outputs. The first and last columns show the ground-truth images.

In CACD, the 120 celebrities with ranks 3-5 are chosen as the test set, where images taken in 2013 are used as query images. The remaining images are split into three subsets, taken in 2004-2006, 2007-2009 and 2010-2012, respectively, as database images. For training, each of the remaining 1,880 celebrities in CACD has about 80 images taken in different years, with age gaps of about 0-10 years. From these remaining images, we select 20 image pairs per person to form the training set (37,600 pairs). Note that the use of makeup in CACD may confound the age of an individual; to avoid its impact on our algorithm, we carefully check the corresponding image contents and age labels when constructing the training set. All training images are then used to learn the PCA+LDA subspaces. The age gap of each training image pair is constrained to between 2 and 7 years.

4.4.1. Evaluation metrics


In our experiment on CACD, we use mean average precision (MAP) as the evaluation metric. Cosine distance is used to compute the similarity of two images. Concretely, let $q_i \in Q$ be a query image, where $Q$ is the query database.
Figure 9: Face retrieval performance in terms of MAP of our approach compared with state-of-the-art algorithms (HFA, CARC, GSM-1, GSM-2) on the 2004-2006, 2007-2009 and 2010-2012 subsets of CACD.

For $q_i$, the positive images can be expressed as $Y_1, Y_2, \ldots, Y_{m_i}$. We define $E_{ic}$ as the retrieval results for $q_i$ in descending order of score, from the top down to $Y_c$. The average precision (AP) of $q_i$ is:

$$AP(q_i) = \frac{1}{m_i} \sum_{c=1}^{m_i} Precision(E_{ic}), \tag{15}$$

where $Precision(E_{ic})$ is the ratio of positive images in $E_{ic}$. The MAP over $Q$ is then:

$$MAP(Q) = \frac{1}{|Q|} \sum_{i=1}^{|Q|} AP(q_i), \tag{16}$$

i.e., the average of the AP over all query images.
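A short sketch of Eqs. (15)-(16) on binary relevance lists (our own encoding: 1 marks a positive image in the ranked results):

```python
import numpy as np

def average_precision(relevance):
    """AP of one query per Eq. (15); relevance is the 0/1 ranked list."""
    relevance = np.asarray(relevance)
    positive_ranks = np.flatnonzero(relevance)   # positions of Y_1 ... Y_mi
    if len(positive_ranks) == 0:
        return 0.0
    # Precision(E_ic): fraction of positives among results from the top to Y_c
    precisions = [relevance[:c + 1].mean() for c in positive_ranks]
    return float(np.mean(precisions))

def mean_average_precision(ranked_relevance_lists):
    """MAP over all queries, Eq. (16)."""
    return float(np.mean([average_precision(r) for r in ranked_relevance_lists]))
```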

4.4.2. Comparison with state-of-the-art algorithms

We compare our approach with state-of-the-art algorithms including HFA [4], CARC [14] and a generalized similarity model [36] (GSM-1 and GSM-2); compared with GSM-1, GSM-2 only uses more training samples. Fig. 9 reports the comparative results, with all methods tuned to their best settings according to their papers. The results in Fig. 9 show that our approach outperforms the others on all three subsets. Note that whereas HFA [4], CARC [14] and GSM-1 [36] are competitive only on the subset with a small age gap, both our method and GSM-2 [36] also achieve competitive performance on the subset with a large age gap. This confirms the superiority of our approach.

4.5. Experiment on the CACD-VS dataset

The CACD-VS dataset contains 4,000 image pairs from 2,000 celebrities. Following the configuration of [14] for face verification, we split CACD-VS into ten folds, each containing 400 image pairs (200 positive and 200 negative) from 200 celebrities. We use one fold for testing and the other nine folds for training, repeat the experiment on each of the ten folds, and report average results. Concretely, for each run we use the other nine folds (3,600 image pairs) to train CAN and learn the PCA+LDA subspaces. After obtaining the identity feature for each image, cosine similarity is used to compute matching scores between pairs. The optimal classification threshold is decided on the nine training folds. The performance of our method compared with state-of-the-art algorithms is reported in Table 6.
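A minimal sketch of this threshold selection and verification accuracy computation (our own helper names; labels are 1 for positive pairs and 0 for negative pairs):

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick the cosine-score threshold maximizing accuracy on the training folds."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    candidates = np.unique(scores)
    accuracies = [np.mean((scores >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accuracies))]

def verification_accuracy(scores, labels, threshold):
    """Accuracy on the held-out fold with the chosen threshold."""
    return float(np.mean((np.asarray(scores) >= threshold) == np.asarray(labels)))
```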
From the results reported in Table 6, although our method significantly improves verification accuracy from 85.7% to 92.3% compared with average human performance, combining the decisions of multiple humans yields a higher accuracy of 94.2%; there is thus still a gap between our method and human performance. We also include two general deep face recognition methods for comparison, Deepface [50] and DeepID2 [51]. The result of Deepface is taken from [36], and the DeepID2 model is pretrained on the CACD dataset. As seen in Table 6, our method still outperforms both, which further demonstrates the specific effectiveness of CAN for face verification under aging variations.

5. Conclusions

In this paper, we propose coupled auto-encoder networks (CAN) and a nonlinear factor analysis method to address the age-invariant face recognition and retrieval problem. Through CAN, we can nonlinearly separate an identity feature that is age-invariant from a given face image.
Table 6: Verification accuracy on the CACD-VS dataset.

Method                    Accuracy
HD-LBP [52]               81.6%
HFA (2013) [4]            84.4%
CARC (2014) [14]          87.6%
Deepface (2014) [50]      85.4%
DeepID2 (2014) [51]       87.2%
DCNN+LBPH (2015) [39]     89.5%
Human, Average (2013)     85.7%
Human, Voting (2015)      94.2%
LF-CNNs (2016) [32]       98.5%
GSM (2016) [36]           89.8%
Our approach              92.3%

Experiments on FGNET, CACD and CACD-VS confirm the effectiveness of our approach.
In the future, we will attempt to incorporate supervised information into CAN and refine our network architecture. Cross-database evaluation will also be investigated. We will further extend our CAN model to tackle face recognition problems with other variations such as expression, illumination and pose.

Acknowledgements

This work was supported in part by the National Natural Science Foundation
of China (61375038) and Applied Basic Research Programs of Sichuan Science
and Technology Department (2016JY0088).

References

[1] U. Park, Y. Tong, A. K. Jain, Age-invariant face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (5) (2010) 947–954.

[2] J.-X. Du, C.-M. Zhai, Y.-Q. Ye, Face aging simulation and recognition based on NMF algorithm with sparseness constraints, Neurocomputing 116 (2013) 250–259.

[3] Z. Li, D. Gong, X. Li, D. Tao, Aging face recognition: A hierarchical learning model based on local patterns selection, IEEE Transactions on Image Processing 25 (5) (2016) 2146–2154.

[4] D. Gong, Z. Li, D. Lin, J. Liu, X. Tang, Hidden factor analysis for age invariant face recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2872–2879.

[5] D. Gong, Z. Li, D. Tao, J. Liu, X. Li, A maximum entropy feature descriptor for age invariant face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5289–5297.

[6] Z. Li, U. Park, A. K. Jain, A discriminative model for age invariant face recognition, IEEE Transactions on Information Forensics and Security 6 (3) (2011) 1028–1037.

[7] H. Ling, S. Soatto, N. Ramanathan, D. W. Jacobs, Face verification across age progression using discriminative methods, IEEE Transactions on Information Forensics and Security 5 (1) (2010) 82–91.

[8] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[9] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: Application to face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 2037–2041.

[10] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory 39 (3) (1993) 930–945.

[11] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.

[12] FG-NET Aging Database. http://www.fgnet.rsunit.com.

[13] B.-C. Chen, C.-S. Chen, W. H. Hsu, Cross-age reference coding for age-invariant face recognition and retrieval, in: Computer Vision – ECCV 2014, Springer, 2014, pp. 768–783.

[14] B.-C. Chen, C.-S. Chen, W. H. Hsu, Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset, IEEE Transactions on Multimedia 17 (6) (2015) 804–815.

[15] A. Montillo, H. Ling, Age regression from faces using random forests, in: 2009 16th IEEE International Conference on Image Processing (ICIP), IEEE, 2009, pp. 2465–2468.

[16] G. Guo, G. Mu, Y. Fu, T. S. Huang, Human age estimation using bio-inspired features, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2009, pp. 112–119.

[17] Y. Fu, T. S. Huang, Human age estimation with regression on discriminative aging manifold, IEEE Transactions on Multimedia 10 (4) (2008) 578–584.

[18] S. K. Zhou, B. Georgescu, X. S. Zhou, D. Comaniciu, Image based regression using boosting method, in: Tenth IEEE International Conference on Computer Vision (ICCV), Vol. 1, IEEE, 2005, pp. 541–548.

[19] S. Yan, H. Wang, X. Tang, T. S. Huang, Learning auto-structured regressor from uncertain nonnegative labels, in: IEEE 11th International Conference on Computer Vision (ICCV), IEEE, 2007, pp. 1–8.

[20] J. Wang, Y. Shang, G. Su, X. Lin, Age simulation for face recognition, in: 18th International Conference on Pattern Recognition (ICPR), Vol. 3, IEEE, 2006, pp. 913–916.

[21] N. Ramanathan, R. Chellappa, Face verification across age progression, IEEE Transactions on Image Processing 15 (11) (2006) 3349–3361.

[22] G. Guo, Y. Fu, C. R. Dyer, T. S. Huang, Image-based human age estimation by manifold learning and locally adjusted robust regression, IEEE Transactions on Image Processing 17 (7) (2008) 1178–1188.

[23] X. Geng, Z.-H. Zhou, K. Smith-Miles, Automatic age estimation based on facial aging patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (12) (2007) 2234–2240.

[24] Y. H. Kwon, N. D. V. Lobo, Age classification from facial images, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1994, pp. 762–767.

[25] A. Lanitis, C. Draganova, C. Christodoulou, Comparing different classifiers for automatic age estimation, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (1) (2004) 621–628.

[26] J. Lu, V. E. Liong, J. Zhou, Cost-sensitive local binary feature learning for facial age estimation, IEEE Transactions on Image Processing 24 (12) (2015) 5356–5368.

[27] J. Suo, X. Chen, S. Shan, W. Gao, Learning long term face aging patterns from partially dense aging databases, in: 2009 IEEE 12th International Conference on Computer Vision (ICCV), IEEE, 2009, pp. 622–629.

[28] A. Lanitis, C. J. Taylor, T. F. Cootes, Toward automatic simulation of aging effects on face images, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4) (2002) 442–455.

[29] J. Suo, S.-C. Zhu, S. Shan, X. Chen, A compositional and dynamic model for face aging, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2010) 385–401.

[30] N. Tsumura, N. Ojima, K. Sato, M. Shiraishi, H. Shimizu, H. Nabeshima, S. Akazaki, K. Hori, Y. Miyake, Image-based skin color and texture analysis/synthesis by extracting hemoglobin and melanin information in the skin, ACM Transactions on Graphics (TOG) 22 (3) (2003) 770–779.

[31] J. Lu, V. E. Liong, X. Zhou, J. Zhou, Learning compact binary face descriptor for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (10) (2015) 2041–2056.

[32] Y. Wen, Z. Li, Y. Qiao, Age invariant deep face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[33] Y. Li, G. Wang, L. Lin, H. Chang, A deep joint learning approach for age invariant face verification, in: Computer Vision, Springer, 2015, pp. 296–305.

[34] S. Bianco, Large age-gap face verification by feature injection in deep networks, arXiv preprint arXiv:1602.06149.

[35] H. El Khiyari, H. Wechsler, et al., Face recognition across time lapse using convolutional neural networks, Journal of Information Security 7 (03) (2016) 141.

[36] L. Lin, G. Wang, W. Zuo, F. Xiangchu, L. Zhang, Cross-domain visual matching via generalized similarity measure and feature learning.

[37] L. Liu, C. Xiong, H. Zhang, Z. Niu, M. Wang, S. Yan, Deep aging face verification with large gaps, IEEE Transactions on Multimedia 18 (1) (2016) 64–75.

[38] J. Lu, V. E. Liong, G. Wang, P. Moulin, Joint feature learning for face recognition, IEEE Transactions on Information Forensics and Security 10 (7) (2015) 1371–1383.

[39] H. Zhai, C. Liu, H. Dong, Y. Ji, Y. Guo, S. Gong, Face verification across aging based on deep convolutional networks and local binary patterns, in: Intelligence Science and Big Data Engineering. Image and Video Data Engineering, Springer, 2015, pp. 341–350.

[40] G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.

[41] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2 (1) (2009) 1–127.

[42] M. Kan, S. Shan, H. Chang, X. Chen, Stacked progressive auto-encoders (SPAE) for face recognition across poses, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1883–1890.

[43] Y. Liu, X. Hou, J. Chen, C. Yang, G. Su, W. Dou, Facial expression recognition and generation using sparse autoencoder, in: 2014 International Conference on Smart Computing (SMARTCOMP), IEEE, 2014, pp. 125–130.

[44] Y. Zhang, R. Liu, S. Zhang, M. Zhu, Occlusion-robust face recognition using iterative stacked denoising autoencoder, in: Neural Information Processing, Springer, 2013, pp. 352–359.

[45] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[46] M. A. Turk, A. P. Pentland, Face recognition using eigenfaces, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1991, pp. 586–591.

[47] X. Wang, X. Tang, A unified framework for subspace face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (9) (2004) 1222–1228.

[48] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, IEEE, 2001, pp. I-511.

[49] Megvii: Face++. http://www.faceplusplus.com. Accessed 2014-3-7.

[50] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701–1708.

[51] Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, in: Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.

[52] D. Chen, X. Cao, F. Wen, J. Sun, Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3025–3032.
