Age Invariant Face Recognition and Retrieval by
Coupled Auto-encoder Networks
Abstract
Recently, many promising results have been reported on face recognition related problems. However, age-invariant face recognition and retrieval remains a challenge. Inspired by the observation that age variation is a nonlinear but smooth transform, and by the ability of auto-encoder networks to learn latent representations from their inputs, in this paper we propose a new neural network model called coupled auto-encoder networks (CAN) to handle the age-invariant face recognition and retrieval problem. CAN couples two auto-encoders bridged by two shallow neural networks that fit the complex nonlinear aging and de-aging processes. We further propose a nonlinear factor analysis method to nonlinearly decompose a given face image into three components: an identity feature, an age feature and noise, where the identity feature is age-invariant and can be used for face recognition and retrieval. Experiments on three publicly available face aging datasets, FGNET, CACD and CACD-VS, show the effectiveness of the proposed approach.
Keywords: Face recognition, Age invariant, Auto-encoder
1. Introduction
∗ Corresponding author
Email address: [email protected] (Chenfei Xu, Qihe Liu, Mao Ye)
Figure 1: Example images from FGNET. Images in the same row are of the same subject. The number at the bottom of each image shows the subject's age when the image was taken.
However, these methods are all linear models, whose expressive power is limited and which require complex inference.
Motivated by the ability of auto-encoders to learn latent representations from inputs and the observation that age variation is a nonlinear but smooth transform, we propose a new neural network model called coupled auto-encoder networks (CAN). Given a pair of images of one person, we first use two auto-encoders that accept these two images respectively as inputs and reconstruct them. Then, we leverage two shallow neural networks as a bridge to connect these two auto-encoders. We fit the aging and de-aging processes with these shallow networks, exploiting the fact that a single-hidden-layer neural network can approximate any complex smooth function [10]. Further, a nonlinear factor analysis method is applied to the hidden layers of CAN, in which the representation of a face image is decomposed into three components: an identity feature which is age-invariant, an age feature which is identity-independent, and noise. In the end, we apply PCA and LDA [11] on the identity feature to form a more compact and discriminative feature as the final age-invariant representation for face recognition and retrieval.
Our main contributions are: 1) a new model for age-invariant face recognition and retrieval based on a couple of auto-encoder networks, evaluated on three public face aging datasets, FGNET [12], CACD [13] and CACD-VS [14]; 2) a nonlinear factor analysis method to separate the identity feature from the face representation. Compared with similar methods based on linear factor analysis [4, 5], our method obtains a better identity feature.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 describes the proposed approach and details the coupled auto-encoder networks (CAN). Section 4 provides the experimental results. Section 5 concludes the paper.
2. Related works
Most existing works on age-related face analysis focus on age estimation [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and age simulation [27, 28, 29, 30, 1, 2]. Works on age-invariant recognition are limited, and traditional methods fall into two categories. Generative methods [1, 2] try to construct a 2D/3D face aging pattern space to synthesize a face image that matches the target face image before recognition. However, these methods strongly depend on parameter assumptions, accurate age labels and relatively clean training data, so they do not work well in real-world face recognition.
Recently, some discriminative methods [3, 31, 4, 5, 6, 7] have been proposed and achieve good results. Ling et al. [7] use a gradient orientation pyramid with a support vector machine for face verification. Li et al. [6] design a densely sampled local feature description scheme combining the scale-invariant feature transform and multi-scale local binary patterns to improve face matching accuracy. Gong et al. [4] propose hidden factor analysis, which tries to separate age variation from person-specific features for face recognition, and further propose a maximum entropy feature descriptor with identity factor analysis in [5] to improve this method. Lu et al. [31] propose a compact binary face descriptor for face representation and recognition. In [3], a new feature descriptor called local pattern selection (LPS) is proposed for aging face recognition.
Data-driven methods based on a reference set have also been used to improve age-invariant face recognition and retrieval. The authors of [13, 14] propose a coding framework called Cross-Age Reference Coding (CARC) that uses CACD [13], a new large-scale face aging dataset, as a reference set to encode the low-level features of a face image into age-invariant representations.
Some deep learning models [32, 33, 34, 35, 36, 37, 38, 39] have also been proposed. Wen et al. [32] propose a deep face recognition framework called latent factor guided convolutional neural network (LF-CNN) that significantly improves age-invariant face recognition performance; with a model called latent identity analysis (LIA), they extract age-invariant features. Similarly, [33, 34, 35] respectively propose different convolutional neural network architectures to address the age-invariant face recognition problem. [36] presents a generalized similarity model (GSM) and integrates it with feature representation learning via deep convolutional neural networks for age-invariant face recognition. In [37], a deep aging face verification (DAFV) architecture is proposed, including two modules called the aging pattern synthesis module and the aging face verification module. [39] combines deep convolutional neural networks with local binary pattern histograms (DCNN+LBPH) for face verification across aging. [38] presents a new joint feature learning (JFL) approach and stacks this model into a deep architecture to exploit hierarchical information for face representation.
The auto-encoder attempts to learn hidden representations automatically from its inputs and has been successfully applied to many computer vision problems. As a typical unsupervised learning method, the auto-encoder [40, 41] has shown its efficiency in many face-related recognition problems. Kan et al. [42] propose a stacked progressive auto-encoder for face recognition across poses. Liu et al. [43] use a sparse auto-encoder for facial expression recognition, and Zhang et al. [44] propose an iterative stacked de-noising auto-encoder to recognize faces with partial occlusions. Liu et al. [37] use a deep aging-aware de-noising auto-encoder for aging pattern synthesis.
3.1. Overview
An overview of CAN is shown in Fig. 2. Structurally, CAN is composed of two identical auto-encoders and two single-hidden-layer neural networks that act as a bi-directional bridge.
Inputs of CAN are training facial image pairs of different persons, denoted as $T = \{(x_1^i, x_2^i)\}$, $x_1^i, x_2^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$, where $N$ is the total number of training image pairs.
Figure 2: The overview of CAN. CAN is composed of two identical auto-encoders and a bridge network. Given a pair of input images (x1, x2) of one person, we first leverage the auto-encoders to reconstruct the inputs, projecting them into a high-dimensional feature space in the hidden layers. Second, we add constraints in this feature space to decompose it into three components, where (I1, I2) serve as identity features that can be used as age-invariant representations for recognition and retrieval. Note that different ids here can refer to the same person. Details of CAN are described in Sections 3.2 and 3.3.
For one person, our goal is to encode an age-invariant feature from the inputs for recognition and retrieval, and a nonlinear factor analysis model is given as:
$$x = \sigma(I, A, \xi), \qquad (1)$$
where $I$, $A$ and $\xi$ denote the identity feature, the age feature and the noise component, respectively, and $\sigma(\cdot)$ is a nonlinear mapping.
Two single-hidden-layer neural networks acting as a bi-directional bridge are chosen to connect $A_1$ and $A_2$ to fit the aging and de-aging processes. We limit the age gap of each training image pair in $T$ to a certain range according to the dataset; this ensures that the aging and de-aging fitting remains effective. To encode the age-invariant features $I_1$ and $I_2$ from the inputs $x_1$ and $x_2$, our model specifically has two steps:
These are a reconstruction step and a transfer step, which together build our CAN model. We detail the two steps via their cost functions in the following two sections.
In this step, given a pair of facial images of the same person, we reconstruct the two images with CAN. The cost function is defined as:
$$\min_{\theta_1} L_r = \frac{1}{2N}\sum_{i=1}^{N}\left(\|x_1^i - \tilde{x}_1^i\|_2^2 + \|x_2^i - \tilde{x}_2^i\|_2^2\right), \qquad (2)$$
where the parameters are $\theta_1 = \{W_j, \hat{W}_j, b_j, c_j\}$ with $W_j = \{W_{uj}, W_{vj}, W_{nj}\}$ and $\hat{W}_j = \{\hat{W}_{uj}, \hat{W}_{vj}, \hat{W}_{nj}\}$, for $j = 1, 2$, as shown in Fig. 2. $\tilde{x}_1^i$ and $\tilde{x}_2^i$ are the outputs of the auto-encoders that reconstruct the corresponding inputs $x_1^i$ and $x_2^i$. Eq. (2) is a typical auto-encoder training objective, i.e., a squared error function. In the rest of this paper, we omit the averaging factor (the $\frac{1}{2N}$ in Eq. (2)) when analyzing the cost functions, for simplicity.
Here we only analyze the first term, since the two terms in Eq. (2) are symmetric. A basic auto-encoder has two main blocks, an encoder and a decoder. The input $x_1^i$ is encoded by a function $h_1^i = f_1(x_1^i)$, which can be written as:
$$h_1^i = f_1(x_1^i) = s(W_1 x_1^i + b_1), \qquad (3)$$
The decoder then produces the reconstruction
$$\tilde{x}_1^i = s_l(\hat{W}_1 h_1^i + c_1), \qquad (4)$$
where $\tilde{x}_1^i$ is a reconstructed output that should be close to the input $x_1^i$, $\hat{W}_1 \in \mathbb{R}^{n \times m}$ is a weight matrix, $c_1 \in \mathbb{R}^{n \times 1}$ is the output-layer bias vector, and $s_l(z) = z$ is the identity function (i.e., a linear activation). Minimizing this term updates $\{W_1, \hat{W}_1, b_1, c_1\} \subset \theta_1$; similarly, minimizing the second term updates $\{W_2, \hat{W}_2, b_2, c_2\} \subset \theta_1$. After solving Eq. (2) we fix the updated parameters $\theta_1$ for step 2.
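As a concrete illustration of this reconstruction step, a minimal numpy sketch is given below. The block sizes follow the m = 1000 column of Table 2; the input size n, the stacking order of the age, identity and noise blocks inside the hidden layer, and all helper names are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes from the m = 1000 column of Table 2:
# age dimension p, identity dimension q, noise dimension r.
n, p, q, r = 1024, 700, 200, 100
m = p + q + r                               # total hidden units
rng = np.random.default_rng(0)

W = rng.normal(0.0, 0.01, (m, n))           # encoder weights [W_u; W_v; W_n]
W_hat = rng.normal(0.0, 0.01, (n, m))       # decoder weights
b = np.zeros((m, 1))                        # hidden-layer bias
c = np.zeros((n, 1))                        # output-layer bias

def encode(x):
    """Eq. (3): h = s(Wx + b)."""
    return sigmoid(W @ x + b)

def decode(h):
    """Eq. (4): x_tilde = s_l(W_hat h + c), with a linear output activation."""
    return W_hat @ h + c

def reconstruction_loss(X):
    """Eq. (2) for one of the two auto-encoders; X holds one image per column."""
    return 0.5 * np.mean(np.sum((X - decode(encode(X))) ** 2, axis=0))

# The age, identity and noise features are slices of the hidden layer
# (the ordering of the three blocks is an illustrative assumption):
h = encode(rng.random((n, 1)))
A, I, xi = h[:p], h[p:p + q], h[p + q:]
```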
$$\min_{\theta_2} L_t = \frac{1}{2N}\sum_{i=1}^{N}\Big(\|A_2^i - \hat{A}_2^i\|_2^2 + \|A_1^i - \hat{A}_1^i\|_2^2 + \|I_2^i - I_1^i\|_2^2 + \|x_2^i - \hat{x}_2^i\|_2^2 + \|x_1^i - \hat{x}_1^i\|_2^2\Big), \qquad (5)$$
where the parameters are $\theta_2 = \{W_{uj}, \hat{W}_{uj}, W_{vj}, \hat{W}_{vj}, b_{uj}, b_{vj}, c_j, H_{aj}, H_{dj}, b_{aj}, b_{dj}\}$, for $j = 1, 2$. $\hat{A}_2^i$ is the aging fitting output, encouraged to be equal to the target older age feature $A_2^i$; in the de-aging direction, $\hat{A}_1^i$ is encouraged to be equal to $A_1^i$, the target younger age feature. $I_1^i$ and $I_2^i$ are identity features of the same person, which are age-invariant. We call $\hat{x}_2^i$ and $\hat{x}_1^i$ transfer reconstruction outputs; they approximate the inputs $x_2^i$ and $x_1^i$, respectively.
Figure 3: Aging fitting neural network. Â2 is the aging fitting output, encouraged to be equal to the target older age feature A2; A1 is the younger age feature. We leverage the bridge network to fit this aging process.
Minimizing the first squared-error term $\|A_2^i - \hat{A}_2^i\|_2^2$ in Eq. (5) fits the aging process between the two age features $A_2^i$ and $A_1^i$. We choose a single-hidden-layer neural network to connect $A_2^i$ and $A_1^i$ because any single-hidden-layer neural network can approximate any complex smooth function [10], and we observe that the aging (and de-aging) process is a highly complex but smooth transform. Optimizing this term in fact trains an aging fitting neural network separated from CAN, as shown in Fig. 3, and $A_2^i$ can be expressed as follows:
$$A_2^i = f_{v2}(x_2^i) = s(W_{v2} x_2^i + b_{v2}), \qquad (6)$$
while the aging fitting output is the composite
$$\hat{A}_2^i = F_a(A_1^i) = f_{a2}(f_{a1}(A_1^i)), \qquad (7)$$
where $f_{aj}$ and $f_{dj}$ are defined as:
$$f_{aj}(z) = s(H_{aj} z + b_{aj}), \qquad f_{dj}(z) = s(H_{dj} z + b_{dj}), \qquad (8)$$
for $j = 1, 2$, where $H_{a1}, H_{d1} \in \mathbb{R}^{k \times q}$ and $H_{a2}, H_{d2} \in \mathbb{R}^{q \times k}$ are weight matrices, and $b_{a1}, b_{d1} \in \mathbb{R}^{k \times 1}$ are middle-layer bias vectors. We make the age feature bias vectors $b_{v1}$ and $b_{v2}$ adaptive, i.e., they are used both for encoding the age features and for fitting the aging and de-aging processes, so here $b_{a2} = b_{v2}$ and $b_{d2} = b_{v1}$. $k$ is the number of bridge neurons. Thus, as shown in Fig. 2, the bridge networks can be formulated as $F_a$ and $F_d$, which are highly nonlinear due to the composition of two sigmoid functions. Note that in our model the input of $F_a$ is $A_1^i$ for aging, while $F_d$ takes the older input $A_2^i$ for de-aging. Now we can write $\hat{A}_2^i$ according to Eq. (7) explicitly as:
$$\hat{A}_2^i = s(H_{a2}\, s(H_{a1} A_1^i + b_{a1}) + b_{v2}), \qquad (9)$$
where $A_1^i$ has a definition similar to that of $A_2^i$ in Eq. (6). The second term admits a similar analysis, since it only fits the opposite direction, for de-aging.
Therefore, minimizing the first two terms in Eq. (5) encourages the model to extract age-related information from the inputs.
The third term ensures that the error between the two encoded identity features $I_1^i$ and $I_2^i$ of the same person is small. This term is based on the observation that facial images of the same person contain a stable identity feature that is age-invariant. Here $I_j^i$ can be formulated as:
$$I_j^i = f_{uj}(x_j^i) = s(W_{uj} x_j^i + b_{uj}), \qquad (10)$$
for $j = 1, 2$.
Figure 4: Transfer reconstruction neural network. Given inputs (x1, x2) of one person at different ages, we use the aging fitting output Â2 combined with the target identity feature I2 to reconstruct the older facial image input x2.
The fourth term corresponds to the transfer reconstruction network shown in Fig. 4. For the inputs $(x_1^i, x_2^i)$ of one person at different ages, our idea is to use the aging fitting output $\hat{A}_2^i$ combined with $I_2^i$ to reconstruct $x_2^i$, the target older facial image input. We call this process transfer reconstruction. Here $\hat{x}_2^i$ is formulated as:
$$\hat{x}_2^i = s_l(\hat{W}_{v2} \hat{A}_2^i + \hat{W}_{u2} I_2^i + c_2), \qquad (11)$$
for $i = 1, 2, \ldots, N$, where $\hat{W}_{v2} \in \mathbb{R}^{n \times q}$ and $\hat{W}_{u2} \in \mathbb{R}^{n \times p}$ are weight matrices and $s_l(z)$ is the identity function. Similarly, $\hat{x}_1^i$ in the fifth term can be formulated as:
$$\hat{x}_1^i = s_l(\hat{W}_{v1} \hat{A}_1^i + \hat{W}_{u1} I_1^i + c_1). \qquad (12)$$
Minimizing the last two terms in Eq. (5) concentrates as much useful personal information as possible in the parameters $\theta_2$. Combined with the constraints brought by the other terms in Eq. (5), we simultaneously separate the identity-related and age-related information as needed. The noises $\xi_1$ and $\xi_2$ are separated from the inputs indirectly by our CAN training algorithm (see Section 3.4).
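Continuing the numpy sketch from the reconstruction step, the transfer cost of Eq. (5) can be written as below. The helper signatures (`enc_j`, returning the identity and age slices of the hidden layer, and `dec_j`, implementing Eqs. (11)-(12)) are our own hypothetical factoring of the model, not the authors' code; `sigmoid` is from the earlier sketch.

```python
def bridge(z, H1, b1, H2, b2):
    """Single-hidden-layer bridge, Eqs. (8)-(9): a composite of two sigmoids."""
    return sigmoid(H2 @ sigmoid(H1 @ z + b1) + b2)

def transfer_loss(x1, x2, enc1, enc2, dec1, dec2, Fa, Fd):
    """Eq. (5) for one training pair (averaging over pairs omitted).

    enc_j(x) -> (I_j, A_j); dec_j(A, I) -> x_hat (Eqs. (11)-(12));
    Fa fits aging (A1 -> A2), Fd fits de-aging (A2 -> A1).
    """
    I1, A1 = enc1(x1)
    I2, A2 = enc2(x2)
    A2_hat = Fa(A1)                      # aging fitting output, target A2
    A1_hat = Fd(A2)                      # de-aging fitting output, target A1
    x2_hat = dec2(A2_hat, I2)            # transfer reconstruction of the older face
    x1_hat = dec1(A1_hat, I1)            # transfer reconstruction of the younger face
    sq = lambda a, b: np.sum((a - b) ** 2)
    return 0.5 * (sq(A2, A2_hat) + sq(A1, A1_hat) + sq(I2, I1)
                  + sq(x2, x2_hat) + sq(x1, x1_hat))
```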
3.4. Training
Training CAN involves the two steps discussed above, and we alternately perform these two training steps; the procedure is described in Algorithm 1. In the transfer step, we add constraints on the identity-related and age-related parameters $W_{uj}, W_{vj} \subset W_j$, $\hat{W}_{uj}, \hat{W}_{vj} \subset \hat{W}_j$, $b_{uj}, b_{vj} \subset b_j$, for $j = 1, 2$, to encode the identity and age features. Combined with the basic reconstruction step, this overall training pushes the remaining irrelevant information into the noise-related parameters $W_{nj} \subset W_j$, $\hat{W}_{nj} \subset \hat{W}_j$, $b_{nj} \subset b_j$, for $j = 1, 2$; the noises $\xi_1$ and $\xi_2$ are thereby encoded indirectly in the hidden layers. To solve Eqs. (2) and (5), we adopt stochastic gradient descent (SGD) using standard back-propagation [45].
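Algorithm 1 is not reproduced in this version of the manuscript; as a rough sketch of the alternating schedule it describes, the loop below applies momentum SGD (momentum 0.9, as stated in Section 4.2). The gradient callables stand for the back-propagated gradients of Eqs. (2) and (5), and the learning rate is our assumption.

```python
def train_can(pairs, theta, grad_fns, max_epoch=500, lr=0.01, mu=0.9):
    """Alternate the two training steps with momentum SGD.

    grad_fns = (reconstruction_grads, transfer_grads): hypothetical callables
    returning dicts of gradients of Eq. (2) and Eq. (5) w.r.t. theta's entries.
    """
    velocity = {k: np.zeros_like(v) for k, v in theta.items()}
    for _ in range(max_epoch):
        for x1, x2 in pairs:
            for grads_of in grad_fns:                 # step 1, then step 2
                for k, g in grads_of(theta, x1, x2).items():
                    velocity[k] = mu * velocity[k] - lr * g   # momentum 0.9
                    theta[k] = theta[k] + velocity[k]
    return theta
```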
After training CAN and performing dimension reduction, we use the learned identity encoder parameters $W_{uj}^*, b_{uj}^*$ ($j = 1, 2$) and the trained PCA and LDA transform matrices $M_p, M_l$ to obtain the final age-invariant features for recognition and retrieval. Concretely, given a pair of probe and gallery facial image inputs $(x_p, x_g)$, according to Eq. (10) the corresponding age-invariant features are computed as follows:
$$I_p = M_l^T M_p^T f_{u1}^*(x_p) = M_l^T M_p^T s(W_{u1}^* x_p + b_{u1}^*), \qquad (13)$$
$$I_g = M_l^T M_p^T f_{u2}^*(x_g) = M_l^T M_p^T s(W_{u2}^* x_g + b_{u2}^*), \qquad (14)$$
where the superscript $T$ denotes matrix transposition. Then we use the cosine distance to compute matching scores between $I_p$ and $I_g$ for age-invariant face recognition and retrieval.
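After training, the matching pipeline of Eqs. (13)-(14) reduces to a few lines. This sketch assumes the trained arrays (`W_u`, `b_u`, `M_p`, `M_l`) are given and reuses `sigmoid` from the earlier sketch; the function names are ours.

```python
def age_invariant_feature(x, W_u, b_u, M_p, M_l):
    """Eqs. (13)-(14): identity encoding, then the learned PCA and LDA projections."""
    return M_l.T @ (M_p.T @ sigmoid(W_u @ x + b_u))

def cosine_score(Ip, Ig):
    """Matching score between probe and gallery features."""
    return float((Ip * Ig).sum()) / (np.linalg.norm(Ip) * np.linalg.norm(Ig) + 1e-12)
```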
4. Experiment
4.1. Datasets
We evaluate our approach on three public face aging datasets: FGNET [12], CACD [13] and CACD-VS [14]. FGNET contains 1,002 images of 82 different people, each with about 13 images on average taken at different ages from 0 to 69. CACD is a new large-scale dataset collected from the Internet, which consists of 163,446 face images of 2,000 people with ages ranging from 16 to 62. To the best of our knowledge, CACD is the largest publicly available face aging dataset. Compared to CACD, FGNET has larger age gaps and more images of young subjects, but CACD has many more images and more images at other ages. Fig. 5 shows the age range distributions of the two datasets.
Figure 5: Age distributions of FGNET and CACD (percentage of images per age group in each dataset).

Further, we conduct an experiment on CACD-VS, a subset of CACD, for face verification. The CACD-VS dataset contains 2,000 positive pairs and 2,000 negative pairs, carefully annotated by checking both the associated images and the surrounding web content.
In our experiments, all facial images are preprocessed as follows: (1) convert the images to grayscale if they are RGB; (2) detect the locations of the faces using the Viola-Jones face detector [48] and locate 83 landmarks using the Face++ API [49]; (3) align the images so that the eyes lie at the same horizontal positions; (4) crop the images to remove the background and hair regions; (5) rescale them by bicubic interpolation and reshape them into one-dimensional vectors. All the data are then mapped into [0, 1] and normalized to have zero mean.
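A minimal OpenCV sketch of this pipeline is given below, under stated assumptions: OpenCV's bundled Haar cascade stands in for the Viola-Jones detector [48], two eye coordinates stand in for the 83 Face++ landmarks [49], and the 32×32 output size is our assumption (the paper does not state the crop size here).

```python
import cv2
import numpy as np

# Viola-Jones-style detector [48], via OpenCV's bundled Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(img_bgr, left_eye, right_eye, out_size=(32, 32)):
    """Steps (1)-(5); eye coordinates stand in for the Face++ landmarks [49]."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)              # (1) grayscale
    x, y, w, h = face_cascade.detectMultiScale(gray, 1.1, 5)[0]   # (2) first detection
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))                        # (3) eye alignment
    M = cv2.getRotationMatrix2D((float(left_eye[0]), float(left_eye[1])), angle, 1.0)
    aligned = cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))
    face = aligned[y:y + h, x:x + w]                              # (4) crop face region
    face = cv2.resize(face, out_size, interpolation=cv2.INTER_CUBIC)  # (5) bicubic
    vec = face.reshape(-1).astype(np.float64) / 255.0             # map into [0, 1]
    return vec - vec.mean()                                       # zero mean
```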
Table 1: Parameter settings in our experiments.

  Dataset                        FGNET   CACD
  CAN                  p         2100    2800
                       q         600     800
                       r         300     400
                       k         500     800
  Dimension reduction  PCA       400     500
                       LDA       100     120
Table 2: The setting of the age feature dimension p, identity feature dimension q, and noise-related feature dimension r based on different choices of the number of hidden layer neurons m.

  m    1000    2000    3000    4000
  p    700     1400    2100    2800
  q    200     400     600     800
  r    100     200     300     400
For the three datasets, we set the maximum number of training epochs, maxEpoch, to 1,000, 500 and 800, respectively. All parameter updates use a momentum of 0.9.
4.3.1. Evaluation metrics
In our experiment on FGNET, we use a leave-one-image-out strategy with rank-k identification rates for performance evaluation. Specifically, we leave one image out as the test sample and train the model on the remaining 1,001 images, from which the training pairs are selected. We repeat this procedure 1,002 times and take the average as the final identification rate. Cosine similarity is used to compute matching scores between the test example and the remaining images. For rank-k, we sort the matching results from top-1 to top-k for each test example; averaging these results yields the rank-k identification rates.
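The rank-k scoring itself can be sketched as below. Note that the full protocol retrains the model for every left-out image; this sketch only scores precomputed identity features, and the function name and k_max default are ours.

```python
def rank_k_rates(features, labels, k_max=20):
    """Rank-k identification rates with cosine similarity, each image as probe in turn.

    features: (N, d) final identity features; labels: (N,) subject ids.
    """
    labels = np.asarray(labels)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    hits = np.zeros(k_max)
    for i in range(len(labels)):
        scores = feats @ feats[i]
        scores[i] = -np.inf                    # exclude the probe itself
        ranked = labels[np.argsort(-scores)]   # gallery sorted by similarity
        first = np.where(ranked == labels[i])[0][0]
        if first < k_max:
            hits[first:] += 1                  # a top-j match counts for all k >= j
    return hits / len(labels)                  # rank-1 ... rank-k_max rates
```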
Table 3: Rank-1 recognition rates (%) on FGNET for different PCA dimensions (dp) and LDA dimensions (dl).

  dp \ dl   60     100    140    180    220    260    300
  100       79.0   −      −      −      −      −      −
  200       83.5   82.5   80.8   78.1   −      −      −
  300       83.9   86.0   82.9   80.3   78.3   71.4   −
  400       82.2   86.5   85.4   80.6   77.0   71.7   65.4
  500       81.4   84.8   82.0   78.9   76.5   72.2   66.4
  600       80.6   80.7   81.4   78.4   75.4   74.9   69.5
  700       78.3   80.3   80.6   77.5   72.0   73.6   71.3
  800       71.7   76.3   78.5   74.0   71.4   70.3   72.9
  900       68.2   70.1   73.3   72.4   70.9   67.4   69.3
  1000      63.1   66.2   69.7   70.8   67.2   65.9   65.7
Rank-1 recognition rates for different PCA and LDA dimensions are given in Table 3. We choose (dp, dl) = (400, 100) as the best setting for dimension reduction and testing.
Figure 6: (a) Cumulative Match Characteristic (CMC) curves on FGNET with different choices of the number of hidden layer neurons m of CAN. (b) CMC curves on FGNET with different dimension reduction strategies.
Table 4: Rank-1 recognition rates of our approach compared with state-of-the-art algorithms
on FGNET.
Algorithms Recognition Rate
Park et al. (2010) [1] 37.4%
Li et al. (2011) [6] 47.5%
HFA (2013) [4] 69.0%
MEFA (2015) [5] 76.2%
CNN-baseline (2016) [32] 84.4%
LF-CNNs (2016) [32] 88.1%
Our approach 86.5%
Figure 7: Some failed retrieval results in FGNET. The first row shows the probe faces and
the second row shows the incorrect rank-1 retrieval results using our approach. The third row
presents the ground-truth images corresponding to the probes.
Figure 8: Some aging and de-aging visualization results on FGNET. Each row represents the same person. The second and fourth columns show the reconstructed outputs; the first and last columns show the ground-truth images.
The images of the 120 celebrities with rank 3-5 are chosen as the test sets, where images taken in 2013 are used as query images. The remaining images are split into three subsets, taken in 2004-2006, 2007-2009 and 2010-2012 respectively, as database images. For training, each of the remaining 1,880 celebrities in CACD has about 80 images taken in different years, with age gaps of about 0-10 years. From these remaining images, we select 20 image pairs per person and aggregate them into the training set (37,600 pairs). Note that the use of makeup in CACD may confound the age of an individual; to avoid its impact on our algorithm, we carefully check the corresponding image contents and age labels when building the training set. All training images are then used to learn the PCA+LDA subspaces. The age gap of each training image pair is constrained to between 2 and 7 years.
Figure 9: Face retrieval performance in terms of MAP of our approach compared with state-of-the-art algorithms (HFA, CARC, GSM-1, GSM-2) on CACD, for database images taken in 2004-2006, 2007-2009 and 2010-2012.
where $Precision(E_i^c)$ denotes the ratio of positive images in $E_i^c$. Then the MAP of the query set $Q$ can be computed as:
$$MAP(Q) = \frac{1}{|Q|}\sum_{i=1}^{|Q|} AP(q_i), \qquad (16)$$
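Eq. (15), which defines AP(qi), precedes this passage; as a working illustration, the sketch below uses the standard average precision over a binary ranked relevance list, which matches the averaging of Eq. (16). Function names are ours.

```python
def average_precision(ranked_relevance):
    """AP of one query: ranked_relevance is 1 for a correct (positive) image, else 0."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # precision after each result
    return float((prec_at_k * rel).sum() / rel.sum())        # averaged over positive ranks

def mean_average_precision(queries):
    """Eq. (16): mean of AP over the query set Q."""
    return float(np.mean([average_precision(r) for r in queries]))
```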
Compared with CARC [14] and GSM-1 [36] on the subset with a small age gap, both our method and GSM-2 [36] achieve competitive performance on the subset with a large age gap. This confirms the superiority of our approach.
The CACD-VS dataset contains 4,000 image pairs from 2,000 celebrities. Following the configuration in [14] for face verification, we split CACD-VS into ten folds, each containing 400 image pairs (200 positive and 200 negative) from 200 celebrities. We use one fold for testing and the other nine folds for training, repeat the experiment on each of the ten folds, and report average results. Concretely, for each run we use the other nine folds (3,600 image pairs) to train CAN and learn the PCA+LDA subspaces. After obtaining the identity feature of each image, cosine similarity is used to compute matching scores between pairs. The optimal classification threshold is decided on the nine training folds. The performance of our method compared with state-of-the-art algorithms is reported in Table 6.
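One plausible way to pick the threshold, sweeping candidate values over the training-fold scores and keeping the one maximizing verification accuracy, is sketched below; the paper does not specify the search procedure, so this is our assumption.

```python
def best_threshold(scores, same_person):
    """Choose the cosine-score threshold maximizing accuracy on the training folds.

    scores: (M,) matching scores of training pairs; same_person: (M,) 1/0 labels.
    """
    scores = np.asarray(scores)
    same_person = np.asarray(same_person, dtype=bool)
    best_t, best_acc = 0.0, 0.0
    for t in np.sort(scores):                     # every score is a candidate threshold
        acc = np.mean((scores >= t) == same_person)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```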
From the results reported in Table 6, although our method significantly improves verification accuracy from 85.7% to 92.3% compared with average human performance, combining the decisions of multiple humans yields a higher accuracy of 94.2%; a gap to human voting performance therefore remains. We also add two general deep face recognition methods for comparison, Deepface [50] and DeepID2 [51]. The result of Deepface is borrowed from [36], and the DeepID2 model is pretrained on the CACD dataset. As seen in Table 6, our method still outperforms them, which further demonstrates the specific effectiveness of CAN for face verification under aging variations.
5. Conclusions
Table 6: Verification accuracy on the CACD-VS dataset.
Method Accuracy
HD-LBP [52] 81.6%
HFA (2013) [4] 84.4%
CARC (2014) [14] 87.6%
Deepface (2014) [50] 85.4%
DeepID2 (2014) [51] 87.2%
DCNN+LBPH (2015) [39] 89.5%
Human, Average (2013) 85.7%
Human, Voting (2015) 94.2%
LF-CNNs (2016) [32] 98.5%
GSM (2016) [36] 89.8%
Our approach 92.3%
In this paper, we proposed coupled auto-encoder networks (CAN) together with a nonlinear factor analysis method to extract identity features that prove to be age-invariant from a given face image. Experiments on FGNET, CACD and CACD-VS confirm the effectiveness of our approach.
In the future, we will attempt to incorporate supervised information into CAN and refine our network architecture. Cross-database evaluation will also be investigated, and we will extend the CAN model to tackle face recognition problems with other variations such as expression, illumination and pose.
Acknowledgements
This work was supported in part by the National Natural Science Foundation
of China (61375038) and Applied Basic Research Programs of Sichuan Science
and Technology Department (2016JY0088).
References
[2] J.-X. Du, C.-M. Zhai, Y.-Q. Ye, Face aging simulation and recognition based on NMF algorithm with sparseness constraints, Neurocomputing 116 (2013) 250–259.
[4] D. Gong, Z. Li, D. Lin, J. Liu, X. Tang, Hidden factor analysis for age invariant face recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2872–2879.
[5] D. Gong, Z. Li, D. Tao, J. Liu, X. Li, A maximum entropy feature descriptor for age invariant face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5289–5297.
[6] Z. Li, U. Park, A. K. Jain, A discriminative model for age invariant face recognition, IEEE Transactions on Information Forensics and Security 6 (3) (2011) 1028–1037.
[11] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[13] B.-C. Chen, C.-S. Chen, W. H. Hsu, Cross-age reference coding for age-invariant face recognition and retrieval, in: Computer Vision – ECCV 2014, Springer, 2014, pp. 768–783.
[14] B.-C. Chen, C.-S. Chen, W. H. Hsu, Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset, IEEE Transactions on Multimedia 17 (6) (2015) 804–815.
[15] A. Montillo, H. Ling, Age regression from faces using random forests, in: 2009 16th IEEE International Conference on Image Processing (ICIP), IEEE, 2009, pp. 2465–2468.
[16] G. Guo, G. Mu, Y. Fu, T. S. Huang, Human age estimation using bio-inspired features, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2009, pp. 112–119.
[17] Y. Fu, T. S. Huang, Human age estimation with regression on discriminative aging manifold, IEEE Transactions on Multimedia 10 (4) (2008) 578–584.
[20] J. Wang, Y. Shang, G. Su, X. Lin, Age simulation for face recognition, in: 18th International Conference on Pattern Recognition (ICPR), Vol. 3, IEEE, 2006, pp. 913–916.
[21] N. Ramanathan, R. Chellappa, Face verification across age progression, IEEE Transactions on Image Processing 15 (11) (2006) 3349–3361.
[24] Y. H. Kwon, N. D. V. Lobo, Age classification from facial images, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1994, pp. 762–767.
[27] J. Suo, X. Chen, S. Shan, W. Gao, Learning long term face aging patterns from partially dense aging databases, in: 2009 IEEE 12th International Conference on Computer Vision, IEEE, 2009, pp. 622–629.
[29] J. Suo, S.-C. Zhu, S. Shan, X. Chen, A compositional and dynamic model for face aging, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2010) 385–401.
[30] N. Tsumura, N. Ojima, K. Sato, M. Shiraishi, H. Shimizu, H. Nabeshima, S. Akazaki, K. Hori, Y. Miyake, Image-based skin color and texture analysis/synthesis by extracting hemoglobin and melanin information in the skin, ACM Transactions on Graphics (TOG) 22 (3) (2003) 770–779.
[31] J. Lu, V. E. Liong, X. Zhou, J. Zhou, Learning compact binary face descriptor for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (10) (2015) 2041–2056.
[32] Y. Wen, Z. Li, Y. Qiao, Age invariant deep face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[33] Y. Li, G. Wang, L. Lin, H. Chang, A deep joint learning approach for age invariant face verification, in: Computer Vision, Springer, 2015, pp. 296–305.
[34] S. Bianco, Large age-gap face verification by feature injection in deep networks, arXiv preprint arXiv:1602.06149, 2016.
[35] H. El Khiyari, H. Wechsler, Face recognition across time lapse using convolutional neural networks, Journal of Information Security 7 (3) (2016) 141.
[37] L. Liu, C. Xiong, H. Zhang, Z. Niu, M. Wang, S. Yan, Deep aging face verification with large gaps, IEEE Transactions on Multimedia 18 (1) (2016) 64–75.
[38] J. Lu, V. E. Liong, G. Wang, P. Moulin, Joint feature learning for face recognition, IEEE Transactions on Information Forensics and Security 10 (7) (2015) 1371–1383.
[39] H. Zhai, C. Liu, H. Dong, Y. Ji, Y. Guo, S. Gong, Face verification across aging based on deep convolutional networks and local binary patterns, in: Intelligence Science and Big Data Engineering. Image and Video Data Engineering, Springer, 2015, pp. 341–350.
[41] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2 (1) (2009) 1–127.
[43] Y. Liu, X. Hou, J. Chen, C. Yang, G. Su, W. Dou, Facial expression recognition and generation using sparse autoencoder, in: 2014 International Conference on Smart Computing (SMARTCOMP), IEEE, 2014, pp. 125–130.
[46] M. A. Turk, A. P. Pentland, Face recognition using eigenfaces, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1991, pp. 586–591.
[48] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, IEEE, 2001, pp. I-511.
[51] Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, in: Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.