
Author's Accepted Manuscript

Age invariant face recognition and retrieval by coupled auto-encoder networks

Chenfei Xu, Qihe Liu, Mao Ye

www.elsevier.com/locate/neucom

PII: S0925-2312(16)31172-9
DOI: http://dx.doi.org/10.1016/j.neucom.2016.10.010
Reference: NEUCOM17622
To appear in: Neurocomputing
Received date: 21 June 2016
Revised date: 3 October 2016
Accepted date: 8 October 2016

Cite this article as: Chenfei Xu, Qihe Liu and Mao Ye, Age invariant face recognition and retrieval by coupled auto-encoder networks, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.10.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Age Invariant Face Recognition and Retrieval by
Coupled Auto-encoder Networks

Chenfei Xu, Qihe Liu, Mao Ye∗


School of Computer Science and Engineering, University of Electronic Science and
Technology of China, Chengdu 611731, P. R. China

Abstract

Recently, many promising results have been reported on face recognition related problems. However, age-invariant face recognition and retrieval remains a challenge. Inspired by the observation that age variation is a nonlinear but smooth transform, and by the ability of auto-encoder networks to learn latent representations from inputs, we propose a new neural network model called coupled auto-encoder networks (CAN) to handle the age-invariant face recognition and retrieval problem. CAN consists of two auto-encoders bridged by two shallow neural networks that fit the complex nonlinear aging and de-aging processes. We further propose a nonlinear factor analysis method that nonlinearly decomposes a given face image into three components: an identity feature, an age feature and noise, where the identity feature is age-invariant and can be used for face recognition and retrieval. Experiments on three publicly available face aging datasets, FGNET, CACD and CACD-VS, show the effectiveness of the proposed approach.
Keywords: Face recognition, Age invariant, Auto-encoder

1. Introduction

Age-invariant face recognition and retrieval is a challenging problem in face recognition research because one person can exhibit substantially different appearances at different ages, which significantly increases the recognition difficulty.

∗Corresponding author
Email address: [email protected] (Chenfei Xu, Qihe Liu, Mao Ye)

Preprint submitted to Elsevier, October 18, 2016

Figure 1: Example images from FGNET. Images in the same row are of the same subject. The number at the bottom of each image shows the age at which it was taken.


The problem is becoming increasingly important and has wide applications, such as finding missing children, identifying criminals and verifying passports. A traditional approach, proposed in [1, 2], is to synthesize a face image to match the image at a target age before recognition. These methods construct a 2D/3D model to compensate for the age variation that degrades face recognition performance. However, such generative models depend strongly on parameter assumptions, accurate age labels and relatively clean training data, so they do not work well in real-world face recognition.
To address this problem, some discriminative methods [3, 4, 5, 6, 7] have been proposed. Most of these methods attempt to design an appropriate feature representation and an effective matching framework. Typically, Li et al. [6] combined the scale invariant feature transform (SIFT) [8] and multi-scale local binary patterns (MLBP) [9] as local feature representations for recognition, but this method does not consider age information. Recently, [4, 5] proposed an approach based on factor analysis. It assumes that the face image feature of one person can be expressed as a combination of an identity-specific component and an age-related component. In the test phase, this method computes the matching score of a given pair of images based on the (age-invariant) identity component. However, these methods are all linear models, so their expressive power is limited and they require complex inference.
Motivated by the ability of the auto-encoder to learn latent representations from inputs, and by the observation that age variation is a nonlinear but smooth transform, we propose a new neural network model called coupled auto-encoder networks (CAN). Given a pair of images of one person, we first use two auto-encoders that take the two images as inputs and reconstruct them. Then, we leverage two shallow neural networks as a bridge to connect these two auto-encoders. We fit the aging and de-aging processes by these shallow neural networks, relying on the fact that a single-hidden-layer neural network can fit any complex smooth function [10]. Further, a nonlinear factor analysis method is applied to the hidden layers of CAN, in which the representation of a face image is decomposed into three components: an identity feature which is age-invariant, an age feature which is identity-independent, and noise. Finally, we apply the PCA and LDA methods [11] to the identity feature to form a more compressed and discriminative feature as the final age-invariant representation for face recognition and retrieval.
Our main contributions are: 1) a new model for age-invariant face recognition and retrieval based on a couple of auto-encoder networks, evaluated on three public face aging datasets, FGNET [12], CACD [13] and CACD-VS [14]; 2) a nonlinear factor analysis method to separate the identity feature from the face representation. Compared with the similar methods based on linear factor analysis proposed in [4, 5], our method obtains a better identity feature.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 describes the proposed approaches and details the coupled auto-encoder networks (CAN). Section 4 provides the experimental results. Section 5 concludes the paper.

2. Related works

Most existing works on age-related face analysis focus on age estimation [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and age simulation [27, 28, 29, 30, 1, 2]. Works on age-invariant recognition are limited, and traditional methods fall into two categories. Generative methods proposed in [1, 2] try to construct a 2D/3D face aging pattern space to synthesize a face image that matches the target face image before recognition. However, these methods depend strongly on parameter assumptions, accurate age labels and relatively clean training data, so they do not work well in real-world face recognition.
Recently, some discriminative methods [3, 31, 4, 5, 6, 7] have been proposed and achieve good results. Ling et al. [7] use a gradient oriented pyramid with a support vector machine for face verification. Li et al. [6] design a densely sampled local feature description scheme combining the scale invariant feature transform and multi-scale local binary patterns to improve face matching accuracy. Gong et al. [4] propose hidden factor analysis, which tries to separate age variation from person-specific features for face recognition, and further propose a maximum entropy feature descriptor with identity factor analysis in [5] to improve this method. Lu et al. [31] propose a compact binary face descriptor for face representation and recognition. In [3], a new feature descriptor called local pattern selection (LPS) is proposed for aging face recognition.
Data-driven methods based on a reference set have also been used to improve age-invariant face recognition and retrieval. The authors of [13, 14] propose a coding framework called Cross-Age Reference Coding (CARC) that uses CACD [13], a new large-scale face aging dataset, as a reference set to encode the low-level features of a face image into age-invariant representations.
Some deep learning models [32, 33, 34, 35, 36, 37, 38, 39] have also been proposed. Wen et al. [32] propose a deep face recognition framework called latent factor guided convolutional neural network (LF-CNN) that significantly improves age-invariant face recognition performance; with a model called latent identity analysis (LIA), they extract the age-invariant features. Similarly, [33, 34, 35] propose different convolutional neural network architectures to address the age-invariant face recognition problem. [36] presents a generalized similarity model (GSM) and integrates it with feature representation learning via deep convolutional neural networks for age-invariant face recognition. In [37], a deep aging face verification (DAFV) architecture is proposed, including an aging pattern synthesis module and an aging face verification module. [39] combines deep convolutional neural networks with local binary pattern histograms (DCNN+LBPH) for face verification across aging. [38] presents a new joint feature learning (JFL) approach and stacks this model into a deep architecture to exploit hierarchical information for face representation.
The auto-encoder attempts to learn hidden representations automatically from inputs and has been successfully applied to many computer vision problems. As a typical unsupervised learning method, the auto-encoder [40, 41] has shown its efficiency in many face-related recognition problems. Kan et al. [42] propose a stacked progressive auto-encoder for face recognition across poses. Liu et al. [43] use a sparse auto-encoder for facial expression recognition, and Zhang et al. [44] propose an iterative stacked de-noising auto-encoder to recognize faces with partial occlusions. Liu et al. [37] use a deep aging-aware de-noising auto-encoder for aging pattern synthesis.

3. Proposed approaches

In this section, we describe the proposed approaches. We first give an overview of the CAN model and then detail it. Next we present our training algorithm, followed by the face matching method.

3.1. Overview
An overview of CAN is shown in Fig. 2. Structurally, CAN is composed of two identical auto-encoders and two single-hidden-layer neural networks acting as a bi-directional bridge.
The inputs of CAN are training facial image pairs of different persons, denoted as $T = \{x_1^i, x_2^i\}$ with $x_1^i, x_2^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$, where $N$ is the total number of training image pairs.


Figure 2: The overview of CAN. CAN is composed of two identical auto-encoders and a bridge network. Given a pair of input images $(x_1, x_2)$ of one person, we first leverage the auto-encoders to reconstruct the inputs, projecting them into a high-dimensional feature space in the hidden layers. Second, we add constraints in this feature space to decompose it into three components, where $(I_1, I_2)$ as identity features can be used as age-invariant representations for recognition and retrieval. Note that different ids can refer to the same person. Details of CAN are described in Sections 3.2 and 3.3.

For one person, our goal is to encode an age-invariant feature from the inputs for recognition and retrieval. A nonlinear factor analysis model is given as:

$$x = \sigma(I, A, \xi), \tag{1}$$

where $x$ represents the input and $\sigma(\cdot)$ is a nonlinear function defined by CAN. The equation above means that a facial image can be decomposed nonlinearly into three components: $I$ represents the identity feature, which is age-invariant; $A$ represents the age feature, which is identity-independent; and $\xi$ represents noise, which covers any factors that deviate from our model.
Concretely, as shown in Fig. 2, $x_1$ and $x_2$ represent the younger and older facial image inputs of the same person. $I_j$, $A_j$ and $\xi_j$, for $j = 1, 2$, respectively represent the decomposed components according to Eq. (1). $\tilde{x}_1$ and $\tilde{x}_2$ are the basic reconstructed outputs of CAN (see Section 3.2). We call $x_1$-to-$x_2$ the aging direction and, vice versa, $x_2$-to-$x_1$ the de-aging direction. Two single-hidden-layer neural networks acting as a bi-directional bridge connect $A_1$ and $A_2$ to fit the aging and de-aging processes. We limit the age gap of each training image pair in $T$ to a certain range depending on the dataset; this guarantees that the aging and de-aging fitting processes are effective. To encode the age-invariant features $I_1$ and $I_2$ from the inputs $x_1$ and $x_2$, our model proceeds in two steps:

1. Basic reconstruction: this step reconstructs the facial image inputs $x_1$ and $x_2$ independently by two auto-encoders, capturing as many of the main factors of the inputs as possible. The inputs are projected into a high-dimensional feature space in the hidden layers.
2. Transfer: this step imposes constraints in the above feature space to nonlinearly decompose it into three feature subspaces: an identity feature space which is age-invariant, an age feature space which is identity-independent, and a noise space.

These two steps build our CAN model. We detail them based on their cost functions in the following two sections.

3.2. Basic reconstruction

In this step, given a pair of facial images of the same person, we reconstruct the two images with CAN. The cost function is defined as:

$$\min_{\theta_1} L_r = \frac{1}{2N} \sum_{i=1}^{N} \left( \|x_1^i - \tilde{x}_1^i\|_2^2 + \|x_2^i - \tilde{x}_2^i\|_2^2 \right), \tag{2}$$

where the parameters are $\theta_1 = \{W_j, \hat{W}_j, b_j, c_j\}$ with $W_j = \{W_{uj}, W_{vj}, W_{nj}\}$ and $\hat{W}_j = \{\hat{W}_{uj}, \hat{W}_{vj}, \hat{W}_{nj}\}$, for $j = 1, 2$, as shown in Fig. 2. $\tilde{x}_1^i$ and $\tilde{x}_2^i$ are the outputs of the auto-encoders that reconstruct the corresponding inputs $x_1^i$ and $x_2^i$. Eq. (2) is a typical auto-encoder training objective, i.e., a squared error function. In the rest of this paper, we omit the averaging factor (like $\frac{1}{2N}$ in Eq. (2)) from the cost functions for simplicity.
We analyze only the first term, since the two terms in Eq. (2) are completely symmetric. A basic auto-encoder has two main blocks, the encoder and the decoder. The input $x_1^i$ is encoded by a function $f_1$:

$$h_1^i = f_1(x_1^i) = s(W_1 x_1^i + b_1), \tag{3}$$

for $i = 1, 2, \ldots, N$. $W_1 \in \mathbb{R}^{m \times n}$ is a weight matrix, where $m$ is the number of neurons in the hidden layer, and $b_1 \in \mathbb{R}^{m \times 1}$ is the hidden layer bias vector. $s(z) = (1 + e^{-z})^{-1}$ is the sigmoid function, and $h_1^i$ is the hidden layer representation.
In the decoding stage, $h_1^i$ is decoded by another function $g_1$ to obtain $\tilde{x}_1^i$:

$$\tilde{x}_1^i = g_1(h_1^i) = s_l(\hat{W}_1 h_1^i + c_1), \tag{4}$$

where $\tilde{x}_1^i$ is the reconstructed output, encouraged to be close to the input $x_1^i$. $\hat{W}_1 \in \mathbb{R}^{n \times m}$ is a weight matrix and $c_1 \in \mathbb{R}^{n \times 1}$ is the output layer bias vector. $s_l(z) = z$ is the identity function (i.e., a linear activation function). Minimizing this term updates $\{W_1, \hat{W}_1, b_1, c_1\} \subset \theta_1$. Similarly, minimizing the second term updates $\{W_2, \hat{W}_2, b_2, c_2\} \subset \theta_1$. After solving Eq. (2), we fix the updated parameters $\theta_1$ for step 2.
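To make the basic reconstruction step concrete, the following minimal NumPy sketch implements one of the two auto-encoders with the sigmoid encoder of Eq. (3) and the linear decoder of Eq. (4); the class name and initialization scale are our own illustrative choices, not part of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BasicAutoEncoder:
    """One of the two auto-encoders in CAN: sigmoid encoder (Eq. (3)),
    linear decoder (Eq. (4)), squared-error objective (one term of Eq. (2))."""

    def __init__(self, n, m, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1e-2, size=(m, n))      # encoder weights, m hidden units
        self.b = np.zeros(m)                             # hidden-layer bias b
        self.W_hat = rng.normal(0.0, 1e-2, size=(n, m))  # decoder weights
        self.c = np.zeros(n)                             # output-layer bias c

    def encode(self, x):
        return sigmoid(self.W @ x + self.b)              # h = s(Wx + b)

    def decode(self, h):
        return self.W_hat @ h + self.c                   # linear output, s_l(z) = z

    def reconstruction_loss(self, x):
        x_tilde = self.decode(self.encode(x))
        return 0.5 * np.sum((x - x_tilde) ** 2)          # one term of Eq. (2)
```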

3.3. Transfer

After performing step 1, we impose constraints on the hidden layer representation $h_j$ to decompose it into three feature subspaces $I_j$, $A_j$ and $\xi_j$, for $j = 1, 2$, as shown in Fig. 2. The cost function of this step is:

$$\min_{\theta_2} L_t = \frac{1}{2N} \sum_{i=1}^{N} \left( \|A_2^i - \hat{A}_2^i\|_2^2 + \|A_1^i - \hat{A}_1^i\|_2^2 + \|I_2^i - I_1^i\|_2^2 + \|x_2^i - \hat{x}_2^i\|_2^2 + \|x_1^i - \hat{x}_1^i\|_2^2 \right), \tag{5}$$

where the parameters are $\theta_2 = \{W_{uj}, \hat{W}_{uj}, W_{vj}, \hat{W}_{vj}, b_{uj}, b_{vj}, c_j, H_{aj}, H_{dj}, b_{aj}, b_{dj}\}$, for $j = 1, 2$. $\hat{A}_2^i$ is the aging fitting output, encouraged to be equal to the target older age feature $A_2^i$. In the de-aging direction, $\hat{A}_1^i$ is encouraged to be equal to $A_1^i$, the target younger age feature. $I_1^i$ and $I_2^i$ are the identity features of the same person, which are age-invariant. We call $\hat{x}_2^i$ and $\hat{x}_1^i$ the transfer reconstruction outputs; they approximate the inputs $x_2^i$ and $x_1^i$, respectively.

Figure 3: Aging fitting neural network. $\hat{A}_2$ is the aging fitting output encouraged to be equal to the target older age feature $A_2$. $A_1$ is the younger age feature. We leverage the bridge network to fit this aging process.

Minimizing the first squared error term $\|A_2^i - \hat{A}_2^i\|_2^2$ in Eq. (5) fits the aging process between the two age features $A_1^i$ and $A_2^i$. We choose a single-hidden-layer neural network to connect $A_1^i$ and $A_2^i$ because a single-hidden-layer neural network can fit any complex smooth function [10], and we observe that the aging (and de-aging) process is a highly complex but smooth transform. Optimizing this term in fact trains an aging fitting neural network separated from CAN, as shown in Fig. 3, where $A_2^i$ can be expressed as:

$$A_2^i = f_{v2}(x_2^i) = s(W_{v2} x_2^i + b_{v2}), \tag{6}$$

for $i = 1, 2, \ldots, N$, where $f_{v2}$ is a nonlinear function forced to encode the age feature from the input $x_2^i$. $W_{v2} \in \mathbb{R}^{q \times n}$ is a weight matrix, where $q$ is the age feature dimension (i.e., the number of neurons), and $b_{v2} \in \mathbb{R}^{q \times 1}$ is the age feature bias vector.
Before continuing our analysis, we first define the aging and de-aging functions, $F_a$ and $F_d$, as:

$$F_a(z) = f_{a2}(f_{a1}(z)), \qquad F_d(z) = f_{d2}(f_{d1}(z)), \tag{7}$$

where $f_{aj}$ and $f_{dj}$ are defined as:

$$f_{aj}(z) = s(H_{aj} z + b_{aj}), \qquad f_{dj}(z) = s(H_{dj} z + b_{dj}), \tag{8}$$

for $j = 1, 2$, where $H_{a1}, H_{d1} \in \mathbb{R}^{k \times q}$ and $H_{a2}, H_{d2} \in \mathbb{R}^{q \times k}$ are weight matrices, and $b_{a1}, b_{d1} \in \mathbb{R}^{k \times 1}$ are middle-layer bias vectors. We make the age feature bias vectors $b_{v1}$ and $b_{v2}$ adaptive, i.e., used both for encoding the age feature and for fitting the aging and de-aging processes, so $b_{a2} = b_{v2}$ and $b_{d2} = b_{v1}$. $k$ is the number of bridge neurons. Thus, as shown in Fig. 2, the bridge networks can be formulated as $F_a$ and $F_d$, which are highly nonlinear due to the composition of two sigmoid functions. Note that the input of $F_a$ in our model is $A_1^i$ for aging, while $F_d$ takes the older input's $A_2^i$ for de-aging. We can now formulate $\hat{A}_2^i$ according to Eq. (7) as:

$$\hat{A}_2^i = F_a(A_1^i), \tag{9}$$

where $A_1^i$ has a definition analogous to that of $A_2^i$ in Eq. (6). The second term admits a similar analysis; it only fits from the opposite direction for de-aging. Therefore, minimizing the first two terms in Eq. (5) encourages the model to extract age-related information from the inputs.
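A minimal sketch of the aging branch under the definitions above may help; the function names and parameter packing are ours, and the weights are assumed to be already initialized or trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode_age(x, W_v, b_v):
    """Age feature A = s(W_v x + b_v), Eq. (6)."""
    return sigmoid(W_v @ x + b_v)

def bridge_aging(A1, H_a1, b_a1, H_a2, b_a2):
    """Aging function F_a(z) = f_a2(f_a1(z)), Eqs. (7)-(9)."""
    z = sigmoid(H_a1 @ A1 + b_a1)    # project to the k bridge neurons
    return sigmoid(H_a2 @ z + b_a2)  # back to age-feature dimension q

def aging_fit_loss(x1, x2, W_v1, b_v1, W_v2, b_v2, H_a1, b_a1, H_a2):
    """First term of Eq. (5): ||A2 - F_a(A1)||^2, with b_a2 = b_v2."""
    A1 = encode_age(x1, W_v1, b_v1)
    A2 = encode_age(x2, W_v2, b_v2)
    A2_hat = bridge_aging(A1, H_a1, b_a1, H_a2, b_v2)
    return np.sum((A2 - A2_hat) ** 2)
```

The de-aging direction is symmetric: swap the roles of $x_1$ and $x_2$ and use $H_{d1}$, $b_{d1}$, $H_{d2}$ with $b_{d2} = b_{v1}$.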
The third term ensures that the error between the two encoded identity features $I_1^i$ and $I_2^i$ of the same person is small. This term is based on the observation that facial images of the same person contain a stable identity feature that is age-invariant. Here $I_j^i$ is formulated as:

$$I_j^i = f_{uj}(x_j^i) = s(W_{uj} x_j^i + b_{uj}), \tag{10}$$

for $i = 1, 2, \ldots, N$ and $j = 1, 2$. In the above equation, $f_{uj}$ is an identity encoding function, $W_{uj} \in \mathbb{R}^{p \times n}$ is a weight matrix where $p$ is the identity feature dimension, and $b_{uj} \in \mathbb{R}^{p \times 1}$ is the identity feature bias vector. Minimizing $\|I_2^i - I_1^i\|_2^2$ encourages the model to extract common identity information from the inputs of the same person. We will later use $I_1^i$ and $I_2^i$ as age-invariant representations for face recognition and retrieval.
The fourth term is a transfer reconstruction squared error. Here we in effect train a transfer reconstruction neural network, separated from CAN, as shown in Fig. 4.


Figure 4: Transfer reconstruction neural network. Given the inputs $(x_1, x_2)$ of one person at different ages, we use the aging fitting output $\hat{A}_2$ combined with the target identity feature $I_2$ to reconstruct the older facial image input $x_2$.

For the inputs $(x_1^i, x_2^i)$ of one person at different ages, our idea is to use the aging fitting output $\hat{A}_2^i$ combined with $I_2^i$ to reconstruct $x_2^i$, the target older facial image input. We call this process transfer reconstruction. Here $\hat{x}_2^i$ is formulated as:

$$\hat{x}_2^i = s_l(\hat{W}_{v2} \hat{A}_2^i + \hat{W}_{u2} I_2^i + c_2), \tag{11}$$

for $i = 1, 2, \ldots, N$, where $\hat{W}_{v2} \in \mathbb{R}^{n \times q}$ and $\hat{W}_{u2} \in \mathbb{R}^{n \times p}$ are weight matrices and $s_l(z)$ is the identity function. Similarly, $\hat{x}_1^i$ in the fifth term is:

$$\hat{x}_1^i = s_l(\hat{W}_{v1} \hat{A}_1^i + \hat{W}_{u1} I_1^i + c_1). \tag{12}$$

Minimizing the last two terms in Eq. (5) concentrates as much useful personal information as possible in the parameters $\theta_2$. Combined with the constraints introduced by the other terms in Eq. (5), we simultaneously separate identity-related and age-related information as needed. The noises $\xi_1$ and $\xi_2$ are separated from the inputs indirectly by our CAN training algorithm (see Section 3.4).
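Putting the pieces together, the sketch below evaluates the transfer cost of Eq. (5) for a single training pair; the dictionary packing of the parameters is our own convention, with key names following Fig. 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def transfer_loss(x1, x2, P):
    """Transfer cost of Eq. (5) for one training pair (x1 younger, x2 older)."""
    # Decomposed hidden components, Eqs. (6) and (10)
    A1 = sigmoid(P["Wv1"] @ x1 + P["bv1"])
    A2 = sigmoid(P["Wv2"] @ x2 + P["bv2"])
    I1 = sigmoid(P["Wu1"] @ x1 + P["bu1"])
    I2 = sigmoid(P["Wu2"] @ x2 + P["bu2"])
    # Bridge fits: aging A1 -> A2_hat and de-aging A2 -> A1_hat, Eqs. (7)-(9)
    A2_hat = sigmoid(P["Ha2"] @ sigmoid(P["Ha1"] @ A1 + P["ba1"]) + P["bv2"])
    A1_hat = sigmoid(P["Hd2"] @ sigmoid(P["Hd1"] @ A2 + P["bd1"]) + P["bv1"])
    # Transfer reconstructions, Eqs. (11)-(12)
    x2_hat = P["Wv2_hat"] @ A2_hat + P["Wu2_hat"] @ I2 + P["c2"]
    x1_hat = P["Wv1_hat"] @ A1_hat + P["Wu1_hat"] @ I1 + P["c1"]
    return (np.sum((A2 - A2_hat) ** 2) + np.sum((A1 - A1_hat) ** 2)
            + np.sum((I2 - I1) ** 2)
            + np.sum((x2 - x2_hat) ** 2) + np.sum((x1 - x1_hat) ** 2))
```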

3.4. Training

Training CAN involves the two steps discussed above, and we alternately perform these two training steps. Our training procedure is described in Algorithm 1. In the transfer step, we add constraints on the identity-related and age-related parameters $W_{uj}, W_{vj} \subset W_j$, $\hat{W}_{uj}, \hat{W}_{vj} \subset \hat{W}_j$, $b_{uj}, b_{vj} \subset b_j$, for $j = 1, 2$, to encode the identity and age features. Combined with the basic reconstruction step, this overall training separates the remaining irrelevant information into the noise-related parameters $W_{nj} \subset W_j$, $\hat{W}_{nj} \subset \hat{W}_j$, $b_{nj} \subset b_j$, for $j = 1, 2$. Therefore we indirectly encode the noises $\xi_1$ and $\xi_2$ in the hidden layers. To solve Eqs. (2) and (5), we adopt stochastic gradient descent (SGD) with standard back-propagation [45].

Algorithm 1 CAN Training

Input: training set $T = \{x_1^i, x_2^i\}$ with $x_1^i, x_2^i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$; feature dimensions $p$, $q$, $r$ and the number of bridge neurons $k$; mini-batch size $m$ and iteration epoch maxEpoch; learning rate $\alpha$.
Output: identity encoder parameters $W_{uj}^*, b_{uj}^*$ ($j = 1, 2$)
1: Set $t = 1$. Initialize $W_j, \hat{W}_j, H_{aj}, H_{dj}$ ($j = 1, 2$) $\sim N(0, 10^{-4})$ and $b_{aj}, b_{dj}, b_j, c_j$ ($j = 1, 2$) to all zeros.
2: repeat
3:   Shuffle $T$.
4:   repeat
5:     Pick a mini-batch $T'$ from $T$ without overlap.
6:     Compute $L_r$ in Eq. (2).
7:     Update parameters $\theta_1$ by solving Eq. (2).
8:     Compute $L_t$ in Eq. (5).
9:     Update parameters $\theta_2$ by solving Eq. (5).
10:   until $T$ is looped over.
11:   $t = t + 1$.
12: until maxEpoch is met.
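The skeleton below mirrors Algorithm 1 in Python; the gradient callables are assumptions standing in for the back-propagation of Eqs. (2) and (5), which the paper computes with standard SGD [45].

```python
import random

def train_can(T, theta1, theta2, grad_Lr, grad_Lt, alpha, batch_size, max_epoch):
    """Alternate the basic-reconstruction and transfer updates (Algorithm 1).
    grad_Lr / grad_Lt are assumed to return {name: gradient} dicts for the
    parameters of Eq. (2) and Eq. (5) on a mini-batch."""
    for epoch in range(max_epoch):
        random.shuffle(T)                                   # line 3: shuffle T
        for start in range(0, len(T), batch_size):          # non-overlapping batches
            batch = T[start:start + batch_size]
            for name, g in grad_Lr(batch, theta1).items():  # lines 6-7, Eq. (2)
                theta1[name] -= alpha * g
            for name, g in grad_Lt(batch, theta2).items():  # lines 8-9, Eq. (5)
                theta2[name] -= alpha * g
    return theta1, theta2
```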

Since our CAN training is an unsupervised learning method, the extracted age-invariant features $I_1$ and $I_2$ in the hidden layers are not discriminative, so they cannot be used directly for face recognition and retrieval. Following the strategy of [4, 5], we apply PCA [46] to the extracted $I_1$ and $I_2$, followed by LDA [47, 11], a supervised dimension reduction technique, to make them more compressed and discriminative as the final age-invariant features for face recognition and retrieval.
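As an illustration of this dimension reduction stage, here is a sketch using scikit-learn (our choice of library, not the paper's; note that scikit-learn caps the LDA output dimension at one less than the number of classes, so the effective $d_l$ may be smaller than the nominal setting):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_pca_lda(I_train, labels, d_pca=400, d_lda=100):
    """Compress raw identity features with PCA, then make them
    discriminative with supervised LDA (dimensions as in Table 3)."""
    pca = PCA(n_components=d_pca).fit(I_train)
    n_comp = min(d_lda, len(set(labels)) - 1)  # LDA cap: n_classes - 1
    lda = LinearDiscriminantAnalysis(n_components=n_comp).fit(
        pca.transform(I_train), labels)
    return pca, lda

def project(I, pca, lda):
    """Map raw identity features to the final age-invariant features."""
    return lda.transform(pca.transform(I))
```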

3.5. Matching method

After CAN training and dimension reduction, we use the learned identity encoder parameters $W_{uj}^*, b_{uj}^*$ ($j = 1, 2$) and the trained PCA and LDA transform matrices $M_p$, $M_l$ to obtain the final age-invariant features. Concretely, given a pair of probe and gallery facial image inputs $(x_p, x_g)$, according to Eq. (10), the corresponding age-invariant features are computed as:

$$I_p = M_l^T M_p^T f_{u1}(x_p) = M_l^T M_p^T s(W_{u1}^* x_p + b_{u1}^*), \tag{13}$$

$$I_g = M_l^T M_p^T f_{u2}(x_g) = M_l^T M_p^T s(W_{u2}^* x_g + b_{u2}^*), \tag{14}$$

where the superscript $T$ denotes matrix transposition. We then use the cosine distance to compute matching scores between $I_p$ and $I_g$ for age-invariant face recognition and retrieval.
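A small sketch of this matching step (the helper names are ours):

```python
import numpy as np

def cosine_score(Ip, Ig):
    """Cosine similarity between probe and gallery identity features."""
    return float(Ip @ Ig / (np.linalg.norm(Ip) * np.linalg.norm(Ig) + 1e-12))

def rank_gallery(Ip, gallery):
    """Return gallery indices sorted by decreasing matching score."""
    scores = np.array([cosine_score(Ip, Ig) for Ig in gallery])
    return np.argsort(-scores)
```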

4. Experiment

4.1. Datasets

We evaluate our approach on three public face aging datasets: FGNET [12], CACD [13] and CACD-VS [14]. FGNET contains 1,002 images of 82 different people, each with about 13 images on average taken at different ages from 0 to 69. CACD is a new large-scale dataset collected from the Internet, which consists of 163,446 face images of 2,000 people with ages ranging from 16 to 62. To the best of our knowledge, CACD is the largest publicly available face aging dataset. Compared to CACD, FGNET has a larger age gap and more images of young subjects, but CACD has a larger number of images and more images at other ages. Fig. 5 shows the age range distributions of the two datasets.
Further, we conduct an experiment on CACD-VS, a subset of CACD, for face verification. The CACD-VS dataset contains 2,000 positive pairs and 2,000 negative pairs and is carefully annotated by checking both the associated images and the surrounding web contents.

Figure 5: Age range distribution (%) of FGNET and CACD.
In our problem, all facial images are preprocessed as follows: (1) convert RGB images to grayscale; (2) detect the locations of the faces using the Viola-Jones face detector [48] and locate the 83 landmarks using the Face++ API [49]; (3) align the images so that the eyes lie at the same horizontal positions; (4) crop the images to remove the background and hair regions; (5) rescale the images by bicubic interpolation and reshape them into one-dimensional vectors. All the data are then mapped into [0, 1] and normalized to have zero mean.
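A rough sketch of this pipeline with OpenCV is given below; the landmark-based alignment via the Face++ API (steps (2)-(3)) is omitted, and the 35 × 32 output size follows the input dimension stated in Section 4.2.

```python
import cv2
import numpy as np

def preprocess(path, out_size=(32, 35)):  # (width, height) -> a 35x32 = 1120-dim input
    """Grayscale -> Viola-Jones face detection -> crop -> bicubic resize ->
    flatten -> map into [0, 1] -> zero mean. Landmark alignment omitted."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)         # step (1)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    x, y, w, h = cascade.detectMultiScale(gray, 1.1, 5)[0]            # step (2)
    face = gray[y:y + h, x:x + w]                                     # step (4)
    face = cv2.resize(face, out_size, interpolation=cv2.INTER_CUBIC)  # step (5)
    v = face.astype(np.float64).ravel() / 255.0                       # into [0, 1]
    return v - v.mean()                                               # zero mean
```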

4.2. Parameter settings

Our approach has several hyper-parameters to select: the input dimension $n$, identity feature dimension $p$, age feature dimension $q$, noise-related feature dimension $r$, the number of bridge neurons $k$ in CAN, and the PCA [46] and LDA [47, 11] dimensions. The settings of these hyper-parameters are given in Table 1. For the FGNET and CACD datasets, the input dimension $n$ is $35 \times 32$, and in CAN training we perform SGD with a fixed learning rate $\alpha = 0.0001$ and a mini-batch size of 10. For CACD-VS, we use the same parameter settings as for CACD.

Table 1: The parameter settings in our experiments.

Dataset                     FGNET   CACD
CAN                  p      2100    2800
                     q      600     800
                     r      300     400
                     k      500     800
Dimension Reduction  PCA    400     500
                     LDA    100     120

Table 2: The settings of identity feature dimension p, age feature dimension q and noise-related feature dimension r for different choices of the number of hidden layer neurons m.

m   1000   2000   3000   4000
p   700    1400   2100   2800
q   200    400    600    800
r   100    200    300    400

For the three datasets, we set the iteration epoch maxEpoch to 1,000, 500 and 800, respectively. All parameter updates use a momentum of 0.9.

4.3. Experiment on FGNET dataset

FGNET is a challenging face aging dataset because it is relatively small and suffers from other significant variations such as pose, illumination and expression; some examples are shown in Fig. 1. Following the training and testing split scheme of [6], we use a leave-one-image-out strategy for performance evaluation. In each fold, we choose pairs from all the remaining face images to form our training set (1,800 training image pairs per fold; different pairs can refer to the same person). We constrain the age gap of each image pair to be less than 10 years. All the training data are used to learn the PCA and LDA subspaces in each fold.

4.3.1. Evaluation metrics

In our experiment on FGNET, we use the leave-one-image-out strategy with rank-k identification rates for performance evaluation. Specifically, we leave one image out as the test sample and train the model using the remaining 1,001 images, from which the training pairs are selected. We repeat this procedure 1,002 times and take the average as the final identification rate. Cosine similarity is used to compute matching scores between the test example and the remaining images. For rank-k, we sort the matching results from top-1 to top-k for each test example; averaging these results yields the rank-k identification rates.
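A small sketch of this metric (the helper names are ours):

```python
import numpy as np

def rank_k_rate(scores, probe_ids, gallery_ids, k):
    """Rank-k identification rate: fraction of probes whose true identity
    appears among the k best-scoring gallery entries.
    scores: (num_probes, num_gallery) matrix of cosine similarities."""
    gallery_ids = np.asarray(gallery_ids)
    hits = 0
    for i, row in enumerate(scores):
        top_k = np.argsort(-row)[:k]               # indices of the k highest scores
        hits += probe_ids[i] in gallery_ids[top_k]
    return hits / len(scores)
```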

4.3.2. Parameters exploration

Several parameters influence the performance of our approach: the number of hidden layer neurons $m$ of each auto-encoder in CAN, where $m = p + q + r$, the PCA dimension $d_p$ and the LDA dimension $d_l$. We use rank-k identification rates on FGNET to decide these parameters.
For the number of hidden layer neurons $m$, we run experiments from 1,000 to 4,000. For each choice of $m$, the parameters $p$, $q$ and $r$ are selected by exhaustive search; they are given in Table 2, with the number of bridge neurons $k$ kept at 500. Here we use the raw age-invariant features $I_1$ and $I_2$ (extracted from the hidden layers) directly for identification to choose $m$. Theoretically, more hidden layer neurons allow more complex encoding functions to be learned, capturing more useful information from the inputs. Fig. 6(a) shows the face recognition performance on FGNET for different choices of $m$. As expected, fewer neurons lead to worse performance. We observe that the performance at $m = 3{,}000$ is slightly better than that at $m = 4{,}000$, hence we choose $m = 3{,}000$.
Different choices of the PCA and LDA parameters are further investigated based on the above setting with $m = 3{,}000$ in Table 2. We select the PCA dimension $d_p$ from 100 to 1,000 and the LDA dimension $d_l$ from 60 to 300, and we run experiments to seek the best combination using the rank-1 identification rate on FGNET as the metric.

Table 3: Rank-1 recognition rates (%) for different PCA dimensions (dp, rows) and LDA dimensions (dl, columns) on FGNET.

dp \ dl   60     100    140    180    220    260    300
100       79.0   −      −      −      −      −      −
200       83.5   82.5   80.8   78.1   −      −      −
300       83.9   86.0   82.9   80.3   78.3   71.4   −
400       82.2   86.5   85.4   80.6   77.0   71.7   65.4
500       81.4   84.8   82.0   78.9   76.5   72.2   66.4
600       80.6   80.7   81.4   78.4   75.4   74.9   69.5
700       78.3   80.3   80.6   77.5   72.0   73.6   71.3
800       71.7   76.3   78.5   74.0   71.4   70.3   72.9
900       68.2   70.1   73.3   72.4   70.9   67.4   69.3
1000      63.1   66.2   69.7   70.8   67.2   65.9   65.7

The rank-1 recognition rates for the different PCA and LDA dimensions are given in Table 3. We choose $(d_p, d_l) = (400, 100)$ as the best setting for dimension reduction and testing.

4.3.3. Effects of dimension reduction strategies

We also study the performance of our approach with different dimension reduction strategies. We test our model on FGNET with PCA only, LDA only, and PCA+LDA applied to the raw age-invariant features $I_1$ and $I_2$. Concretely, for PCA only we follow the above setting $d_p = 400$, for LDA only we tune $d_l$ to 200, and for PCA+LDA we set $(d_p, d_l) = (400, 100)$ as above. Cumulative Match Characteristic (CMC) curves on FGNET for the different dimension reduction strategies are shown in Fig. 6(b). We make several observations. First, the raw feature performs badly, mainly due to the lack of supervised information. Second, there are significant performance improvements once LDA is applied, whether or not PCA is also applied: the rank-1 identification rate based on the raw features improves from 38.46% to 75.25% when LDA alone is applied. This demonstrates the effectiveness of a supervised learning method combined with our unsupervised CAN model. Finally, the PCA technique further improves performance. We therefore use the raw feature with the PCA+LDA strategy in the following experiments.

Figure 6: (a) Cumulative Match Characteristic (CMC) curves on FGNET with different choices of the number of hidden layer neurons m of CAN. (b) CMC curves on FGNET with different dimension reduction strategies.

4.3.4. Comparison with state-of-the-art algorithms


We compare our approach with state-of-the-art algorithms including: (1) a generative model that builds a face aging space for age-invariant face recognition [1]; (2) a discriminative model [6]; (3) hidden factor analysis [4], a linear factor analysis method for face recognition; (4) a discriminative method using a maximum entropy feature descriptor based on identity factor analysis [5]; (5) a deep face recognition framework called latent factor guided convolutional neural network (LF-CNN) [32]; and (6) a CNN baseline model [32] with the same network as LF-CNNs but without latent identity analysis (LIA). Comparative results are shown in Table 4.
Among all the compared algorithms in Table 4, our approach obtains competitive results. As can be seen, our nonlinear factor analysis method with CAN is superior to the linear factor analysis methods of [4, 5]. The performance of our approach is inferior to that of LF-CNNs [32], which has the top performance. One possible reason is that although we leverage the LDA technique to make the identity features more discriminative, the proposed unsupervised CAN model is still less discriminative than the supervised LF-CNNs. Another possible reason is that our CAN adopts a shallow neural network architecture; generally speaking, shallow networks perform worse than deep networks.

Table 4: Rank-1 recognition rates of our approach compared with state-of-the-art algorithms on FGNET.

Algorithm                    Recognition Rate
Park et al. (2010) [1]       37.4%
Li et al. (2011) [6]         47.5%
HFA (2013) [4]               69.0%
MEFA (2015) [5]              76.2%
CNN-baseline (2016) [32]     84.4%
LF-CNNs (2016) [32]          88.1%
Our approach                 86.5%

Table 5: Rank-1 recognition rates (%) of our approach compared with state-of-the-art algorithms on different age groups in FGNET.

Age group   Amount   CNN-baseline   LF-CNNs   Ours
0-4         193      51.81%         60.10%    60.27%
5-10        218      84.86%         88.53%    87.39%
11-16       201      91.04%         94.03%    92.63%
17-24       182      94.51%         97.80%    95.47%
25-69       208      99.04%         99.52%    98.01%

Further, it is instructive to examine our approach on different age groups in FGNET. Following the age group setting of [32], the rank-1 recognition rates of our approach on different age groups in FGNET are given in Table 5. From Table 5, we can see that our approach still yields good performance, which demonstrates its effectiveness.
Finally, some failed retrieval results on FGNET are given in Fig. 7. Some incorrect rank-1 results are even more similar to the probe images than the corresponding ground-truth images. In other cases, recognition fails due to further variations such as illumination and expression.

Figure 7: Some failed retrieval results on FGNET. The first row shows the probe faces and the second row shows the incorrect rank-1 retrieval results using our approach. The third row presents the ground-truth images corresponding to the probes.

4.3.5. Effects of the aging and de-aging operators

We leverage the transfer reconstruction neural network (see Fig. 4) to investigate the aging and de-aging operators of CAN. For the aging operator, given an image pair $(x_1, x_2)$ as input, we replace the decomposed age feature $A_2$ of $x_2$ with the aging fitting output $\hat{A}_2$ to reconstruct $x_2$. The quality of the age feature from the hidden layer can be judged from the reconstructed result $\hat{x}_2$. For the de-aging operator, the process is completely analogous in the opposite direction. Here we use the CAN trained on the CACD dataset (see Section 4.4) to visualize some reconstructed results on FGNET, shown in Fig. 8. The reconstructions are similar to the ground-truth images, which intuitively demonstrates the effectiveness of CAN in fitting the complex nonlinear aging and de-aging processes.

4.4. Experiment on CACD dataset

In this experiment, we conduct a face retrieval experiment on CACD [13], currently the largest publicly available face aging dataset. CACD includes variations in illumination, pose and makeup.
We follow the experimental settings of [13].

Figure 8: Some aging and de-aging visualization results on FGNET. Each row represents the same person. The second and fourth columns show the reconstructed outputs. The first and last columns show the ground-truth images.

In CACD, the 120 celebrities with ranks 3-5 are chosen as the test set, where images taken in 2013 are used as query images. The remaining images are split into three subsets, taken in 2004-2006, 2007-2009 and 2010-2012, respectively, as database images. For training, each of the remaining 1,880 celebrities in CACD has about 80 images taken in different years, with age gaps of about 0-10 years. From these remaining images, we select 20 image pairs per person to form the training set (37,600 pairs). Note that the use of makeup in CACD may confound the age of an individual; to avoid its impact on our algorithm, we carefully check the corresponding image contents and age labels when constructing the training set. All training images are then used to learn the PCA+LDA subspaces. The age gap of each training image pair is constrained to between 2 and 7 years.

4.4.1. Evaluation metrics


In our experiment on CACD, we use mean average precision (MAP) as the evaluation metric. Cosine distance is used to compute the similarity of two images. Concretely, let $q_i \in Q$ be a query image, where $Q$ is the query database.
Figure 9: Face retrieval performance in terms of MAP of our approach compared with state-of-the-art algorithms (HFA, CARC, GSM-1, GSM-2) on the 2004-2006, 2007-2009 and 2010-2012 subsets of CACD.

For $q_i$, the positive images can be expressed as $Y_1, Y_2, \ldots, Y_{m_i}$. We define $E_{ic}$ as the retrieval results for $q_i$ in descending order of score, from the top down to $Y_c$. The average precision (AP) of $q_i$ is:

$$AP(q_i) = \frac{1}{m_i} \sum_{c=1}^{m_i} Precision(E_{ic}), \tag{15}$$

where $Precision(E_{ic})$ is the ratio of positive images in $E_{ic}$. The MAP over $Q$ is then:

$$MAP(Q) = \frac{1}{|Q|} \sum_{i=1}^{|Q|} AP(q_i), \tag{16}$$

i.e., the average of the AP over all query images.
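A short sketch of Eqs. (15)-(16) on binary relevance lists (our own encoding: 1 marks a positive image in the ranked results):

```python
import numpy as np

def average_precision(relevance):
    """AP of one query per Eq. (15); relevance is the 0/1 ranked list."""
    relevance = np.asarray(relevance)
    positive_ranks = np.flatnonzero(relevance)   # positions of Y_1 ... Y_mi
    if len(positive_ranks) == 0:
        return 0.0
    # Precision(E_ic): fraction of positives among results from the top to Y_c
    precisions = [relevance[:c + 1].mean() for c in positive_ranks]
    return float(np.mean(precisions))

def mean_average_precision(ranked_relevance_lists):
    """MAP over all queries, Eq. (16)."""
    return float(np.mean([average_precision(r) for r in ranked_relevance_lists]))
```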

4.4.2. Comparison with state-of-the-art algorithms

We compare our approach with state-of-the-art algorithms including HFA [4], CARC [14] and a generalized similarity model [36] (GSM-1 and GSM-2); compared with GSM-1, GSM-2 only uses more training samples. Fig. 9 reports the comparative results, with all methods tuned to their best settings according to their papers. The results in Fig. 9 show that our approach outperforms the others on all three subsets. Note that whereas HFA [4], CARC [14] and GSM-1 [36] are competitive only on the subset with a small age gap, both our method and GSM-2 [36] also achieve competitive performance on the subset with a large age gap. This confirms the superiority of our approach.

4.5. Experiment on the CACD-VS dataset

The CACD-VS dataset contains 4,000 image pairs from 2,000 celebrities. Following the configuration of [14] for face verification, we split CACD-VS into ten folds, each containing 400 image pairs (200 positive and 200 negative) from 200 celebrities. We use one fold for testing and the other nine folds for training, repeat the experiment on each of the ten folds, and report average results. Concretely, for each run we use the other nine folds (3,600 image pairs) to train CAN and learn the PCA+LDA subspaces. After obtaining the identity feature for each image, cosine similarity is used to compute matching scores between pairs. The optimal classification threshold is decided on the nine training folds. The performance of our method compared with state-of-the-art algorithms is reported in Table 6.
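A minimal sketch of this threshold selection and verification accuracy computation (our own helper names; labels are 1 for positive pairs and 0 for negative pairs):

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick the cosine-score threshold maximizing accuracy on the training folds."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    candidates = np.unique(scores)
    accuracies = [np.mean((scores >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accuracies))]

def verification_accuracy(scores, labels, threshold):
    """Accuracy on the held-out fold with the chosen threshold."""
    return float(np.mean((np.asarray(scores) >= threshold) == np.asarray(labels)))
```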
From the results reported in Table 6, although our method significantly improves verification accuracy from 85.7% to 92.3% compared with average human performance, combining the decisions of multiple humans yields a higher accuracy of 94.2%; there is thus still a gap between our method and human performance. We also include two general deep face recognition methods for comparison, Deepface [50] and DeepID2 [51]. The result of Deepface is taken from [36], and the DeepID2 model is pretrained on the CACD dataset. As seen in Table 6, our method still outperforms both, which further demonstrates the specific effectiveness of CAN for face verification under aging variations.

5. Conclusions

In this paper, we propose coupled auto-encoder networks (CAN) and a nonlinear factor analysis method to address the age-invariant face recognition and retrieval problem. Through CAN, we can nonlinearly separate an identity feature that is age-invariant from a given face image.
Table 6: Verification accuracy on the CACD-VS dataset.

Method                    Accuracy
HD-LBP [52]               81.6%
HFA (2013) [4]            84.4%
CARC (2014) [14]          87.6%
Deepface (2014) [50]      85.4%
DeepID2 (2014) [51]       87.2%
DCNN+LBPH (2015) [39]     89.5%
Human, Average (2013)     85.7%
Human, Voting (2015)      94.2%
LF-CNNs (2016) [32]       98.5%
GSM (2016) [36]           89.8%
Our approach              92.3%

Experiments on FGNET, CACD and CACD-VS confirm the effectiveness of our approach.
In the future, we will attempt to incorporate supervised information into CAN and refine our network architecture. Cross-database evaluation will also be investigated. We will further extend our CAN model to tackle face recognition problems with other variations such as expression, illumination and pose.

Acknowledgements

This work was supported in part by the National Natural Science Foundation
of China (61375038) and Applied Basic Research Programs of Sichuan Science
and Technology Department (2016JY0088).

References

[1] U. Park, Y. Tong, A. K. Jain, Age-invariant face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (5) (2010) 947–954.

[2] J.-X. Du, C.-M. Zhai, Y.-Q. Ye, Face aging simulation and recognition based on NMF algorithm with sparseness constraints, Neurocomputing 116 (2013) 250–259.

[3] Z. Li, D. Gong, X. Li, D. Tao, Aging face recognition: A hierarchical learning model based on local patterns selection, IEEE Transactions on Image Processing 25 (5) (2016) 2146–2154.

[4] D. Gong, Z. Li, D. Lin, J. Liu, X. Tang, Hidden factor analysis for age invariant face recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2872–2879.

[5] D. Gong, Z. Li, D. Tao, J. Liu, X. Li, A maximum entropy feature descriptor for age invariant face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5289–5297.

[6] Z. Li, U. Park, A. K. Jain, A discriminative model for age invariant face recognition, IEEE Transactions on Information Forensics and Security 6 (3) (2011) 1028–1037.

[7] H. Ling, S. Soatto, N. Ramanathan, D. W. Jacobs, Face verification across age progression using discriminative methods, IEEE Transactions on Information Forensics and Security 5 (1) (2010) 82–91.

[8] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[9] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: Application to face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 2037–2041.

[10] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory 39 (3) (1993) 930–945.

[11] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.

[12] FG-NET Aging Database. http://www.fgnet.rsunit.com.

[13] B.-C. Chen, C.-S. Chen, W. H. Hsu, Cross-age reference coding for age-invariant face recognition and retrieval, in: Computer Vision – ECCV 2014, Springer, 2014, pp. 768–783.

[14] B.-C. Chen, C.-S. Chen, W. H. Hsu, Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset, IEEE Transactions on Multimedia 17 (6) (2015) 804–815.

[15] A. Montillo, H. Ling, Age regression from faces using random forests, in: 2009 16th IEEE International Conference on Image Processing (ICIP), IEEE, 2009, pp. 2465–2468.

[16] G. Guo, G. Mu, Y. Fu, T. S. Huang, Human age estimation using bio-inspired features, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2009, pp. 112–119.

[17] Y. Fu, T. S. Huang, Human age estimation with regression on discriminative aging manifold, IEEE Transactions on Multimedia 10 (4) (2008) 578–584.

[18] S. K. Zhou, B. Georgescu, X. S. Zhou, D. Comaniciu, Image based regression using boosting method, in: Tenth IEEE International Conference on Computer Vision (ICCV), Vol. 1, IEEE, 2005, pp. 541–548.

[19] S. Yan, H. Wang, X. Tang, T. S. Huang, Learning auto-structured regressor from uncertain nonnegative labels, in: IEEE 11th International Conference on Computer Vision (ICCV), IEEE, 2007, pp. 1–8.

[20] J. Wang, Y. Shang, G. Su, X. Lin, Age simulation for face recognition, in: 18th International Conference on Pattern Recognition (ICPR), Vol. 3, IEEE, 2006, pp. 913–916.

[21] N. Ramanathan, R. Chellappa, Face verification across age progression, IEEE Transactions on Image Processing 15 (11) (2006) 3349–3361.

[22] G. Guo, Y. Fu, C. R. Dyer, T. S. Huang, Image-based human age estimation by manifold learning and locally adjusted robust regression, IEEE Transactions on Image Processing 17 (7) (2008) 1178–1188.

[23] X. Geng, Z.-H. Zhou, K. Smith-Miles, Automatic age estimation based on facial aging patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (12) (2007) 2234–2240.

[24] Y. H. Kwon, N. D. V. Lobo, Age classification from facial images, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1994, pp. 762–767.

[25] A. Lanitis, C. Draganova, C. Christodoulou, Comparing different classifiers for automatic age estimation, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (1) (2004) 621–628.

[26] J. Lu, V. E. Liong, J. Zhou, Cost-sensitive local binary feature learning for facial age estimation, IEEE Transactions on Image Processing 24 (12) (2015) 5356–5368.

[27] J. Suo, X. Chen, S. Shan, W. Gao, Learning long term face aging patterns from partially dense aging databases, in: 2009 IEEE 12th International Conference on Computer Vision (ICCV), IEEE, 2009, pp. 622–629.

[28] A. Lanitis, C. J. Taylor, T. F. Cootes, Toward automatic simulation of aging effects on face images, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4) (2002) 442–455.

[29] J. Suo, S.-C. Zhu, S. Shan, X. Chen, A compositional and dynamic model for face aging, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2010) 385–401.

[30] N. Tsumura, N. Ojima, K. Sato, M. Shiraishi, H. Shimizu, H. Nabeshima, S. Akazaki, K. Hori, Y. Miyake, Image-based skin color and texture analysis/synthesis by extracting hemoglobin and melanin information in the skin, ACM Transactions on Graphics (TOG) 22 (3) (2003) 770–779.

[31] J. Lu, V. E. Liong, X. Zhou, J. Zhou, Learning compact binary face descriptor for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (10) (2015) 2041–2056.

[32] Y. Wen, Z. Li, Y. Qiao, Age invariant deep face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[33] Y. Li, G. Wang, L. Lin, H. Chang, A deep joint learning approach for age invariant face verification, in: Computer Vision, Springer, 2015, pp. 296–305.

[34] S. Bianco, Large age-gap face verification by feature injection in deep networks, arXiv preprint arXiv:1602.06149.

[35] H. El Khiyari, H. Wechsler, et al., Face recognition across time lapse using convolutional neural networks, Journal of Information Security 7 (03) (2016) 141.

[36] L. Lin, G. Wang, W. Zuo, F. Xiangchu, L. Zhang, Cross-domain visual matching via generalized similarity measure and feature learning.

[37] L. Liu, C. Xiong, H. Zhang, Z. Niu, M. Wang, S. Yan, Deep aging face verification with large gaps, IEEE Transactions on Multimedia 18 (1) (2016) 64–75.

[38] J. Lu, V. E. Liong, G. Wang, P. Moulin, Joint feature learning for face recognition, IEEE Transactions on Information Forensics and Security 10 (7) (2015) 1371–1383.

[39] H. Zhai, C. Liu, H. Dong, Y. Ji, Y. Guo, S. Gong, Face verification across aging based on deep convolutional networks and local binary patterns, in: Intelligence Science and Big Data Engineering. Image and Video Data Engineering, Springer, 2015, pp. 341–350.

[40] G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.

[41] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2 (1) (2009) 1–127.

[42] M. Kan, S. Shan, H. Chang, X. Chen, Stacked progressive auto-encoders (SPAE) for face recognition across poses, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1883–1890.

[43] Y. Liu, X. Hou, J. Chen, C. Yang, G. Su, W. Dou, Facial expression recognition and generation using sparse autoencoder, in: 2014 International Conference on Smart Computing (SMARTCOMP), IEEE, 2014, pp. 125–130.

[44] Y. Zhang, R. Liu, S. Zhang, M. Zhu, Occlusion-robust face recognition using iterative stacked denoising autoencoder, in: Neural Information Processing, Springer, 2013, pp. 352–359.

[45] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[46] M. A. Turk, A. P. Pentland, Face recognition using eigenfaces, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1991, pp. 586–591.

[47] X. Wang, X. Tang, A unified framework for subspace face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (9) (2004) 1222–1228.

[48] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, IEEE, 2001, pp. I-511.

[49] Megvii: Face++. http://www.faceplusplus.com. Accessed 2014-3-7.

[50] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701–1708.

[51] Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, in: Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.

[52] D. Chen, X. Cao, F. Wen, J. Sun, Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3025–3032.
